Running MongoDB in Small Environments

Posted on December 11, 2023

I love MongoDB and I have been using it for a score of projects over the last 12 years. In fact, MemArc has been based on MongoDB from the beginning and I have never regretted the decision to implement the platform using this database.

In this article, I want to explain some options for setting up Mongo in small environments such as virtual machines or Raspberry Pis. I have not found any really good tutorials for this use case, so I wrote my own.

Why?

Why should I want to set up Mongo in a small environment? Here are some ideas:

Using Mongo for a small application which does not require much data. I mean, using Mongo is fun, so why not use it for a small project with a few hundred or thousand datasets, too?
Setting up a small development environment that does not eat up most of your RAM.
Aggregating time data on your Raspi server in your basement: unfortunately, your model has only 2 GB of RAM and you still want to use it for some cool project.

Preparations and Considerations

I assume that you have set up your Mongo database in some way, either as a single node or as a replica set. There are plenty of tutorials out there, so I will not duplicate them here. For reference, you can follow the installation instructions found on the Mongo website. It does not matter whether you have set up your server directly or via a container (Docker or Podman).

In your base configuration, Mongo will do the following:

RAM usage: By default, Mongo will use the larger of 50% of (RAM - 1 GB) and 256 MB for its WiredTiger cache. If you have a machine with 8 GB of RAM, Mongo will use up to 3.5 GB for this cache, which holds both data and indexes. This is the easiest and most important parameter to tweak (see below).
Disk size: All modern Mongo versions use the WiredTiger storage engine, which means that data on disk is compressed by default (using snappy compression unless you change that). Small collections will use only a few kilobytes (like 20 KB), so you generally do not have to worry about disk usage too much. Indexes will increase disk usage and, as in other database storage engines, many indexes on large datasets can increase it considerably. Still, if you have a relatively small amount of data, you will not have to worry about disk space with WiredTiger. Ignore old tutorials that recommend setting the smallFiles parameter: it belonged to the old MMAPv1 storage engine, which was removed in version 4.2.
CPU: Mongo will not use much CPU while idling. Only queries will need CPU power. Much like other databases, CPU is not as important as disk throughput and RAM, unless you plan to create some really crazy queries. But you do not want to do this in a small environment anyway, right?
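To get a feeling for the default cache size on a small machine, the rule from the RAM item above can be written down in a few lines of Python. This is only a sketch of that rule; the function name is my own, and it does not account for memory limits imposed by a container.

```python
def default_wiredtiger_cache_gb(total_ram_gb: float) -> float:
    """Approximate default WiredTiger cache size in GB for a given amount of RAM."""
    half_of_rest = 0.5 * (total_ram_gb - 1.0)
    # The cache never defaults to less than 256 MB (0.25 GB).
    return max(half_of_rest, 0.25)


for ram in (1, 2, 4):
    print(f"{ram} GB RAM -> {default_wiredtiger_cache_gb(ram):.2f} GB cache")
```

On a Raspi with 2 GB of RAM, this gives a default cache of 0.5 GB, which is a quarter of the whole machine.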

Replica Set or Not?

Should you use a replica set in small environments? Well, that depends on your use case. I run some small instances as standalone servers and I can sleep safe and sound. I create a dump once per day and back it up to a remote machine. But these instances are not mission-critical and do not contain much mutable data. With small datasets, backups are really not cumbersome, so running one machine is perfectly fine in such a scenario. In the past 12 years, I have never lost data on a Mongo machine due to software bugs or some corruption of the database itself, so I have faith in Mongo’s storage engine as such. If you use replication, be sure to put the members on different machines and different storage locations. Otherwise, replica sets do not make much sense other than making it easier to perform live upgrades on the underlying infrastructure.
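The daily dump mentioned above could be scripted roughly like this. It is a sketch, not my actual backup script: the connection string, the backup directory and the file naming are assumptions for illustration, and copying the archive to another machine is left out.

```python
from datetime import date


def build_dump_command(uri: str, backup_dir: str, day: date) -> list[str]:
    """Build a mongodump call that writes one gzipped archive per day."""
    # Hypothetical naming scheme: one archive file per calendar day.
    archive = f"{backup_dir}/mongo-{day.isoformat()}.archive.gz"
    return ["mongodump", f"--uri={uri}", f"--archive={archive}", "--gzip"]


# Assumed values; adjust the URI and the directory to your setup.
cmd = build_dump_command("mongodb://localhost:27017", "/var/backups/mongo", date.today())
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Run from a daily cron job or systemd timer, this produces one archive per day that can then be copied to another machine.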

To sum up, you do not need a replica set if:

Your data does not change a lot.
You can back up your data quickly and easily (small dataset).
Your application can tolerate a downtime of a few minutes (during updates).

If one of these conditions is not met, consider using a replica set and set it up properly (multiple servers, different storage locations, etc.)!

Tweaking RAM Usage

If you have a limited amount of RAM, there really is only one important option to tweak: wiredTigerCacheSizeGB. For most scenarios, this is sufficient. The easiest way to use this option is via the command line: just start your Mongo server with something like --wiredTigerCacheSizeGB 0.5. This will limit the cache to a maximum of 0.5 GB. Remember, the minimum is 0.25 (which is sufficient for very small datasets).

You can also set this in your configuration file (generally found in /etc/mongod.conf):

storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 0.5

For more information, refer to the official documentation.

Now, if you set this parameter really low, be sure to understand that your cache size will be pretty limited, and you will run into performance issues once your data grows large. If you have several thousand records only, you should be safe.
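If you generate your configuration from a script, the lower limit can be enforced while writing the file. The following is a small sketch; the function name and the constant are my own, only the YAML keys come from the configuration shown above.

```python
MIN_CACHE_GB = 0.25  # smallest cache size mentioned above


def render_cache_config(cache_gb: float) -> str:
    """Return the mongod.conf fragment for the given cache size in GB."""
    # Values below the minimum are raised to the minimum.
    cache_gb = max(cache_gb, MIN_CACHE_GB)
    return (
        "storage:\n"
        "  wiredTiger:\n"
        "    engineConfig:\n"
        f"      cacheSizeGB: {cache_gb:g}\n"
    )


print(render_cache_config(0.1))
```

Calling it with 0.1 prints a fragment with cacheSizeGB: 0.25, because the requested value is below the minimum.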

On the Mongo shell, you can ask your instance about memory, too:

db.serverStatus().tcmalloc.tcmalloc.formattedString

This returns a nicely formatted string containing information about memory usage. You can look up more about the server status output in the documentation.
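The server status output also has a wiredTiger.cache section that reports how full the cache is. The sketch below works on such a document as a plain Python dictionary; the sample numbers are made up, and the function name is my own.

```python
def cache_usage(status: dict) -> tuple[float, float, float]:
    """Return (used MiB, configured MiB, fill ratio) from a serverStatus document."""
    cache = status["wiredTiger"]["cache"]
    used = cache["bytes currently in the cache"]
    limit = cache["maximum bytes configured"]
    mib = 1024 * 1024
    return used / mib, limit / mib, used / limit


# Made-up sample in the shape of a serverStatus document.
sample = {"wiredTiger": {"cache": {
    "bytes currently in the cache": 64 * 1024 * 1024,
    "maximum bytes configured": 256 * 1024 * 1024,
}}}
used_mib, limit_mib, ratio = cache_usage(sample)
print(f"{used_mib:.0f} MiB of {limit_mib:.0f} MiB ({ratio:.0%})")
```

With PyMongo installed, a real document can be fetched with MongoClient().admin.command("serverStatus") and passed to the same function.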

More Settings

There are some more settings you can tweak. Most of the time, I do not bother changing them for small datasets. But you might have different requirements, so I mention them for the sake of completeness.

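The first thing to consider is virtual memory and how it is handled on your machines. Two kernel settings control dirty pages, that is, memory pages that have been modified but not yet written to disk: vm.dirty_background_ratio is the percentage of system memory that may hold dirty pages before the kernel starts writing them to disk in the background, and vm.dirty_ratio is the percentage at which processes that write data are forced to flush it themselves. Many tutorials recommend the following for MongoDB (set in /etc/sysctl.conf or by sysctl):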

vm.dirty_background_ratio = 5
vm.dirty_ratio = 15

Another setting is swappiness, which influences how eagerly the virtual memory manager swaps memory out to disk. The following setting is recommended for database servers in general (1 means to use swap only to avoid out-of-memory problems):

vm.swappiness = 1
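To see how far a machine is from the values above, the current settings can be read from /proc/sys/vm on Linux. The following sketch compares them with the numbers from this section; the function names are my own, and the demo at the end uses made-up values instead of reading the real files.

```python
from pathlib import Path

# Values suggested in this section.
RECOMMENDED = {
    "dirty_background_ratio": 5,
    "dirty_ratio": 15,
    "swappiness": 1,
}


def read_vm_setting(name: str) -> int:
    """Read one vm.* setting from /proc/sys/vm (Linux only)."""
    return int(Path("/proc/sys/vm", name).read_text().strip())


def find_deviations(read=read_vm_setting) -> dict:
    """Map each setting that differs to a (current, suggested) pair."""
    deviations = {}
    for name, wanted in RECOMMENDED.items():
        current = read(name)
        if current != wanted:
            deviations[name] = (current, wanted)
    return deviations


# Demo with made-up values; call find_deviations() to read the real ones.
fake = {"dirty_background_ratio": 10, "dirty_ratio": 20, "swappiness": 1}
for name, (current, wanted) in find_deviations(fake.__getitem__).items():
    print(f"vm.{name} = {current} (suggested: {wanted})")
```

The reader function is passed in as a parameter so that the comparison can be tried out on machines that do not have /proc.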

Mongo might also complain about NUMA (Non-Uniform Memory Access) architecture or the use of “Transparent HugePages”. There is information about this in the documentation. On NUMA, you will probably have to run your mongod process using numactl --interleave=all. Transparent HugePages can be disabled by setting transparent_hugepage=never in your grub configuration.
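Whether Transparent HugePages are currently active can be read from a file in /sys on Linux, where the active mode is the one shown in square brackets. A minimal sketch for parsing that file content, with a function name of my own choosing:

```python
import re

THP_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_thp_mode(content: str) -> str:
    """Return the mode shown in square brackets, e.g. 'never'."""
    match = re.search(r"\[(\w+)\]", content)
    if match is None:
        raise ValueError(f"unexpected format: {content!r}")
    return match.group(1)


print(active_thp_mode("always madvise [never]"))  # prints "never"
```

On a real machine, pass in the content of the file, for example active_thp_mode(open(THP_PATH).read()).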

Mongo will also complain if you do not use XFS on your Linux server. WiredTiger and EXT4 do have some performance issues you can run into. In small projects, I would not worry about this, so keep your virtual server as it is, unless you have the means to set it up with a nice XFS partition in the first place. The same is true for Mongo not running on a proper SSD - a small instance will run fine on a Raspi’s SD card or on a USB stick (gosh).
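If you are not sure which filesystem your data directory lives on, /proc/mounts has the answer on Linux. The sketch below picks the longest mount point that contains a given path. The sample mount table and the path /var/lib/mongodb are assumptions for illustration, and mount points with escaped spaces are not handled.

```python
def filesystem_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the mount that contains the given path."""
    best_mount, best_type = "", "unknown"
    target = path.rstrip("/") + "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # The longest matching mount point is the most specific one.
        if target.startswith(prefix) and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


# Made-up mount table in the format of /proc/mounts.
sample_mounts = (
    "/dev/root / ext4 rw,noatime 0 0\n"
    "/dev/sda1 /var/lib/mongodb xfs rw,noatime 0 0\n"
)
print(filesystem_for("/var/lib/mongodb", sample_mounts))  # prints "xfs"
```

For the real table, read /proc/mounts and pass its content as the second argument, together with the dbPath from your configuration.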

A pretty good list of all the considerations can be found in the documentation on production notes.

Conclusion

It is perfectly fine to run MongoDB on a small server or virtual machine with very limited memory. As long as your data is small, you will not have any trouble. The most important setting in such an environment is the wiredTigerCacheSizeGB command line option. It will be your friend in keeping Mongo from eating too much of your memory. Do not worry too much about the other settings.
