Scaling MongoDB for a Growing Customer Base
Here at Blend, one of the primary databases we use is MongoDB. MongoDB offers a number of features that are important to us, including flexible schemas, horizontal scalability, and solid operational tooling, which have let us grow quickly as a business.
An important aspect of this success is how we organize data in our database: we want to isolate each customer's data as much as possible while still operating a single MongoDB cluster. Our solution is to give each customer a separate database. This provides query safety: any individual query should always return exactly one customer's data.
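As an illustration of this layout (the function and collection names here are hypothetical, not Blend's actual code), scoping every query to a per-customer database means a query can only ever see that one customer's documents:

```python
# Hypothetical sketch of per-customer database routing. The client
# indexing style mirrors pymongo, but nothing here is Blend's code.

def database_for_customer(customer_id: str) -> str:
    """Each customer gets a dedicated database, e.g. 'customer_acme'."""
    return f"customer_{customer_id}"

def loans_collection(client, customer_id: str):
    """Scope the query to one customer's database before it is sent to
    MongoDB, so a single query cannot span customers."""
    return client[database_for_customer(customer_id)]["loans"]
```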
In this post we will talk about how we horizontally scaled our MongoDB cluster to over 40,000 requests per second, while still maintaining our customer data separation and query safety.
Signs of Trouble
Our initial MongoDB cluster setup was a single replica set deployment consisting of three nodes (one primary, two hot secondaries) deployed across three AWS availability zones. We started seeing very noticeable performance drops every 60 seconds. The regular cadence let us identify the cause as an internal WiredTiger process known as checkpointing. In short, during a checkpoint WiredTiger touches every data file to write a consistent snapshot of the data to disk.
This problem was amplified for us because of how WiredTiger stores data on disk and because of our requirement of data separation. WiredTiger creates a new file for each collection and index in every database. We had a multiplicative factor here since each customer has a dedicated database and each database contains collections and indexes that are essentially the same across customers. At a certain point we started seeing the checkpointing process hitting our Linux servers’ max file handles.
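To make the multiplicative factor concrete, here is a back-of-the-envelope count (the numbers below are illustrative, not Blend's actual figures). WiredTiger keeps one file per collection and one per index, repeated for every per-customer database:

```python
def wiredtiger_file_count(customers: int, collections_per_db: int,
                          indexes_per_db: int) -> int:
    # One file per collection plus one per index, multiplied by the
    # number of per-customer databases.
    return customers * (collections_per_db + indexes_per_db)

# Illustrative numbers only: 500 customers with 40 collections and
# 120 indexes each already means 80,000 data files, far past the
# common Linux default open-file soft limit of 1,024.
print(wiredtiger_file_count(500, 40, 120))  # 80000
```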
Finding a Solution
In the short term, we performed an index cleanup exercise to remove any indexes that were unused or already covered by a compound index. We utilized two great options here to help assess index usage: MongoDB's built-in index usage statistics and MongoDB Compass. We also enabled and migrated to the directoryPerDB option to help bring down the count of files in a single directory.
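For reference, directoryPerDB lives in the storage section of mongod.conf; note that it only affects the layout of data files written after it is enabled, which is why enabling it involved a migration:

```
storage:
  dbPath: /data/db
  directoryPerDB: true   # one subdirectory of dbPath per database
```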
Next we needed a long-term plan that could scale with our business while still keeping our data segmented by customer for query safety. Since we had effectively reached an OS limit on the number of file handles, a single replica set deployment was not going to be sustainable going forward. We considered many options, including creating multiple MongoDB clusters, but we steered away from that because we didn't want any possibility of errors when mapping a customer to the right cluster. We finally landed on a sharded approach that satisfied our requirements. Our approach is a bit different from the typical horizontal scaling strategy used with MongoDB, so it's worth discussing in more detail.
We elected to shard based on customer: all of the data for a given customer resides on a single shard, and each shard contains all of the data for a portion of our customers. In contrast, in a typical horizontal sharding approach, documents can land on any shard, even within the same collection. Our approach ensures that we won't hit the file handle limit again, because a shard only needs files for the collections and indexes of its own subset of customers, not for every customer in the cluster. With this horizontal scaling plan in place, we can add new shards as our customer count grows.
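Conceptually, the cluster maintains a customer-to-shard mapping like the following. This is a simplified Python sketch of the routing idea, not how MongoDB actually stores its metadata (the real mapping lives on the config servers, described below):

```python
# Simplified model: every per-customer database maps to exactly one
# shard, so a query scoped to one customer's database touches exactly
# one shard. Names are illustrative.

customer_to_shard = {
    "customer_acme": "shard0",
    "customer_globex": "shard0",
    "customer_initech": "shard1",
}

def shard_for_database(db_name: str) -> str:
    # A router resolves the database to its single owning shard and
    # forwards the query there.
    return customer_to_shard[db_name]

# Adding capacity means assigning new customers to a new shard:
customer_to_shard["customer_umbrella"] = "shard2"
```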
Using MongoDB's native sharding ability, the MongoDB Config Servers and Routers track which shard contains which customer. With this separation of concerns, our developers and applications need only worry about their data; the shards are completely opaque to them. This minimized the chance of error in application code and enabled our transition to a sharded MongoDB cluster with minimal code changes.
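In practice, keeping the shards opaque meant the application-side change amounted to swapping the connection string from the replica set to the routers. The hostnames below are placeholders, not our actual topology:

```
# Before: connect directly to the replica set
mongodb://db1.internal:27017,db2.internal:27017,db3.internal:27017/?replicaSet=rs0

# After: connect to the routers; the shards behind them stay invisible
mongodb://router1.internal:27017,router2.internal:27017/
```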
Splitting Data Across Shards
After deciding on a sharding strategy, we had to carry out a migration. Migrating existing data with zero downtime is certainly possible, but presents many opportunities for data integrity issues and usually requires significant engineering effort. We were fortunate enough to have a maintenance window overnight for a maximum of two hours with our customers, so we were able to avoid the significant investment of resources needed for a zero downtime migration.
After a few practice runs, we had streamlined our process enough to fit comfortably inside the two-hour window. We carried out the migration as follows:
- Before the maintenance window started, brought up MongoDB Config Server replica set and Routers and ensured they were all connected
- Once the maintenance window started, stopped all writes to the database
- Ensured everything in MongoDB was flushed to disk, then took a volume snapshot
- Brought up one member (the primary) for each of the shard replica sets from this volume snapshot
- Pruned databases from each running shard so that each customer was only on one shard, leaving the combined set of customers intact
- Added each shard to the MongoDB cluster through the Router
- Created replicas for each shard for high availability by taking a new volume snapshot of each primary shard and standing up two secondaries from each snapshot in different availability zones
- Updated our applications to point at the Routers instead of the original cluster domain
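The pruning step above can be sketched as follows. This is a simplified Python model of the logic, assuming each shard starts from the full snapshot and a precomputed customer-to-shard assignment; it is not our actual migration tooling:

```python
def prune_shard(all_databases, assignment, shard_name):
    """Return the databases this shard keeps; everything else is dropped.

    all_databases: per-customer database names present in the snapshot
    assignment:    dict mapping database name -> owning shard name
    shard_name:    the shard being pruned
    """
    return [db for db in all_databases if assignment[db] == shard_name]

all_dbs = ["customer_a", "customer_b", "customer_c", "customer_d"]
assignment = {"customer_a": "shard0", "customer_b": "shard1",
              "customer_c": "shard0", "customer_d": "shard1"}

kept = {s: prune_shard(all_dbs, assignment, s) for s in ("shard0", "shard1")}

# Safety check mirroring "leaving the combined set of customers intact":
# every database survives on exactly one shard.
combined = sorted(db for dbs in kept.values() for db in dbs)
assert combined == sorted(all_dbs)
```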
All of this was completed in about an hour, well within the maintenance window!
When faced with a need for horizontal scaling, keep in mind:
- There are many reasons to shard other than just scaling for reads and writes or capacity constraints
- Vertical scaling does not solve for every problem: no matter the machine specifications, we faced a hard limit on the number of file handles
- Investigate thoroughly to make sure the proposed solution will actually resolve the issue at hand