Nov 18, 2018

10 Things You Should Know about MongoDB Sharding


MongoDB Sharding is a great way to scale out your writes without changes to application code. However, it requires careful design to enjoy all the scale and performance benefits:
  1. You cannot rename a sharded collection. Think twice before any action!
  2. MongoDB collection cannot be resharded: To reshard a collection, you will have to copy the collection into a new sharded collection (or do it twice if you cannot rename the collection name in the application level). You will need a script like this:
    use app;
    db.c_to_shard.find().forEach(function(doc){
            db.c_to_shard_new.insert(doc);
            db.c_to_shard.remove({"_id":doc._id});
    });
  3. When you run this script, make sure you do it on the cluster (or on a server with minimal latency to the cluster) to accelerate copy.
  4. If collection is already large, you will need to split the script and run multiple processes to fasten the copy.
  5. Avoid creating a unique index on the sharded collection. Sharded collection cannot utilize unique index..
  6. Be careful to avoid selecting a field that does not exist in all the documents. Otherwise, you will find tons of "missing documents".
  7. Follow the previous rule when selecting range based sharding.
  8. Choose the sharding key wise. If you do not sure 100% your keys will be distributed evenly, use a hashed key to ensure it
    db.c_to_shard.ensureIndex({user_id: "hashed"})
    sh.shardCollection("app.c_to_shard", {"user_id": "hashed"})
  9. Verify your shards and chunks are evenly distributed using getShardDistribution(). Verify it once sharded is done, and as data starts to get into the collections. If something is wrong, the smaller the collection, the easier the fix.
    db.c_to_shard.getShardDistribution() or do the same with the following script that will provide you a detailed information on each shard and chunk.
  10. If all chunks reside on a single shard, verify the balancer status, and start if needed:
    sh.getBalancerState()
    sh.startBalancer()
Bottom Line
MongoDB sharding is a great way to scale out your data store write thoughput. However, you should select your sharding key wise, otherwise, it might require a significant effort to fix stuff.

Keep Performing,

ShareThis

Intense Debate Comments

Ratings and Recommendations