Nov 16, 2020

MongoDB Review Checklist and Tools

Tools

  1. Analyze the mongostat results: https://docs.mongodb.com/manual/reference/program/mongostat/
  2. Use mongotop to see where query time is spent per collection: https://docs.mongodb.com/manual/reference/program/mongotop/
  3. Enable slow query profiling (operations slower than 50 ms): db.setProfilingLevel(1,50) (see the example after this list)
  4. Dex, to analyze slow queries: http://dex.mongolab.com/
  5. mtools, to visualize query usage: https://github.com/rueckstiess/mtools
  6. System metrics: top and iostat
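A minimal sketch tying the tools above together; the row count, interval, and 50 ms threshold are placeholder values:

  # from the command line: server-wide counters and per-collection time
  mongostat --rowcount 12 5
  mongotop 5
  # from the mongo shell: enable profiling and read back the slowest operations
  db.setProfilingLevel(1, 50)
  db.system.profile.find().sort({millis: -1}).limit(5).pretty()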

Checklist

  1. Usage and location of the Replica Set instances
  2. Check system resources usage: CPU and disk IOPS
  3. Check network latency and bandwidth, and the application-level Read Preference and Write Concern
  4. Check mongo instances usage (mongostat)
  5. Check top slow queries (mongotop, slow queries)
  6. Check query and command usage (e.g., with mtools and the profiler)
  7. Check backup strategy
  8. Check storage engine and MongoDB version (see the commands after this list)
  9. Check monitoring tools
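For item 8, two quick checks from the mongo shell:

  db.version()                       // MongoDB server version
  db.serverStatus().storageEngine    // storage engine details (e.g. wiredTiger)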

Jun 7, 2020

Enable Slow Queries Analysis on Postgres RDS (pg_stat_statements)

Analyzing slow queries on Postgres can be done by enabling the slow query log, but it is even more fun with the built-in extension pg_stat_statements.

1. Enable tracking by changing the parameter group values (if you did not create a custom parameter group, follow these instructions first, and don't forget to reboot the instance for the change to take effect):
pg_stat_statements.track ALL
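If you prefer the AWS CLI over the console, a minimal sketch (the parameter group name my-pg-params is a placeholder):

aws rds modify-db-parameter-group \
    --db-parameter-group-name my-pg-params \
    --parameters "ParameterName=pg_stat_statements.track,ParameterValue=ALL,ApplyMethod=pending-reboot"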

2. Connect to Postgres
psql -U ADMIN_USER -h RDS_PATH.rds.amazonaws.com -d postgres

3. Create the extension
CREATE EXTENSION pg_stat_statements;

4. Verify the extension was created:
SELECT *
FROM pg_available_extensions
WHERE name = 'pg_stat_statements'
  AND installed_version IS NOT NULL;

5. Analyze the queries usage:
SELECT query, calls, total_time, rows,
       100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;
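Once you have tuned the heaviest queries, you can reset the collected statistics and measure again:

SELECT pg_stat_statements_reset();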

Keep Performing,
Moshe Kaplan

Feb 17, 2020

Kafka Cluster Hardware Best Practices

Regarding Kafka case studies:
Some of the largest Kafka installations in Israel are in Adtech (Kenshoo, Taboola...) and Cyber (Verint...), handling >40TB or 75M messages per day.

Although the literature claims that Kafka's async writes favour magnetic disks (or at least minimize the benefit of SSDs) [1], practice shows that attached 1TB SSD storage works best (Kenshoo, for example), and Confluent confirms that I/O can bound performance if disks are slow (our case) [3].

Micron's benchmark shows 4x to 20x performance when using SSD/NVMe [4].
Avoid using NAS/SAN/external storage machines [2].

You should avoid placing the OS and log files on the Kafka data disks [2].
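A minimal server.properties sketch, assuming dedicated SSD mounts (the mount points below are placeholders):

# keep Kafka log segments on dedicated SSD volumes, separate from the OS and application logs
log.dirs=/mnt/kafka-data-1,/mnt/kafka-data-2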

One last thing: verify you are using 10Gb NICs [5].
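A quick way to verify it on Linux (the interface name is a placeholder):

sudo ethtool eth0 | grep -i speed    # expect: Speed: 10000Mb/s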

Keep Performing,
Moshe Kaplan

[1] "Using SSDs instead of spinning disks has not been shown to provide a significant performance improvement for Kafka, for two main reasons:
Kafka writes to disk are asynchronous. That is, other than at startup/shutdown, no Kafka operation waits for a disk sync to complete; disk syncs are always in the background. That's why replicating to at least three replicas is critical, because a single replica will lose the data that has not been sync'd to disk if it crashes.
Each Kafka partition is stored as a sequential write-ahead log. Thus, disk reads and writes in Kafka are sequential, with very few random seeks. Sequential reads and writes are heavily optimized by modern operating systems."

Nov 10, 2019

Poor TokuDB Performance due to Unreasonable Statistics

The details behind TokuDB Statistics 

All the details about TokuDB Background ANALYZE TABLE

Upgrade to 5.6.38 to enjoy "TokuDB: ANALYZE TABLE Is No Longer a Blocking Operation" (Percona Database Performance Blog).
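A minimal sketch for checking how ANALYZE behaves on your instance; the table name orders is a placeholder, and the tokudb_analyze*/tokudb_auto_analyze variable names assume a Percona Server build with TokuDB (verify against your version):

SHOW VARIABLES LIKE 'tokudb_analyze%';      -- background ANALYZE settings
SHOW VARIABLES LIKE 'tokudb_auto_analyze';  -- automatic ANALYZE threshold
ANALYZE TABLE orders;                       -- refresh statistics for a single table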



Keep Performing,
Moshe Kaplan

Jan 9, 2019

Disaster Recovery Plan (DRP) for MySQL/MariaDB Galera Cluster

When S#!t Hits the Fan...

That is a good reason to prepare for failure to minimize data loss and downtime

Cluster Design Documentation

First, document your cluster and verify you have:
  1. An odd number of instances (>=3) in at least 3 independent locations
  2. At least a daily backup using XtraBackup, saved remotely (see the sketch after this list)
  3. Monitoring enabled (warning you about low disk space or underperforming instances)
  4. Slow query monitoring enabled, to make sure query performance is tracked and slow queries are taken care of to maximize UX
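A minimal sketch for item 2, taking a nightly XtraBackup and shipping it off the cluster; the credentials, folders, and remote host below are placeholders:

  # full backup into a timestamped folder under /mnt/backup/
  sudo innobackupex --user=backup --password=SECRET /mnt/backup/
  # compress and copy the backup off-site
  sudo tar cvfz /tmp/daily_backup.tar.gz -C /mnt/backup/ .
  scp /tmp/daily_backup.tar.gz backup-user@remote-host:/backups/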

Disaster Recovery Plan (DRP)

DR Cases

  1. Data was deleted/modified accidentally. This case requires one of the following:
    1. Accept the data loss
    2. Restore the daily backup and lose any data collected since the last backup (T1+T2)
    3. Recover the database on a new node, and cherry-pick the changes onto the current cluster (T1)
  2. A single node crashed
    Galera supports automatic recovery of a node without significant work. Recovery can be accelerated by recovering the node from the daily backup (T1)
  3. The whole cluster is down
    1. Requires recovery of a single node from the daily backup (T1)
    2. Setup the cluster (T2)

Technical Procedures

T1: Restore a node from daily backup
  1. Bring back the files from the remote backup to /mnt/backup/
  2. Uncompress the files
    sudo tar xvfz $file_name
  3. Shutdown the MySQL
    sudo service mysql stop
  4. Copy the files to your data folder (/var/lib/mysql)
    sudo rm -rf /var/lib/mysql/*
    sudo innobackupex --copy-back /mnt/backup/
  5. Verify the folder permissions
    sudo chown -R mysql:mysql /var/lib/mysql
  6. Restart MySQL and verify everything is working.
    sudo service mysql start
T2: Setup the cluster
  1. Verify Galera is defined in my.cnf:
    [mysqld]
    wsrep_cluster_address=gcomm://10.10.10.10
    wsrep_provider=/usr/lib64/libgalera_smm.so
  2. Start the first node:
    sudo service mysql start --wsrep-new-cluster
  3. Verify the cluster size
    mysql> SHOW STATUS LIKE 'wsrep_cluster_size';
    +--------------------+-------+
    | Variable_name      | Value |
    +--------------------+-------+
    | wsrep_cluster_size | 1     |
    +--------------------+-------+
  4. Repeat the process on the other nodes, this time just w/ a simple MySQL restart
    sudo service mysql start
T3: Add/Remove a node
  1. Restore the node from the nightly backup (T1)
  2. Perform step 4 in setup a cluster (T2)

Bottom Line

Being ready for the worst can help you mitigate it with minimal data loss and minimal downtime.

Keep Performing,
Moshe Kaplan

Nov 18, 2018

10 Things You Should Know about MongoDB Sharding


MongoDB Sharding is a great way to scale out your writes without changes to application code. However, it requires careful design to enjoy all the scale and performance benefits:
  1. You cannot rename a sharded collection. Think twice before any action!
  2. A MongoDB collection cannot be resharded: to reshard a collection, you will have to copy it into a new sharded collection (or do it twice if you cannot rename the collection at the application level). You will need a script like this:
    use app;
    db.c_to_shard.find().forEach(function(doc){
            db.c_to_shard_new.insert(doc);
            db.c_to_shard.remove({"_id":doc._id});
    });
  3. When you run this script, make sure you run it on the cluster (or on a server with minimal latency to the cluster) to accelerate the copy.
  4. If the collection is already large, you will need to split the script and run multiple processes in parallel to speed up the copy (see the sketch after this list).
  5. Avoid relying on a unique index on a sharded collection: uniqueness can only be enforced when the unique index includes the shard key as a prefix.
  6. Be careful to avoid selecting a shard key field that does not exist in all the documents. Otherwise, you will find tons of "missing" documents.
  7. Follow the previous rule when selecting range-based sharding.
  8. Choose the shard key wisely. If you are not 100% sure your keys will be distributed evenly, use a hashed key to ensure it:
    db.c_to_shard.ensureIndex({user_id: "hashed"})
    sh.shardCollection("app.c_to_shard", {"user_id": "hashed"})
  9. Verify your shards and chunks are evenly distributed using getShardDistribution(). Verify it once sharding is done, and again as data starts to flow into the collection. If something is wrong, the smaller the collection, the easier the fix.
    db.c_to_shard.getShardDistribution() (or use a script that provides detailed information on each shard and chunk)
  10. If all chunks reside on a single shard, verify the balancer status, and start if needed:
    sh.getBalancerState()
    sh.startBalancer()
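For item 4, a minimal sketch of splitting the copy by _id range so several processes can run in parallel; the ObjectId boundaries below are hypothetical and should be derived from your real data:

    // each process handles a disjoint _id range
    var from = ObjectId("5b0000000000000000000000");  // hypothetical range start
    var to   = ObjectId("5b8000000000000000000000");  // hypothetical range end
    db.c_to_shard.find({_id: {$gte: from, $lt: to}}).forEach(function(doc){
            db.c_to_shard_new.insert(doc);
            db.c_to_shard.remove({"_id": doc._id});
    });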
Bottom Line
MongoDB sharding is a great way to scale out your data store write throughput. However, you should select your shard key wisely; otherwise, it might require significant effort to fix things.

Keep Performing,
Moshe Kaplan

Jun 10, 2018

Getting Enterprise Features to your MongoDB Community Edition

Many of us need MongoDB Enterprise Edition, but might be short on resources, or would like to compare the value.

I have summarized several key features of MongoDB Enterprise Edition and their alternatives:

Monitoring Options:
  • MongoDB Cloud Manager: performance monitoring ($500/yr/machine) => $1000-1500
  • Datadog/New Relic => $120-$180/yr per machine; Datadog is better for this case
  • DIY using tools such as mongotop, mongostat and mtools, and integrate with Grafana and others

Replication is highly recommended and is part of the Community Edition:
Replica set => min 3 nodes, at least 2 data nodes, in 3 data centers (2 major DCs and one small).
Backup and Restore:
There are 3 major options (that can be combined of course):
  • fsync lock on MongoDB and a physical backup:
    • fast backup/restore
    • Might be inconsistent/unreliable
  • Logical backup: based on mongodump
    • Can be done for $2.5/GB using Cloud Manager with point-in-time recovery
    • Can be done w/ Percona hot backup
    • Incremental is supported 
  • Have a delayed node
The first two may be done using a third, hidden data node dedicated to backup, which enables high-frequency backups (a minimal sketch follows).
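A minimal sketch of the two approaches against a hidden secondary; the host name and output path below are placeholders:

  # logical backup with mongodump (compressed archive, including the oplog)
  mongodump --host hidden-secondary.example.com --port 27017 --oplog --gzip --archive=/backups/dump_$(date +%F).gz
  # for a physical/file-system backup, lock writes first from the mongo shell:
  db.fsyncLock()    // take the snapshot/copy, then release with:
  db.fsyncUnlock()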

Encryption Alternatives:
  • Disk-based encryption => data at rest (can be done in AWS and with several storage providers)
  • eCryptfs => Percona => data at rest
  • Application-level encryption by the programmers at the class level, before saving to disk

Audit:
Percona Server for MongoDB is a good alternative that may cover many of your enterprise audit needs.

BI :
Well supported with the MongoDB BI Connector in the Enterprise Edition, but can also be done with:
  • Some BI tools that support MongoDB natively
  • 3rd party JDBC connectors, such as Simba and https://www.progress.com/jdbc/mongodb
Bottom Line
Getting your MongoDB Community Edition to meet enterprise requirements is not simple, but with the right effort it can be done.

Keep Performing,
Moshe Kaplan
