Sep 22, 2021

An Open Source Data Masking Solution for MySQL

Today we'll discuss the need for data masking, driven by privacy regulations such as GDPR that are becoming more and more common in the industry.

In order to deploy such a solution we'll utilize two great products:

1. Percona Server 8.0.17, which recently introduced a data masking plugin (compatible w/ the MySQL Enterprise one). This plugin exposes multiple functions that translate sensitive strings such as SSNs and emails into masked strings.

2. ProxySQL, a proxy server that supports modifying SQL queries on the fly. For example, replacing SELECT ssn FROM users; with SELECT mask_ssn(ssn) FROM users;

Percona Server will serve as our MySQL server (you can run it as a slave instance if you need it for analyst purposes only), while ProxySQL will serve as a proxy that rewrites SQL queries to use the Percona Server data masking functions. You should also restrict direct network access to the Percona Server, so that users can reach it only through ProxySQL.
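The rewrite idea can be sketched in a few lines of Python. This is only an illustration of the pattern-replace step, not ProxySQL's actual engine or configuration: the column ssn and the function mask_ssn come from the example above, while the helper rewrite_query and the SENSITIVE_COLUMNS mapping are hypothetical names, and only simple SELECT ... FROM queries are handled.

```python
import re

# Hypothetical mapping of sensitive column names to masking functions.
# mask_ssn is the function named in the example above.
SENSITIVE_COLUMNS = {"ssn": "mask_ssn"}


def rewrite_query(query: str) -> str:
    """Wrap each sensitive column in the SELECT list with its masking function."""
    # Split the query into "SELECT ", the column list, and " FROM ..." so that
    # only the column list is rewritten and WHERE clauses stay intact.
    match = re.match(r"(?is)^(\s*SELECT\s+)(.*?)(\s+FROM\s.*)$", query)
    if match is None:
        return query  # not a simple SELECT ... FROM query; leave it alone
    head, columns, tail = match.groups()
    for column, func in SENSITIVE_COLUMNS.items():
        # \b avoids matching longer names such as "ssn_hash".
        columns = re.sub(
            rf"\b{re.escape(column)}\b",
            f"{func}({column})",
            columns,
            flags=re.IGNORECASE,
        )
    return head + columns + tail


print(rewrite_query("SELECT ssn FROM users;"))
```

A production setup would express the same match and replace patterns as ProxySQL query rules rather than application code; the sketch only shows what the rewrite does to the query text.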

Bottom Line

New times bring new products that we can combine to create novel solutions.

Keep Performing,

Moshe Kaplan

Jun 9, 2021

MongoDB Monitoring Using Prometheus

If your environment is bound to Prometheus, it is best to focus on this platform as a baseline.

We will need to cover multiple aspects:

System Counters
Main metrics needed:
1. Machine CPU
2. Read and Write IOPS to disks
3. Memory utilization
4. Disk utilization
5. Incoming and outgoing network traffic

MongoDB Counters
For this task it is best to use the MongoDB exporter.
There are also 3 Grafana dashboards provided that we can use to monitor performance:
1. MongoDB Overview
2. MongoDB ReplSet
3. MongoDB WiredTiger

These provide key indicators such as locks and replica set status.

MongoDB Slow Queries
It is also wise to enable MongoDB slow query profiling w/ an initial threshold of 100ms to get an overview of the cluster's slow queries:
db.setProfilingLevel(1, 100)

Feb 10, 2021

Why a 4-Node MongoDB Replica Set Is Not a Good Idea

Customer Question:

We have a 4-node replica set w/ each of the leading nodes set to a different priority. From time to time, when we suffer from network issues, the whole replica set goes down.

Why Does This Happen?

Since two of the nodes are in a single site and the other two are in two other sites, when the first site goes offline only 2 of the 4 nodes remain, which is short of the majority of 3 needed to elect a primary. Therefore, you must remove one of the nodes in the first site to avoid these downtimes.

Moreover, MongoDB prefers the node w/ the highest priority as primary. If that node is unstable, every time it goes offline a secondary is elected, w/ a possible few seconds of replica set downtime. When the high priority node comes back online, it is elected primary again (causing another possible downtime).

Therefore, unless there is a good reason, avoid specifying different priorities.
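The majority argument above can be checked with a short Python sketch. It assumes every node is a voting member and that a primary can be elected only when a strict majority of all members is reachable; the site names and the helper can_elect_primary are hypothetical.

```python
def can_elect_primary(nodes_per_site: dict, offline_site: str) -> bool:
    """Return True if the nodes outside offline_site still form a strict majority."""
    total = sum(nodes_per_site.values())
    majority = total // 2 + 1  # strict majority of all voting members
    surviving = total - nodes_per_site[offline_site]
    return surviving >= majority


# The customer's layout: two nodes in one site, one node in each of two others.
four_nodes = {"site_a": 2, "site_b": 1, "site_c": 1}
# The layout after removing one node from the first site.
three_nodes = {"site_a": 1, "site_b": 1, "site_c": 1}

print(can_elect_primary(four_nodes, "site_a"))   # False: 2 of 4 is not a majority
print(can_elect_primary(three_nodes, "site_a"))  # True: 2 of 3 is a majority
```

With three nodes spread over three sites, losing any single site still leaves 2 of 3 members, so the replica set keeps a primary.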

How to Fix It?

1. Perform the task during off peak hours

2. Remove the arbiter node by rs.remove() 

3. Verify the replica set is okay by running rs.status()

4. Modify the remaining nodes configuration:

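// Fetch the current replica set configuration
cfg = rs.conf()
// Give all remaining members the same priority
cfg.members[0].priority = 1
cfg.members[1].priority = 1
cfg.members[2].priority = 1
// Apply the modified configuration (without this the changes are not saved)
rs.reconfig(cfg)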

5. Verify again.

Bottom Line

Keep it Simple :-)

Keep Performing,

Moshe Kaplan

Nov 16, 2020

MongoDB Review Checklist and Tools


  1. mongostat: to analyze the instance activity
  2. mongotop: to see which collections take most of the read and write time
  3. Enable slow queries: db.setProfilingLevel(1,50)
  4. Dex: to analyze slow queries
  5. mtools: to visualize the query usage
  6. System metrics: top and iostat


  1. Usage and location of the Replica Set instances
  2. Check system resources usage: CPU and disk IOPS
  3. Check network latency and bandwidth, and the application-level Read Preference and Write Concern
  4. Check mongo instances usage (mongostat)
  5. Check top slow queries (mongotop, slow queries)
  6. Check queries and command usage
  7. Check backup strategy
  8. Check storage engine and MongoDB version
  9. Check monitoring tools

Jun 7, 2020

Enable Slow Queries Analysis on Postgres RDS (pg_stat_statements)

Analyzing slow queries on Postgres can be done by enabling the slow query log, but it is even more fun with the built-in extension: pg_stat_statements

1. Enable tracking by changing the parameter group values (if you did not create a custom parameter group, follow these instructions, and don't forget to reboot the instance for the change to take effect):
pg_stat_statements.track ALL

2. Connect to Postgres
psql -U ADMIN_USER -h RDS_ENDPOINT -d postgres

3. Create the extension
CREATE EXTENSION pg_stat_statements;

4. Verify the extension was created:
SELECT *
FROM pg_available_extensions
WHERE
    name = 'pg_stat_statements' and
    installed_version is not null;

5. Analyze the queries usage (on PostgreSQL 13 and later the total_time column is named total_exec_time):
SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent FROM pg_stat_statements ORDER BY total_time DESC LIMIT 20;
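The hit_percent expression in the query is worth a second look, since nullif is what keeps it from failing on statements that touched no shared blocks. A minimal Python sketch of the same arithmetic follows; the function name hit_percent is borrowed from the column alias and None stands in for SQL NULL.

```python
from typing import Optional


def hit_percent(shared_blks_hit: int, shared_blks_read: int) -> Optional[float]:
    """Mirror 100.0 * hit / nullif(hit + read, 0) from the query above."""
    total = shared_blks_hit + shared_blks_read
    if total == 0:
        # nullif(total, 0) yields NULL, so the whole expression is NULL
        # instead of raising a division-by-zero error.
        return None
    return 100.0 * shared_blks_hit / total


print(hit_percent(90, 10))  # 90.0
print(hit_percent(0, 0))    # None
```

A low hit_percent on a query with a high total time suggests the query is reading mostly from disk rather than from shared buffers.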

Keep Performing,
Moshe Kaplan

Feb 17, 2020

Kafka Cluster Hardware Best Practices

Regarding Kafka case studies:
Some of the largest Kafka installations in Israel are in Adtech (Kenshoo, Taboola...) and Cyber (Verint...), with >40TB or 75M messages per day.

Although the literature claims that Kafka's async writes favour magnetic disks (or at least minimize the benefit of SSDs [1]), practice shows that attached 1TB SSD storage works best (Kenshoo, for example), and Confluent confirms that IO can bound performance if disks are slow (our case) [3].

A Micron benchmark shows 4x to 20x performance when using SSD/NVMe [4].
You should avoid using NAS/SAN/external storage machines [2].

You should avoid placing the OS and logs on the Kafka disks [2].

One last thing: verify you are using 10Gb NICs [5].

Keep Performing,

Using SSDs instead of spinning disks has not been shown to provide a significant performance improvement for Kafka, for two main reasons:

1. Kafka writes to disk are asynchronous. That is, other than at startup/shutdown, no Kafka operation waits for a disk sync to complete; disk syncs are always in the background. That’s why replicating to at least three replicas is critical—because a single replica will lose the data that has not been sync’d to disk, if it crashes.
2. Each Kafka Partition is stored as a sequential write ahead log. Thus, disk reads and writes in Kafka are sequential, with very few random seeks. Sequential reads and writes are heavily optimized by modern operating systems.

Nov 10, 2019

Poor TokuDB Performance due to Unreasonable Statistics

The details behind TokuDB Statistics 

All the details about TokuDB Background ANALYZE TABLE

Upgrade to 5.6.38 to enjoy non-blocking ANALYZE TABLE: "TokuDB: ANALYZE TABLE Is No Longer a Blocking Operation" (Percona Database Performance Blog)

Keep Performing,
Moshe Kaplan

