Nov 29, 2014

MySQL Replication Over Slow Links/High Latency

MySQL replication is considered to be efficient and usually changes in the master server are performed in the slaves within a single second. 
However, if you suffer from a replication that fails to close the gap, there are two main reasons for it: 

  1. Slave Disk Issue: as replication is single threaded per database, usually the slave lags behind due to disk latency when implementing the changes. In this case you should consider using SSD to accelerate the process.
  2. Low Bandwidth/High Latency Networking: in case where the two servers are located on remote locations (high latency case) or there is a low bandwidth between the servers, we should focus on minimizing the traffic between the servers using one (or both) of the following methods:
    1. Using statement based replication: Row based replication creates a SQL statement per each changed row in the database. Statement based replication is records the actual SQL statement sent by the application. Usually statement based replication is much more efficient from log size aspect. However, you should be aware that it may not work correctly if you use UPDATE ... LIMIT 1 for example.
    2. Compressing the traffic: MySQL supports log replication compression using the slave_compressed_protocol parameter. This method will reduce the traffic between the servers by up to 80%. However, compression is compute intensive, so you should be aware of some extra CPU utilization (that is usually not an issue in databases). This parameter should be enabled on both servers:
      1. Dynamically from the MySQL command line:SET GLOBAL slave_compressed_protocol = 1;
      2. In the MySQL configuration file:#compress master-slave communication
        slave_compressed_protocol = 1
Bottom Line
Understand why your replication lags behind and use the right method to solve it. Yes, it is that easy.

Keep Performing,

Nov 22, 2014

Analyzing Twitter Streams in Real Time using Spark

One of the most interesting (and some people say not working) features in Apache Spark is the ability to analyze the Twitter stream in real time.

DZone just released earlier this week the new Spark RefCard. Since I made a peer review of it, it is a good time to discuss this topic.

Tuning an Out of a Box Solution
Spark provides an out of the box example that with some tuning, you will get the top ten trending Twitter tags every 10 and 60 seconds.

  1. Create a new Twitter App or use your existing app credentials at Twitter Apps.
  2. Download and install Java, Scala and Spark
  3. Adjust the environment variables :
    1. Scala home
      export SCALA_HOME=/usr/lib/scala
    2. PATH to run scala
      export PATH=$PATH:$SCALA_HOME/bin
    3. Add the location of spark-streaming-twitter_2.10-1.0.0.jar,  twitter4j-core-3.0.3.jar and twitter4j-stream-3.0.3.jar to CLASSPATH
      export CLASSPATH=$CLASSPATH:/root/spark/lib/twitter4j/
  4. Run the code after tuning some parameters:
    1. Get into the spark foldercd /var/lib/spark/spark-1.0.1/
    2. If you are running on a single core machine (or you want to just make sure you will get results and not just "WARN BlockManager: Block input-0-XXXXXXXX already exists on this machine; not re-adding it") change the ./bin/run-example code:
      sudo sed -i 's/local\[\*\]/local\[2\]/g' *.txt
    3. Run the example (please remember that you should write the class name, including the streaming., and avoid placing the path, the scala extension or any other fancy stuff:sudo ./bin/run-example streaming.TwitterPopularTags 
  5. The result will be shown after several seconds:Popular topics in last 60 seconds (194 total):
    #MTVStars (42 tweets)
    #NashsNewVideo (9 tweets)
    #IShipKarma (6 tweets)
    #SledgehammerSaturday (6 tweets)
    #NoKiam (5 tweets)
    #mufc's (3 tweets)
    #gameinsight (3 tweets)

Bottom Line
Spark is an amazing platform, with some little adjustments you will be able to enjoy it in a few minutes 

Keep Performing,
Moshe Kaplan

Aug 25, 2014

MySQL Installation on Ubuntu

It seems that MySQL was created to be installed on RHEL/CentOS/Fedora...
But what if you are actually using Ubuntu?
Well, this is the purpose of the following tutorial

  1. Trace all old MySQL installations on the servers (yes, some vendors are delivering baseline images with some old MySQL versions)
    sudo dpkg --get-selections | grep -v deinstall | grep -i mysql
    sudo apt-get --purge remove libdbd-mysql-perl libmysqlclient18 mysql-client mysql-client-5.5 mysql-client-core-5.5 mysql-common mysql-server mysql-server-5.5 mysql-server-core-5.5 php5-mysql
  2. Get the MySQL repository
    sudo wget -O mysql.deb
  3. Install the repository
    sudo dpkg -i mysql.deb
    sudo apt-get update
  4. Install the mysql package (you may need to select the right version and select a root password)
    sudo apt-get -y install mysql-server mysql-client
  5. If you have a pre-configured my.cnf, get it and replace the current my.cnf file. Then you should restart the MySQL. I would recommend you to keep your my.cnf in a code repository like git to manage the versionssudo apt-get -y install git
    sudo git clone wherever tmp && git mv tmp/.git . && rm -rf tmp
    sudo service mysql restart
  6. Now it time to recover your database from your backup (if you have one) and back to business...
Keep Performing,

Aug 19, 2014

Bringing Redis to an Enterprise Level Security

Little Security. Big Problems.
Redis is designed (as most early NoSQL products) as a product that should be used in secured environment (meaning little security measures are built in).

The only security measure Redis supports is authentication that is passed as clear text (definitely not a best security best practice).
UPDATE: Itamar Haber from RedisLabs referred me to an SSL communication encryption using stunnel or spiped

Big Problems. Great Solutions
If you love Redis (and w/ 60K Set and Get Ops/Sec on commodity hardware there is no reason you won't love it), and if you must have an enterprise grade solution, you can take one of the following approaches: 
  1. Implement a web layer in front of it (that can support SSL, encrypted authentication, logging and all other fancy stuff). A great example for this is Webdis with a built in solution. Off course, there is a performance penalty stick to the extra layer (and some issues as Redis still can be accessed directly).
  2. Save the data encrypted in Redis (encrypt data before the SET operation by the app servers, and decrypt it after the GET operation by consumers). This way, communication is not needed to be encrypted and hackers or malicious users can do little harm, even they are access the Redis directly as the Redis data store is encrypted.

Bottom Line
Even sensitive products that lack of basic security measurements, can be brought to enterprise level with the right design in mind.

Keep Performing,
Moshe Kaplan

Jul 26, 2014

Validating Your Disk Performance

Databases and data stores in general (yes, including the leading the leading NoSQL products such as MongoDB and Cassandra) love IO. The more RAM and SSD they have, the better performance you will get...

If you are short of RAM and get a poor performance, it may be a good time to verify your disks performance and see if they do match your needs.

How much IO do you need?
To analyze your needs you should use iostat. This utility will show for each mounted disk the actual number of blocks that are read from disk and are written to disk.
If the numbers are constantly higher than 1,000 blocks/sec, you should consider improving your application design or move from magnetic disks to SSD. In the case below for example the system has a write issue.

> iostat 2
Linux 2.6.32-358.11.1.el6.x86_64 (         07/26/2014      _x86_64_        (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           2.02    0.00    0.92    0.01    0.00   97.04

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda              92.88       599.18      3920.00  547971262 3584968522
sdb               0.00         0.00         0.00       3104          0

What are Your Disks Limits?
If you are not sure how much can you get from your disks, Bonnie++ is a great tool to explore your current limits. 

Installing Bonnie++
> wget
> tar xvfz bonnie++-1.03e.tgz
> cd bonnie++-1.03e
> make
> make install

Running Bonnie++
Select your target disk, the size of written data in MB (the -s flag), the number of repeats (the -n flag) and the server RAM size in MB (the -r flag). Please notice that you should define a write to disk value that is larger than the actual RAM value in order to avoid false results:
> sudo ./bonnie++ -u root -d /tmp -s 1500 -n 3 -r 750

Detecting an issues
As you will be able to see in the following lines Bonnie++ is very useful in detecting disk performance issue and failed disks (see Server C that has a very poor write performance)
Server A (SAS Disks) Results
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
serverA        100M 52626  99 +++++ +++ +++++ +++ 61374  99 +++++ +++ +++++ +++
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP

Server B (SSD Disks) Results
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
serverB        100M 61864  99 +++++ +++ +++++ +++ 58338  99 +++++ +++ +++++ +++
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  3 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++

Server C (Failed SSD Disks) Results
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
serverC        100M  3892  99 104779  99 98847  99  3865  99 +++++ +++ +++++ +++
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  3 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++,100M,3892,99,104779,99,98847,99,3865,99,+++++,+++,+++++,+++,3,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

Bottom Line
When you plan your next move, you should verify your servers can handle it. These tools will help you verify at least your disk can take care of the your future tasks.

Keep Performing,
Moshe Kaplan

Jul 24, 2014

9 steps to Implement Offline Backup at Azure IaaS: Backup a MySQL to Azure Storage

A good backup is probably something you will thank for when s%$t hits the fan.
If you chose MySQL as your data infrastructure and Microsoft Azure as your cloud infrastructure, you will probably thank this procedure (that I actually turned into a a script).

Chosen Products
Our first task is choosing the backup and automation products. I selected the following two:
  1. Percona XtraBackup: a leading backup product by Percona. The product creates an hot backup of the database that is equivalent to disk copy. This method is much faster to backup and recover than mysqldump. It also support increment backup.
  2. Azure SDK Tools Xplat: a node.js based SDK that enables command line interface to the various Azure services including Azure Storage.
Backup Implementation Guide
  1. Install Percona XtraBackup
    sudo wget
    sudo dpkg -i percona-xtrabackup_2.2.3-4982-1.precise_amd64.deb
    sudo apt-get update
    sudo apt-get install -f
    sudo dpkg -i percona-xtrabackup_2.2.3-4982-1.precise_amd64.deb

    The CentOS
    sudo rpm -Uhv
    sudo yum -y install percona-xtrabackup.x86_64
  2. Install Azure SDK Tools Xplat
    sudo apt-get updatesudo apt-get -y install nodejs python-software-propertiessudo add-apt-repository ppa:chris-lea/node.jssudo wget --no-check-certificate | sudo shsudo apt-get install npmsudo npm config set registry npm install -g azure-clisudo ln -s /usr/bin/nodejs /usr/bin/node
  3. Install a backup procedure
    1. Get a publish settings file from Azure (can be done from the console).
    2. Get account name and the matching base64 key from the Azure console.
    3. Import the publish setting filesudo azure account import /opt/mysqlbackup/mysqlbackup.publishsettings
    4. Create a Storage Container
      sudo azure storage container create --container container_name -a account_name -k base64_account_key
  4. Run the backup
    1. The xtrabackup way
      1. Run a full backup and prepare it twice to make it ready for recovery
        sudo xtrabackup --backup
        sudo xtrabackup --prepare
        sudo xtrabackup --prepare
      2. Add the frm files and mysql database to your backup
        sudo chmod -R +r /var/lib/mysql/
        sudo cp -R /var/lib/mysql/myql/* /mnt/backup/mysql/
        sudo cp -R /var/lib/mysql/yourdb/*.frm /mnt/backup/yourdb/
    2. The innobackupex way:
      1. Run full backup and prepare (adjust the memory usage for your case):
        sudo rm -rf /mnt/datadrive/mysqlbackup/*
        sudo innobackupex --user=DBUSER --password=DBUSERPASS 
        sudo innobackupex --apply-log /mnt/datadrive/mysqlbackup/ --use-memory=500M
  5. Tar the files into a unique daily name_now=$(date +"%Y_%m_%d")
    tar cvzf "$_file" /mnt/datadrive/mysqlbackup/
  6. Copy the folder to Azure Storage using
  7. azure storage blob upload -f "$_file" -q --container container_name -a account_name -k base64_account_key
  8. Create a cron that will run it daily:
    > sudo cron -e
    * 0 * * * /opt/mysqlbackup/ >/dev/null 2>&1
Recovery Guide
  1. Bring back the files from the Azure storage to /mnt/backup/
    sudo cd /mnt/backup
    sudo azure storage blob download --container container_name -a account_name -k base64_account_key -b $file_name
  2. Uncompress the files
    sudo tar xvfz $file_name
  3. Copy the files to your data folder (/var/lib/mysql) after shutting down the MySQL
    1. The xtrabackup waysudo service mysql stopsudo rsync -avrP /mnt/backup/ /var/lib/mysql/
    2. The innobackupex way:
      sudo service mysql stopsudo rm -rf /var/lib/mysql/*sudo innobackupex --copy-back /mnt/datadrive/mysqlbackup/sudo chown -R mysql:mysql /var/lib/mysql
  4. Verify the folder permissions
    sudo chown -R mysql:mysql /var/lib/mysql
  5. Restart the MySQL and verify everything is working.
    sudo service mysql start
Bottom Line 
It may be tough. It may be resource and time consuming. However, you must have good recovery process to keep your a$$...

Keep Performing,

Jul 17, 2014

How to Disable MySQL Binlog

@ MySQL, clean the the old bin log files 
Take the File name

@ The /etc/my.cnf
Comment out log-bin
> #log-bin = ....

Restart the mysql
> sudo service mysql restart

Keep Performing,

Jul 14, 2014

Keep Your Website Up and Running During an Upgrade

All of us want to have our website up and running.
However, upgrading a software version requires downtime (even if it minimal).
What should we do?

The answer is simple: follow the following process and gain a highly available system based on load balancing and session off loading, as well as a no downtime procedure to upgrade your system.

Keep Your Site Up and Running @ AWS

  1. Place more than a single web instance behind a load balancer. The Elastic Load Balancer (ELB) will be fine for that.
  2. Make sure you are using session offloading in your system implementation.
  3. Install elbcli on your machine sudo apt-get -y install elbcli
  4. Create a credentials file /var/keys/aws_credential_file and secure it sudo chmod 600 /var/keys/aws_credential_file
  5. Upgrade the servers one after the another.
  6. Before upgrading a server, take it out of the ELB
    sudo elb-deregister-instances-from-lb MyLoadBalancer --instances i-4e05f721  --aws-credential-file=/var/keys/aws_credential_file
  7. And after completing the server upgrade take it back in.
    sudo elb-register-instances-with-lb MyLoadBalancer --instances i-4e05f721  --aws-credential-file=/var/keys/aws_credential_file
Bottom Line
A simple process and careful design will keep you system up and running.

Keep Performing,

Jul 10, 2014

Scale Out Patterns for OpenStack (and other Cloud) based Systems

It is a common question these days how one should design their next system to support elastic and growth requirements in the cloud era.

As I got this specific query today, I would like to share with you my answer based on various materials I created in the last few years:

1. Avoid File Storage SPOF: Use AWS S3 or its equivalent open source OpenStack Swift as a central file server repository when needed

How to use OpenStack Swift for your business case

2. Avoid Data Store SPOF: Use clustered data store that recovers automatically w/o a DBA or an operator intervention. Fine examples are AWS MySQL RDS, MongoDB and Cassandra

MongoDB HA concepts

3. Avoid Static Servers: Leverage Autoscale features or customize it for your needs

How to implement your application logic to auto scale your app

4. Avoid Messing with Cache: Use a central sharded cache to avoid cache revocation and reduce data stores load using MongoDB, CouchBase or Redis.

5. Offload your servers sessions: in order to avoid users log off and lost transactions:

How to implement a session offloading using a central store

6. Avoid Service Downtime due to Servers Downtime: More issues that can be found in my extensive presentation:

 - Use DNS Load Balancing for Geo LB and DRP (Slide 10)
 - Use CDN to offload network traffic (Slides 16-19)
 - Perform Session Offloading by cookies or a central store (Slides 62-65)

Bottom Line
You can scale your app! just follow the right recommendations

Keep Performing,

Moshe Kaplan

Jun 15, 2014

Auto Scaling your Workers based on Queue Length

Having an image or video processing engine?
Having a web crawling solution?
Doing OCR on media?
If so, this post is probably for you.

The Producer-Consumer Auto Scaling Problem
Back at school, you probably learnt how to manage the queue of waiting tasks in a producer-consumer problem. You may learnt how to avoid expired tasks and double processing of same tasks.
However, back then we all assumed the number of workers and producers is fixed.

Well, it is not true any more...
If you system includes a backend processor that its load may vary based on clients demand, you may need to auto scale it. Back then, you needed to purchase new servers. These days it just about launching new instances in your favorite cloud.

Adjusting AWS Auto Scaling to Support this Pattern
AWS Auto Scaling solution is known as the best of its kind in the industry. It supports:
  1. Dynamic scaling based on instances load (e.g CPU).
  2. Automatically launching and terminating instances.
  3. Supporting both on demand (expensive) instances and Spot (marginal cost) instances.
However, it does not support complex decisions such as queue length/instances length ratio out of the box.

Fortunately, AWS provides a complete set of API that let us define a solution that its general flow is described here and here:
  1. Create a base AMI to a worker/consumer machine
    1. If your code tends to change you may define a method that will update to the latest code when machine is being launched
    2. A better solution may be using a DevOps platform such as Puppet or Chef.
  2. Define auto scaling initial configuration with an initial size of minimum and maximum number of instances (N = 1).
  3. Define a spot instance price: make it reasonable (not too high to avoid extra cost, and not too low to actually launch instances when needed).
  4. Create a cron that will run every 1 min and will check the queue length. This cron will terminate or launch instances by adjusting the auto scaling configuration minimum and maximum number of instances (N).
The Algorithm
You need to take several decisions in order to decide how many instances you want on air:
  1. What is the ratio of tasks in queue (Q) to running instances (for example R = 2)
  2. How quick do you want to reach this ratio. The quicker you do, the higher it will cost (instances that are billed hourly, may be terminated and relaunched again only few minutes after that).
Two examples for a possible algorithms:
  1. Keeping the ratio of queue length and running instances constant: for example: N = max[1, Q/R]
  2. Soften the ratio by using previous calculated number N': N = (N' + max[1, Q/R])/2
Bottom Line
Auto Scaling is an amazing tool, and by adjusting it we can easily solve complex issues.

Keep Performing,

May 21, 2014

Introduction to MongoDB: The Complete Presentation

I had two great lectures regarding MongoDB and how to best utilize it in the last week. Therefore, I decided to share the presentation with you.

Key Presentation Topics
  1. MongoDB Background: Company, Customers and the roots of NoSQL
  2. Why more people are choosing MongoDB?
  3. Data Design for NoSQL
  4. MongoDB Installation
  5. Basic DDL and DML syntax
  6. MEAN (MongoDB, Express, Angular, node.js)
  7. Best Practices for MongoDB migration

P.S Don't miss my Hebrew Big Data Webinar at July 7th, 2014

Keep Performing,
Moshe Kaplan

May 13, 2014

6 Easy Steps to Configure MongoDB Replication Set

In this tutorial we'll create a 3 nodes cluster, where the first serves as a primary node, and second as a failover node and the third as an Arbiter

1. Setup Mongo and Set a Configuration File
In all the 3 servers adjust the configuration file /etc/mongod.conf:
#Select your replication set name
#Select the replication log size

Disable the bind_ip parameter to avoid binding to only interface

2. Restart All 3 mongod Daemons
> sudo service mongod restart

3. Create an Initial Configuration on the Primary
Login to the primary mongo and create an initial configuration. Please notice to use the private IP and not the loopback address (
> mongo
Primary> cfg = {"_id" : "[replication_set_name]", "members" : [{"_id" : 0,"host" : "[Primary_Host_IP]:27017"}]}
Primary> rs.initiate(cfg);

4. Add the Failover Instance to the Replication Set
Primary> rs.add("[Failover_Host_IP]:27017")

5. Add the Arbier Instance to the Replication Set
Primary> rs.addArb("[Arbiter_Host_IP]:27017")

6. Verify the Replication Set Status
Primary> rs.status()

Bottom Line
I wish every data cluster setup was as easy as a setup of a MongoDB replication set.

Keep Performing,
Moshe Kaplan

Apr 25, 2014

Consider Using SSL? Don't Forget Choosing the Right CDN for that!

Fact #1: More and More Sites are using SSL to Secure Their Users' Transactions
Everybody requires security and privacy these days.
If you don't believe it, take a look at Google, Facebook and Twitter. All them are using HTTPS and SSL to secure all their webpages and API calls including simple feed presentation and search page presentation.
And yes, this fact is still valid even in the post Heartbleed era.

Fact #2: Sites and Widgets are Required for Quicker than Light Loading Time
In the online business, time is money. And faster webpage load times worth a lot of money.

Fact #3: Webpages Secured by SSL have Poor Performance
When you surf for the first time to a website (or take a look at a widget), you are required to have several phases in order to view the website:
  1. Resolve the Site DNS.
  2. Call for the first Web page.
    1. Perform a TCP handshake.
    2. Perform a SSL handshake.
    3. Retrieve the page itself (+ encryption overhead)
  3. Call the embedded resources: images, CSS and JavaScript files (+ encryption overhead).
As you can see, the initial loading of a regular webpage is not short at all. Adding the SSL handshake to this process as well as the encryption and decryption and the overhead on the content, results in even longer times.

What Can be Done?
A common solution is choosing a SSL offloading device such as Radware's Alteon. This device will shorten the encryption and decryption times at the server side. However, it will not reduce the SSL handshake time or shorten the time needed to transfer the page encryption overhead.

The only way to shorten this time is shortening the round trip time between the users and your servers. If this is sounds like a CDN case study, you are right.

CDN is a Key Solution to Managing HTTPS Traffic
Modern CDN solutions support SSL termination at the edge. Therefore, the SSL handshake time can reduced from up to 1 second to several dozens of ms (see in the figures).
This is a major plus to the benefit of shortening the static files serving time by serving a cached copy from the CDN edge.

The good news are that this benefit is valid for both static files and dynamic calls.

Figure 1: Lightweight HTTP REST API Call: Local (Left waterfall) vs Remote (right waterfall), where local call was done from a server located at the same data center as the web server and the remote was done from a remote location with a 200ms round trip to the web server. As we can see most the time is due to the network round trips rather than server processing. Please neglect the initial DNS resolve time.
Figure 2: Same call this time using HTTPS. We can see that the original waiting time was split to two, while SSL connection time doubled it. We can see that in this case as well the SSL processing time at server is neglectable (22ms) while the round trip costs us about 420ms. Please avoid the DNS resolve time in this case as well.
CDN Selection for HTTPS Traffic Cases
While many CDN services support SSL offloading to their own domain (e.g, you probably would like to use your own domain name (e.g Therefore, you should verify that the CDN provider supports custom SSL certificates. Common cloud CDN providers such as AWS and MaxCDN are known to support it, while providers like Microsoft Azure don't. 

Bottom Line
CDN are a corner stone to every web scale deployment these days, and many times you will find they solve issues you were not expecting for them before.

Keep Performing,

Apr 10, 2014

Looking for PostgreSQL Performance Issues

As traffic goes up, even your PostgreSQL may become a bottleneck.
In this cases it is important to analyze the traffic and understand the usage pattern. The way you will be able to tune the system to meet the challenge.

Understand Usage Pattern at Peak Time
Use PostreSQL top project to get key usage patterns in real time:

  1. Current active SQL statements running
  2. Query plans
  3. Locks
  4. User tables and indexes statistics
Understand Overall Usage Pattern

To get a broad insight of PostgreSQL behavior use pgFouine. This tool analyzes the PostgreSQL logs and provides detailed usage patterns reports such as leading queries, duration, queries by type and queries patterns.

You can get some of these metrics by querying the pg_catalog schema (such as pg_stat_user_tables and pg_stat_user_indexes), and use log_statement to analyze all queries.

Enable Slow Queries

Probably #1 tool to eliminate performance issues:
  1. Add pg_stat_statements  to shared_preload_libraries statement @ postresql.conf
  2. Restart PostgreSQL daemon.
  3. use the pg_stat_statements view to pinpoint the bottlenecks:

SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;

Explain the Execution Plans
Use the Explain statement to analyze slow queries execution paths and eliminate them:

Bottom Line
Using these great tools, you can boost your PostgreSQL and meet the business challenges

Keep Performing,
Moshe Kaplan

Mar 30, 2014

How to Migrate from MySQL to MongoDB

In the last week I was working on a key project to migrate a BI platform from MySQL to MongoDB. The product that its development is headed by Yuval Leshem is gaining a major adaption and the company was facing a scale challenge.
We chosen MongoDB as the platform data infrastructure to support high data insert rate and scale data analysis.
Unlike many projects of this type, we accomplished the migration from plan to production in a week, mostly due to smart and simple plan.

I wanted to share with you some of lessons we learnt during the process:

Data Migration: Mongify
This tool provides a two steps solution to migrate your RDBMS (MySQL) to NoSQL (Mongo):
  1. Mapping database structure
  2. Export the data and import it according to the defined structure
Since it's an open source you can easily dive into the code and adjust it to your own business case. Moreover, the code is maintained by Andrew Kalek that is very cooperative.

Filter by Date Part (Day, Month, Year..)
If you are regular to using the DB date parts functions such as YEAR() and MONTH(), there are options to do it in MongoDB (see $where and aggregation framework). However, both require intensive IO. The best solution in this case is saving 3 (or more) more fields for each original field. These fields will include the relevant date part, and can be indexed for effective query:
  1. [original field]
  2. [original field]_[year part]
  3. [original field]_[month part]
  4. [original field]_[day part]
  5. [original field]_[hour part]
Default Values
MongoDB has no defined schema, so there are no default values as well. Therefore it's up to your data layer (or ORM) to take care of it
This is relevant to to default timestamp as well

Data Casting
Same case as with default values. You app should take care of it.
Please notice that there is a defined mapping from values and types that you can find at Mongify code.

Auto Numbers (1..N)
Same case here, but you may have to choose one of following ways:
  1. Shift your way of thinking of auto increment ids and start using MongoDB auto "_id"s a solution.
  2. You can generate the auto increment ids using a counters database and findAndModify (in this case I will recommend you having a special purpose database and 1:1 collection mapping to gain future releases granular locking). For details see the link on top.
Mongoose as an ORM
If you use node.js consider using Mongoose as your ORM, This one will solve many of your issues by adding structure to your schema. However, please notice that you may loose some flexibility.

Data Analysts
MongoDB is not SQL compliant, and you will have hard time with your data analysts. However, you can ease the change by using the following two methods:
  1. Introduce them to Query Mongo.
  2. Make sure your documents have no sub documents, if you don't have to. Elsewhere, transforming the data to tabulator view will require a major effort from them.
Avoid Normalizing Your Data
If you designed your data infrastructure as a non normalized structure, it will be much easier to move data to NoSQL. If your data is normalized, it is better to the app to take care of the data reconstruction.

Queries Results Limitation
MongoDB results are limited to a document size. If you need to query 200K+ records, you may need to page the data using skip and limit (or better, adding a filter based on the last limited row key value).

Bottom Line
Migration from MySQL to MongoDB requires some effort and a shift in your state of mind, but it can be done relatively fast using careful planning according to the steps defined before.

Keep Performing,
Moshe Kaplan

Mar 5, 2014

MySQL Indexing: Don't Forget to Have Enough Free Space

When you modify you indexes in MySQL (and especially in MyISAM), make sure that the free space on the disk that holds your tmpdir folder is larger than your largest index file.

Why We Need to Such a Large Free Space?
MySQL is using the tmpdir to copy the original index file to and "repair it" by sorting the data.

What Happens if We Don't Have Enough Space?
In this case MySQL will make it best to modify the index file based on the given space. The result is a very slow process (or never ending one) and poor results. If you will check the show processlist, you will find out the state "Repair by keycache" instead of "Repair by sorting"

What to Do?
Make sure you have enough free space (> largest index file) and that the tmpdir option file is located on this disk.

Bottom Line
Make sure you have enough free space to get best performance

Keep Performing,
Moshe Kaplan

Feb 21, 2014

When Should I Use MongoDB rather than MySQL (or other RDBMS): The Billing Example

NoSQL is a hot buzz in the air for a pretty long time (well, it not only a buzz anymore).
However, when should we really use it?

Best Practices for MongoDB
NoSQL products (and among them MongoDB) should be used to meet challenges. If you have one of the following challenges, you should consider MongoDB:

You Expect a High Write Load
MongoDB by default prefers high insert rate over transaction safety. If you need to load tons of data lines with a low business value for each one, MongoDB should fit. Don't do that with $1M transactions recording or at least in these cases do it with an extra safety.

You need High Availability in an Unreliable Environment (Cloud and Real Life)
Setting replicaSet (set of servers that act as Master-Slaves) is easy and fast. Moreover, recovery from a node (or a data center) failure is instant, safe and automatic

You need to Grow Big (and Shard Your Data)
Databases scaling is hard (a single MySQL table performance will degrade when crossing the 5-10GB per table). If you need to partition and shard your database, MongoDB has a built in easy solution for that.

Your Data is Location Based
MongoDB has built in spacial functions, so finding relevant data from specific locations is fast and accurate.

Your Data Set is Going to be Big (starting from 1GB) and Schema is Not Stable
Adding new columns to RDBMS can lock the entire database in some database, or create a major load and performance degradation in other. Usually it happens when table size is larger than 1GB (and can be major pain for a system like BillRun that is described bellow and has several TB in a single table). As MongoDB is schema-less, adding a new field, does not effect old rows (or documents) and will be instant. Other plus is that you do not need a DBA to modify your schema when application changes.

You Don't have a DBA
If you don't have a DBA, and you don't want to normalize your data and do joins, you should consider MongoDB. MongoDB is great for class persistence, as classes can be serialized to JSON and stored AS IS in MongoDB. Note: If you are expecting to go big, please notice that your will need to follow some best practices to avoid pitfalls

Real World Case Study: Billing
In the last ILMUG, Ofer Cohen presented BillRun, a next generation Open Source billing solution that utilizes MongoDB as its data store. This billing system runs in production in the fastest growing cellular operator in Israel, where it processes over 500M CDRs (call data records) each month. In his presentation Ofer presented how this system utilizes MongoDB advantages:
  1. Schema-less design enables rapid introduction of new CDR types to the system. It let BillRun keep the data store generic.
  2. Scale BillRun production site already manages several TB in a single table, w/o being limited by adding new fields or being limited by growth
  3. Rapid replicaSet enables meeting regulation with easy to setup multi data center DRP and HA solution.
  4. Sharding enables linear and scale out growth w/o running out of budget.
  5. With over 2,000/s CDR inserts, MongoDB architecture is great for a system that must support high insert load. Yet you can guarantee transactions with findAndModify (which is slower) and two-phase commit (application wise).
  6. Developer oriented queries, enable developers write a elegant queries.
  7. Location based is being utilized to analyze users usage and determining where to invest in cellular infrastructure.
Bottom Line
MongoDB is great tool, that should be used in the right scenarios to gain unfair advantage in your market. BillRun is a fine example for that.

Keep Performing,
Moshe Kaplan

Jan 13, 2014

Create Your Postgres Cluster in 6 Steps

Setting a Postgres cluster (or even a Master-Slave configuration) was not an easy task before version Postgres 9.0.

In the following lines you will find a short guideline that will help you create you Postgres HA solution:

1. Install postgres on both machines
sudo yum -y install postgresql-server
sudo service postgresql initdb
sudo service postgresql start
sudo sudo -u postgres psql

2. Stop the Postgres instance on both machines, delete the /var/lib/postgresql folder at the slave, and recover it by copy the files from the master.

3. Do initial configuration on both machines to enable logging:
sudo mkdir /var/log/postgresql/
sudo chown -R postgres:postgres /var/log/postgresql/
sudo vi /var/lib/pgsql9/data/postgresql.conf
> log_directory = '/var/log/postgresql'

4. Configure the Master and Slave to enable replication
sudo vi /var/lib/pgsql9/data/postgresql.conf @ Master (enable logging and listening)
> listen_addresses = '*'
> wal_level = hot_standby

> max_wal_senders = 3

sudo vi /var/lib/pgsql9/data/pg_hba.conf @ Master (enable access to the server, tune it, if your cluster is not in a secure environment):
> host  replication   all      trust
> host    all             all                  trust

sudo vi /var/lib/pgsql9/data/postgresql.conf @ Slave (turn on standby)
> hot_standby = on

sudo vi /var/lib/pgsql9/data/recovery.conf @ Slave  (turn on standby a new file)
> standby_mode = 'on'

> primary_conninfo = 'host='

5. Finally, start your slave, and then your master on both sides:
sudo service postgresql start

6. And check replication on by creating database at the Master, and check it at the slave
master> sudo sudo -u postgres psql
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
 a         | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |

slave>  sudo sudo -u postgres psql
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
 a         | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |

P.S If you are using a CentOS and yum (and not AWS), the following link will be useful for you.

Recovering from Cluster Crash (Thanks for Yariv Rachmani for contribution)
If you find the following errors in the your Postgres error:
> 2015-01-01 19:41:14.419 IST >LOG: started streaming WAL from primary at 0/86000000 on timeline 1
> 2015-01-01 19:41:14.420 IST >FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000086 has already been removed

In this case the quickest procedure to recover the slave:

  1. Stop the service: > service postgresql-9.3 stop
  2. Backup the data folder: > cp -prf /var/lib/pgsql/9.3/data /var/lib/pgsql/9.3/data-org
  3. Remove the data folder: > rm -rf data
  4. Copy the data folder from master: > pg_basebackup -h -D /var/lib/pgsql/9.3/data -U postgres -P -v -x
  5. Change permissions: > chown -Rf postgres:postgres data
  6. Copy the configuration file from the backup:> cp -p data-org/*.conf data/
  7. Start again the slave: service postgresql-9.3 start
  8. Verify the slave log file
Bottom Line
Postgres replication is not so complicated starting with Postgres 9.0. Now, it's your turn to keep your system uptime!

Keep Performing,
Moshe Kaplan


Intense Debate Comments

Ratings and Recommendations