Apr 10, 2014

Looking for PostgreSQL Performance Issues

As traffic goes up, even your PostgreSQL database may become a bottleneck.
In these cases it is important to analyze the traffic and understand the usage pattern. That way you will be able to tune the system to meet the challenge.

Understand Usage Pattern at Peak Time
Use the PostgreSQL top (pg_top) project to get key usage patterns in real time:

  1. Current active SQL statements running
  2. Query plans
  3. Locks
  4. User tables and indexes statistics

Understand Overall Usage Pattern

To get broad insight into PostgreSQL behavior, use pgFouine. This tool analyzes the PostgreSQL logs and provides detailed usage-pattern reports such as leading queries, durations, queries by type and query patterns.

You can get some of these metrics by querying the pg_catalog schema (for example, pg_stat_user_tables and pg_stat_user_indexes), and by using log_statement to log all queries for analysis.

Enable Slow Query Analysis

Probably the #1 tool for eliminating performance issues:
  1. Add pg_stat_statements to the shared_preload_libraries setting in postgresql.conf.
  2. Restart the PostgreSQL daemon.
  3. Use the pg_stat_statements view to pinpoint the bottlenecks:

SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
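For reference, the step 1 change in postgresql.conf might look like the following minimal sketch (on PostgreSQL 9.1 and later you also need to run CREATE EXTENSION pg_stat_statements in the target database before the view becomes available):

```ini
# postgresql.conf -- load the statistics module at server start
shared_preload_libraries = 'pg_stat_statements'
```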

Explain the Execution Plans
Use the EXPLAIN statement to analyze the execution plans of slow queries and eliminate the bottlenecks.
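For example, a hypothetical slow query could be examined as follows (table and column names are made up for illustration; note that EXPLAIN ANALYZE actually executes the statement, so use it carefully on heavy queries):

```sql
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 10;
```

Look for sequential scans on large tables and for large mismatches between estimated and actual row counts in the output.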

Bottom Line
Using these great tools, you can boost your PostgreSQL performance and meet the business challenges.

Keep Performing,
Moshe Kaplan

Mar 30, 2014

How to Migrate from MySQL to MongoDB

Last week I was working on a key project to migrate a BI platform from MySQL to MongoDB. The product, whose development is headed by Yuval Leshem, is gaining major adoption, and the company was facing a scale challenge.
We chose MongoDB as the platform's data infrastructure to support a high data insert rate and large-scale data analysis.
Unlike many projects of this type, we accomplished the migration from plan to production in a week, mostly thanks to a smart and simple plan.

I wanted to share with you some of the lessons we learned during the process:

Data Migration: Mongify
This tool provides a two-step solution for migrating your RDBMS (MySQL) to NoSQL (MongoDB):
  1. Map the database structure
  2. Export the data and import it according to the defined structure
Since it is open source, you can easily dive into the code and adjust it to your own business case. Moreover, the code is maintained by Andrew Kalek, who is very cooperative.

Filter by Date Part (Day, Month, Year...)
If you are used to the database date-part functions such as YEAR() and MONTH(), there are ways to do the same in MongoDB (see $where and the aggregation framework). However, both require intensive I/O. The best solution in this case is saving three (or more) additional fields for each original date field. These fields hold the relevant date parts and can be indexed for efficient queries:
  1. [original field]
  2. [original field]_[year part]
  3. [original field]_[month part]
  4. [original field]_[day part]
  5. [original field]_[hour part]
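The extra fields above can be produced in the application layer before each insert; here is a minimal sketch in JavaScript (the field-name suffixes follow the pattern above and are only an assumption):

```javascript
// Add index-friendly date-part fields next to the original date field.
function withDateParts(doc, field) {
  var ts = doc[field];
  doc[field + "_year"] = ts.getFullYear();
  doc[field + "_month"] = ts.getMonth() + 1; // 1-12
  doc[field + "_day"] = ts.getDate();
  doc[field + "_hour"] = ts.getHours();
  return doc;
}

// March 30, 2014, 14:05 (JS months are 0-based in the Date constructor)
var event = withDateParts({ created: new Date(2014, 2, 30, 14, 5) }, "created");
// each *_year / *_month / *_day field can now get its own cheap index
```

Each derived field can then be indexed independently, so a "whole month" filter becomes a simple equality match instead of a collection scan.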

Default Values
MongoDB has no defined schema, so there are no default values either. Therefore, it is up to your data layer (or ORM) to take care of them.
This is relevant to default timestamps as well.

Data Casting
This is the same case as with default values: your app should take care of it.
Note that there is a defined mapping from values to types that you can find in the Mongify code.

Auto Numbers (1..N)
The same applies here, but you may have to choose one of the following approaches:
  1. Shift your way of thinking away from auto-increment ids and start using MongoDB's auto-generated "_id" as a solution.
  2. Generate the auto-increment ids using a counters database and findAndModify (in this case I recommend having a special-purpose database and a 1:1 collection mapping to gain granular locking in future releases). For details see the link on top.
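A minimal sketch of the counters approach in the mongo shell (database, collection and sequence names are illustrative):

```javascript
// a dedicated counters database, one collection per sequence
use counters
db.order_id.findAndModify({
  query: { _id: "order" },
  update: { $inc: { seq: 1 } },
  new: true,
  upsert: true
})
// atomically increments and returns the sequence document
```

Keeping each sequence in its own collection inside a dedicated database limits lock contention between sequences.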

Mongoose as an ORM
If you use Node.js, consider using Mongoose as your ORM. It will solve many of your issues by adding structure to your schema. However, note that you may lose some flexibility.

Data Analysts
MongoDB is not SQL compliant, so your data analysts will have a hard time. However, you can ease the change by using the following two methods:
  1. Introduce them to Query Mongo.
  2. Make sure your documents have no sub-documents unless you have to. Otherwise, transforming the data to a tabular view will require a major effort from them.

Avoid Normalizing Your Data
If you designed your data infrastructure as a non-normalized structure, it will be much easier to move the data to NoSQL. If your data is normalized, it is better to let the app take care of the data reconstruction.

Query Results Limitation
MongoDB results are limited by the maximum document size. If you need to query 200K+ records, you may need to page through the data using skip and limit (or better, by adding a filter based on the key value of the last returned row).
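For example, instead of increasingly expensive skip() offsets, you can seek past the last returned key (collection name is illustrative):

```javascript
// first page, ordered by _id
var page = db.events.find().sort({ _id: 1 }).limit(1000).toArray();
var lastId = page[page.length - 1]._id;
// next page: seek past the last key instead of skipping rows
db.events.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(1000);
```

The $gt filter uses the index directly, so page N costs the same as page 1, while skip() must still walk over all the skipped rows.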

Bottom Line
Migration from MySQL to MongoDB requires some effort and a shift in your state of mind, but it can be done relatively fast with careful planning, following the steps defined above.

Keep Performing,
Moshe Kaplan

Mar 5, 2014

MySQL Indexing: Don't Forget to Have Enough Free Space

When you modify your indexes in MySQL (especially with MyISAM), make sure that the free space on the disk that holds your tmpdir folder is larger than your largest index file.

Why Do We Need Such Large Free Space?
MySQL uses tmpdir to copy the original index file to, and then "repairs" it by sorting the data.

What Happens if We Don't Have Enough Space?
In this case MySQL will do its best to modify the index file within the given space. The result is a very slow (or never-ending) process with poor results. If you check SHOW PROCESSLIST, you will see the state "Repair with keycache" instead of "Repair by sorting".

What to Do?
Make sure you have enough free space (larger than your largest index file) and that the tmpdir option points to a folder located on that disk.
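A quick sanity check before rebuilding an index might look like this (the paths below are common defaults and may differ on your system):

```shell
# free space on the disk holding tmpdir (default is often /tmp)
df -h /tmp
# largest MyISAM index (.MYI) files in a hypothetical data directory
ls -lhS /var/lib/mysql/*/*.MYI 2>/dev/null | head -3
```

If the largest .MYI file is bigger than the free space reported by df, move tmpdir or free up space before starting the rebuild.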

Bottom Line
Make sure you have enough free space to get the best performance.

Keep Performing,
Moshe Kaplan

Feb 21, 2014

When Should I Use MongoDB rather than MySQL (or other RDBMS): The Billing Example

NoSQL has been a hot buzzword for quite a long time (well, it is not only a buzzword anymore).
However, when should we really use it?

Best Practices for MongoDB
NoSQL products (and MongoDB among them) should be used to meet specific challenges. If you face one of the following challenges, you should consider MongoDB:

You Expect a High Write Load
MongoDB by default prefers a high insert rate over transaction safety. If you need to load tons of data lines with low business value for each one, MongoDB should fit. Don't do that with $1M transaction records; or at least, in those cases, do it with extra safety.

You Need High Availability in an Unreliable Environment (Cloud and Real Life)
Setting up a replicaSet (a set of servers that act as master and slaves) is easy and fast. Moreover, recovery from a node (or a data center) failure is instant, safe and automatic.
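For example, a minimal three-node replicaSet can be initiated from the mongo shell (host names here are assumptions):

```javascript
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.com:27017" },
    { _id: 1, host: "db2.example.com:27017" },
    { _id: 2, host: "db3.example.com:27017" }
  ]
})
// rs.status() will then show which node was elected primary
```

With three members, any single node (or data center, if they are spread out) can fail and the remaining majority elects a new primary automatically.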

You need to Grow Big (and Shard Your Data)
Database scaling is hard (a single MySQL table's performance will degrade when crossing 5-10GB). If you need to partition and shard your database, MongoDB has a built-in, easy solution for that.

Your Data is Location Based
MongoDB has built-in spatial functions, so finding relevant data for specific locations is fast and accurate.

Your Data Set is Going to be Big (starting from 1GB) and Your Schema is Not Stable
Adding new columns to an RDBMS can lock the entire database in some databases, or create a major load and performance degradation in others. Usually this happens when the table size is larger than 1GB (and it can be a major pain for a system like BillRun, described below, which holds several TB in a single table). As MongoDB is schema-less, adding a new field does not affect old rows (or documents) and is instant. Another plus is that you do not need a DBA to modify your schema when the application changes.

You Don't have a DBA
If you don't have a DBA, and you don't want to normalize your data and do joins, you should consider MongoDB. MongoDB is great for class persistence, as classes can be serialized to JSON and stored as-is in MongoDB. Note: if you are expecting to go big, you will need to follow some best practices to avoid pitfalls.

Real World Case Study: Billing
In the last ILMUG, Ofer Cohen presented BillRun, a next-generation open source billing solution that utilizes MongoDB as its data store. This billing system runs in production at the fastest-growing cellular operator in Israel, where it processes over 500M CDRs (call data records) each month. In his presentation Ofer showed how this system utilizes MongoDB's advantages:
  1. Schema-less design enables rapid introduction of new CDR types to the system. It lets BillRun keep the data store generic.
  2. Scale: the BillRun production site already manages several TB in a single table, without being limited when adding new fields or by growth.
  3. replicaSet enables meeting regulations with an easy-to-set-up multi-data-center DRP and HA solution.
  4. Sharding enables linear, scale-out growth without running out of budget.
  5. With over 2,000 CDR inserts per second, MongoDB's architecture is great for a system that must support a high insert load. Yet you can still guarantee transactions with findAndModify (which is slower) or an application-level two-phase commit.
  6. Developer-oriented queries enable developers to write elegant queries.
  7. Location-based functions are utilized to analyze users' usage and determine where to invest in cellular infrastructure.

Bottom Line
MongoDB is a great tool that should be used in the right scenarios to gain an unfair advantage in your market. BillRun is a fine example of that.

Keep Performing,
Moshe Kaplan

Jan 13, 2014

Create Your Postgres Cluster in 6 Steps

Setting up a Postgres cluster (or even a master-slave configuration) was not an easy task before Postgres 9.0.

In the following lines you will find a short guide that will help you create your Postgres HA solution:

1. Install postgres on both machines
sudo yum -y install postgresql-server
sudo service postgresql initdb
sudo service postgresql start
sudo -u postgres psql

2. Stop the Postgres instance on both machines, delete the /var/lib/postgresql folder on the slave, and recover it by copying the files from the master.
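One way to perform this copy, assuming the master is reachable and you use the Postgres 9.x tooling (the host name is a placeholder):

```shell
# on the slave, with Postgres stopped and the old data directory removed
sudo -u postgres pg_basebackup -h <master-host> -D /var/lib/pgsql9/data -U postgres -x
```

Alternatively, a plain file copy (e.g. rsync) works as long as the master is stopped, or the copy is wrapped with pg_start_backup()/pg_stop_backup().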

3. Do initial configuration on both machines to enable logging:
sudo mkdir /var/log/postgresql/
sudo chown -R postgres:postgres /var/log/postgresql/
sudo vi /var/lib/pgsql9/data/postgresql.conf
> log_directory = '/var/log/postgresql'

4. Configure the Master and Slave to enable replication
sudo vi /var/lib/pgsql9/data/postgresql.conf @ Master (enable logging and listening)
> listen_addresses = '*'
> wal_level = hot_standby

> max_wal_senders = 3

sudo vi /var/lib/pgsql9/data/pg_hba.conf @ Master (enable access to the server; tighten it if your cluster is not in a secure environment):
> host  replication   all      trust
> host    all             all                  trust

sudo vi /var/lib/pgsql9/data/postgresql.conf @ Slave (turn on standby)
> hot_standby = on

sudo vi /var/lib/pgsql9/data/recovery.conf @ Slave (a new file; turns on standby)
> standby_mode = 'on'

> primary_conninfo = 'host='

5. Finally, start the Postgres service on both machines:
sudo service postgresql start

6. Check that replication works by creating a database on the master and verifying that it appears on the slave:
master> sudo -u postgres psql
postgres=# CREATE DATABASE a;
postgres=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
 a         | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |

slave> sudo -u postgres psql
postgres=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
 a         | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |

P.S. If you are using CentOS and yum (and not AWS), the following link will be useful for you.

Bottom Line
Postgres replication is not so complicated starting with Postgres 9.0. Now it's your turn to keep your system up!

Keep Performing,
Moshe Kaplan

Jan 7, 2014

Monitoring Your SQL Server w/ Monitis

One of the nicest monitoring products around is Monitis.com.
If you are using MySQL, you will probably find that some of the thresholds there should be modified or tuned (or just removed, since they are not critical to your day-to-day operations):

Key Cache Writes Rate
Actual value: 0
Proposed range: 75-100
What should be done: If you are not using MyISAM, ignore this one. If you do use it, increase the key_buffer_size variable to accommodate the index in memory.

Indexes Usage
Actual value: 5
Proposed range: 80-100
What should be done: Index usage is a key indicator of whether your queries are written in a way that utilizes your indexes, and whether you have defined the correct indexes to support your queries.
Enable your slow query log and examine your queries.

Key Cache Hit Rate
Actual value: N/A
Proposed range: 95-100
What should be done: This is not necessarily a critical variable; check whether it is relevant in your case. If you believe your system's business case should serve most data from cache and the value is low, increase key_buffer_size (MyISAM) or innodb_buffer_pool_size (InnoDB).

Table Cache Hit Rate
Actual value: N/A
Proposed range: 85-100
What should be done: See the previous item.
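Both cache rates can also be derived from MySQL's raw status counters, for example:

```sql
-- key cache hit rate = 1 - Key_reads / Key_read_requests
SHOW GLOBAL STATUS LIKE 'Key_read%';
-- a fast-growing Opened_tables counter suggests a low table cache hit rate
SHOW GLOBAL STATUS LIKE 'Open%tables';
```

Comparing these counters over time gives you a trend, which is often more useful than a single snapshot value.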

Bottom Line
A monitoring system is a critical component of any critical system. However, in most cases you should verify that the default configuration matches your business case (and when it does, take care of the alerts!).

Keep Performing,
Moshe Kaplan

MySQL and Application Servers Connection Pool

As we discussed in the past, MySQL is very lightweight in terms of connection creation and closure overhead, and therefore classic LAMP installations rarely use connection pooling.
However, some application frameworks such as Java and Ruby tend to use it, and in these cases it is recommended to adjust some configurations in order to avoid unexpected connection cuts.

The Pattern
Since the MySQL default timeout is 8 hours, this pattern usually appears when connection pools are configured with a high number of connections (and low utilization), or when usage is limited to defined hours (daytime, for example).

How to Detect this Pattern?
By using MySQLTuner, you will find that the aborted connections rate is too high (I've seen cases as high as 80%).
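You can also read the raw counters and the current timeout settings directly:

```sql
SHOW GLOBAL STATUS LIKE 'Aborted_c%';    -- Aborted_clients / Aborted_connects
SHOW GLOBAL VARIABLES LIKE '%timeout%';  -- includes wait_timeout and interactive_timeout
```

Aborted_clients counts established connections that died without a proper close (e.g. timed out in the pool), which is the symptom described here.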

What Should We Do?
Adjust the connections timeout, in order to avoid connection pool connections cutting:
#1 day timeout
SET GLOBAL wait_timeout = 86400;
SET GLOBAL interactive_timeout = 86400;

or in my.cnf:
wait_timeout = 86400
interactive_timeout = 86400

Bottom Line
Connection pools are not a native use case for MySQL, but that does not mean you cannot support them wisely.

Keep Performing,
Moshe Kaplan

