Mar 28, 2011

MySQL Partitioning. One step before Sharding.

Many of you may consider Sharding to meet your large scale database demands. However, in some cases it seems too early since you may not have the needed bandwidth in the current phase or you just consider to adapt a Sharding COTS solution like Scalebase and you need a solution for mid term.

I don't have time so what should I do?
The answer is simple: you may choose MySQL built-in mechanism named MySQL Partitioning. 
This mechanism partitions your table using one of 4 different rule types. It main solution is keeping all this process hidden from the applicative DBA and the software engineer. Try one of this methods:

  1. Range Partitioning: partition by column values. Most fit for known ranges like archive by date range.
  2. List Partitioning: similar to the above based on discrete values. Most fit for archive by years.
  3. Hash Partitioning: partition by hash function defined by user. Most fit when data ranges are that are unknown, but you know the spread of them. Should be used when you partition a table according to a foreign key. Please notice that in any case, the hashed key must be included in table primary key.
  4. Key Partitioning: similar to Hash, but this time the MySQL Server takes care of the function. Most fit that the data distribution is uniform. Partition key should include the primary key.
How do I declare that?
Using DDL. just add few more lines to your CREATE TABLE statement and you will receive the partitioning. For example to split by year use the following statement:

    PARTITION p2009 VALUES IN (2009),
    PARTITION p2010 VALUES IN (2010),
    PARTITION p2011 VALUES IN (2011),
    PARTITION p2012 VALUES IN (2012)

What do I get?
Mayflower has presented very nice numbers with 200M records table partitioned to 400 parts. They reached 4000 INSERT/UPDATE and 6000 SELECT statements per second.

Bottom Line
If don't have a lot of time and you need an instant solution, go for Partitioning

Keep Performing,

Mar 22, 2011

MySQL DRP Done Right

When your systems are up and running and money is getting into the bank, it is the best time to ask yourself: What if my system fails? What if my data center (yes, even your cloud one) goes down?

The Instant DRP Solution
The instant solution is preparing a backup data center. Then you should make sure that the web and application servers will be ready to take over when needed. Yet, what about taking care of your data? How will the database be managed across two different data centers?

Many cloud, web, mobile and service providers base their systems on MySQL. Therefore, this post is dedicated to best MySQL DRP strategies..

Fail and Then Manually Recover
The most simple and easy to implement method is the Master-Slave schema. This method supports:
  1. "Master": A single database server that serves for write and read.
  2. "Slaves" One or more servers that serves for read only.
When the master fails, you can promote one of the slaves to become the master.
This method can also be used to reduce stress on the master and enable various partitioning modes.

Fail and Recover Automatically
A more complex way to implement, but faster to recover (if nothing goes wrong) is implementing the Master-Master schema. In this schema both servers serve as masters and recover when one fails. Please notice that this method is a little bit tricky and you should avoid conflicts since the conflict resolution is poor:

  1. You should need to either avoid auto increment fields or make sure that only one server is an active master at a time.
  2. You should avoid same record modified by both parties.

Do It the Cloud Way
Cloud providers propose these days cloud based database (PaaS). These solutions remove the database management headache from your task list. Amazon for example provides the Rational Database Service (Amazon RDS). RDS provides a high available managed MySQL instance that supports instant recovery to any point in time. This service is more expensive than having your own MySQL instance, but if you are a medium shop, it may be the best offer for you.

Bottom Line
Select your MySQL DRP method based on your resources, recovery needs and the effort you willing to allocate. Choose Master-Slave for novice, Master-Master for experts and the cloud way to avoid the headache.

Keep Performing,
Moshe Kaplan  Follow MosheKaplan on Twitter

P.S feel free to contact for DRP and MySQL consulting

Mar 6, 2011

Faster Website? CDN Might be the Answer!

Why Our Japanese Users are Angry?
A common question when you setup a new SaaS or just a fancy website is "Why do my remote users have hard time with my application?" while when I'm in a physical location that is close to my hosting provider I get a relatively good user experience.

It's the TCP/IP Doctor Watson
Window Size is a significant parameter in the TCP/IP stack. This parameter controls how much data can be sent by either party before receiving an ACK from the other. If your Window Size is set to 8KB and the round trip (RTT) between you and the server is 20ms, you could get 400KB/s (1sec/20ms*8KB) or ~4Mbps. 

What If You Were in the Other Side of the Globe?
Well, if your RTT is more like 250ms, you could hardly get 32KB/sec (1sec/250ms*8KB), meaning that you could download a 1MB image in 30sec... Not something to proudly report back home.

CDN. The Question is: How Can I Solve This?
CDN (Content Delivery Network) helps bringing web content closer to the end user. This way it maximizes the throughput controlled by TCP/IP Window size.

How Does CDN Work?
  1. The CDN provider places many edge servers around the world (as many as dozens of thousands of locations in some cases).
  2. The CDN provider hosts your DNS record in a DNS that supports Anycast. Anycast enables the same DNS record to point to different locations around the world based on the network topology.
  3. The end user is directed to the closest edge server based on the Anycast DNS.
  4. The edge server tries to serve the file if it is cached on it (images, JavaScript files, static HTML, Streaming Video or just a static copy of your favorite news site are good examples).
  5. If the requested file is dynamic or not cached, the edge server requests the file from the origin server using a large TCP window size. This way the 230ms RTT (250ms-20ms) is done in a much larger Window Size (say 64KB) and the throughput will be much larger.
Bottom Line
CDN can help you better your end user experience, use it when needed.

If you need assistance, don't hesitate to communicate,
Keep Performing,
Moshe KaplanFollow MosheKaplan on Twitter


Intense Debate Comments

Ratings and Recommendations