Mar 30, 2014

How to Migrate from MySQL to MongoDB

In the last week I was working on a key project to migrate a BI platform from MySQL to MongoDB. The product that its development is headed by Yuval Leshem is gaining a major adaption and the company was facing a scale challenge.
We chosen MongoDB as the platform data infrastructure to support high data insert rate and scale data analysis.
Unlike many projects of this type, we accomplished the migration from plan to production in a week, mostly due to smart and simple plan.


I wanted to share with you some of lessons we learnt during the process:


Data Migration: Mongify
This tool provides a two steps solution to migrate your RDBMS (MySQL) to NoSQL (Mongo):
  1. Mapping database structure
  2. Export the data and import it according to the defined structure
Since it's an open source you can easily dive into the code and adjust it to your own business case. Moreover, the code is maintained by Andrew Kalek that is very cooperative.

Filter by Date Part (Day, Month, Year..)
If you are regular to using the DB date parts functions such as YEAR() and MONTH(), there are options to do it in MongoDB (see $where and aggregation framework). However, both require intensive IO. The best solution in this case is saving 3 (or more) more fields for each original field. These fields will include the relevant date part, and can be indexed for effective query:
  1. [original field]
  2. [original field]_[year part]
  3. [original field]_[month part]
  4. [original field]_[day part]
  5. [original field]_[hour part]
Default Values
MongoDB has no defined schema, so there are no default values as well. Therefore it's up to your data layer (or ORM) to take care of it
This is relevant to to default timestamp as well

Data Casting
Same case as with default values. You app should take care of it.
Please notice that there is a defined mapping from values and types that you can find at Mongify code.

Auto Numbers (1..N)
Same case here, but you may have to choose one of following ways:
  1. Shift your way of thinking of auto increment ids and start using MongoDB auto "_id"s a solution.
  2. You can generate the auto increment ids using a counters database and findAndModify (in this case I will recommend you having a special purpose database and 1:1 collection mapping to gain future releases granular locking). For details see the link on top.
Mongoose as an ORM
If you use node.js consider using Mongoose as your ORM, This one will solve many of your issues by adding structure to your schema. However, please notice that you may loose some flexibility.

Data Analysts
MongoDB is not SQL compliant, and you will have hard time with your data analysts. However, you can ease the change by using the following two methods:
  1. Introduce them to Query Mongo.
  2. Make sure your documents have no sub documents, if you don't have to. Elsewhere, transforming the data to tabulator view will require a major effort from them.
Avoid Normalizing Your Data
If you designed your data infrastructure as a non normalized structure, it will be much easier to move data to NoSQL. If your data is normalized, it is better to the app to take care of the data reconstruction.

Queries Results Limitation
MongoDB results are limited to a document size. If you need to query 200K+ records, you may need to page the data using skip and limit (or better, adding a filter based on the last limited row key value).

Bottom Line
Migration from MySQL to MongoDB requires some effort and a shift in your state of mind, but it can be done relatively fast using careful planning according to the steps defined before.

Keep Performing,
Moshe Kaplan

Mar 5, 2014

MySQL Indexing: Don't Forget to Have Enough Free Space

When you modify you indexes in MySQL (and especially in MyISAM), make sure that the free space on the disk that holds your tmpdir folder is larger than your largest index file.

Why We Need to Such a Large Free Space?
MySQL is using the tmpdir to copy the original index file to and "repair it" by sorting the data.

What Happens if We Don't Have Enough Space?
In this case MySQL will make it best to modify the index file based on the given space. The result is a very slow process (or never ending one) and poor results. If you will check the show processlist, you will find out the state "Repair by keycache" instead of "Repair by sorting"

What to Do?
Make sure you have enough free space (> largest index file) and that the tmpdir option file is located on this disk.

Bottom Line
Make sure you have enough free space to get best performance

Keep Performing,
Moshe Kaplan

ShareThis

Intense Debate Comments

Ratings and Recommendations