Jan 27, 2015

12 Ways to Boost Your Elasticsearch Performance

The ELK (Elasticsearch, Logstash, Kibana) stack is amazing.
In no time you can create a fully functional analytics service from data collection to dashboard presentation.

But what happens at scale? How can you make sure this blazing fast solution keeps serving your business team even when your data grows to hundreds of millions of data points and beyond?

What to Focus on?
Elasticsearch performs two major tasks:
  1. Data load and indexing, which is CPU intensive.
  2. Search and queries, which are memory intensive.
You should design your system to match your business usage pattern.

Step 1: Keep your version up to date
Elasticsearch is a relatively young tool, and the team delivers new features and fixes at a rapid pace, so make sure you keep up with the latest versions.

Step 2: Tune Your Memory
The Elasticsearch heap should be about 50% of your machine's RAM. Configure it by setting the ES_HEAP_SIZE environment variable to this number (2G for example):
export ES_HEAP_SIZE=2G
Note: this method may not work, as the init.d script overrides it... in that case, edit /etc/init.d/elasticsearch and set the ES_HEAP_SIZE=2g parameter there.

Step 3: Select Your Storage
Disks are crucial when your data is larger than your memory. Choose local SSD disks; they will cost less and perform better.

Step 4: Stripe Your Data
Use path.data and path.logs to stripe your data and logs across multiple disks and gain more IOPS.
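A minimal sketch of striping data across two disks in /etc/elasticsearch/elasticsearch.yml (the mount points below are hypothetical examples):
path.data: /mnt/disk1/elasticsearch,/mnt/disk2/elasticsearch
path.logs: /mnt/disk2/elasticsearch/logs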

Step 5: Prepare for Index Merging
Index merging is probably the most frustrating process in Elasticsearch. It is required to keep your system performing well in the long run, but it can cause relatively short bursts of high resource utilization. By default, Elasticsearch throttles merging to 20MB/s. If Elasticsearch serves as your back office system, you can disable throttling by setting index.store.throttle.type to none.
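A minimal sketch of disabling merge throttling on a single index (the index name myindex is a hypothetical example, and the node is assumed to listen on localhost:9200):
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index.store.throttle.type": "none"
}'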

Step 6: Plan for Bulk Loading
Like any other data solution, you should load data in bulks when possible to speed up your load and minimize resource utilization. This is the reason you should check out the Bulk API.
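A minimal sketch of a bulk load with curl (the index name logs, the type event and the file bulk.json are hypothetical examples):
curl -XPOST 'http://localhost:9200/_bulk' --data-binary @bulk.json
Where bulk.json holds pairs of action and document lines (the file must end with a newline):
{ "index": { "_index": "logs", "_type": "event" } }
{ "message": "first event", "ts": "2015-01-27T10:00:00" }
{ "index": { "_index": "logs", "_type": "event" } }
{ "message": "second event", "ts": "2015-01-27T10:00:01" }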

Step 7: Optimize Your Index
Run optimize on your index when it is stable (for example, after a daily load) to ensure best performance.
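A minimal sketch of the optimize call (the index name logs-2015.01.26 is a hypothetical example; max_num_segments=1 merges the index down to a single segment):
curl -XPOST 'http://localhost:9200/logs-2015.01.26/_optimize?max_num_segments=1'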

Step 8: Enlarge the File Handles Limit
Like other data solutions, Elasticsearch uses a large number of file handles. Make sure to add the following settings to /etc/security/limits.conf:
*     soft    nofile          64000
*     hard    nofile          64000

Step 9: Make RAM Space for Your Indexes
Elasticsearch is optimized for machines with more than 10GB of RAM, as its default index buffer is 10% of the heap. Since the best practice is having at least 512MB for the index buffer, if your machine is not that large, make sure you add the following configuration to /etc/elasticsearch/elasticsearch.yml:
indices.memory.index_buffer_size: 512mb

Step 10: Change Mappings
Elasticsearch by default defines some fields in its mapping that you may be able to disable in your case to save disk space and memory and boost performance (see the sketch below):
  1. The _source field, which stores the original document.
  2. The _all field, which combines all fields into a single field to support searching without specifying a field name.
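A minimal mapping sketch that disables both fields (the index name logs and type event are hypothetical examples; note that disabling _source also gives up features such as the update API and re-indexing from Elasticsearch itself):
curl -XPUT 'http://localhost:9200/logs/_mapping/event' -d '{
  "event": {
    "_source": { "enabled": false },
    "_all": { "enabled": false }
  }
}'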

Step 11: Add Monitoring
You can either choose Marvel, the Elasticsearch management tool with the Kibana look and feel that is part of the enterprise package, or build your own monitoring using open source solutions or hosted solutions like New Relic.

Step 12: Sharding
If none of the above is enough, add shards and nodes to your system.
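A minimal sketch of creating an index that is spread across more nodes (the index name and the counts are hypothetical examples; remember that the number of shards cannot be changed after the index is created):
curl -XPUT 'http://localhost:9200/logs' -d '{
  "settings": {
    "number_of_shards": 10,
    "number_of_replicas": 1
  }
}'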

Bottom Line
Elasticsearch is an amazing tool and with the right configuration it can keep serving your analytics needs even in the scale of billions of events.

Keep Performing,
Moshe Kaplan

Jan 16, 2015

Offloading SSL using AWS ELB

If you are using AWS Elastic Load Balancer to scale your system, you may find it a good solution for offloading SSL termination from your servers.

Why Should You Offload SSL Termination?
HTTPS is an encrypted protocol, and encryption requires high CPU utilization to perform the needed mathematical computations.
Since most web applications are CPU bound, you should avoid processing SSL on your servers.

Why AWS Elastic Load Balancer (or Any other LB) Is a Great Candidate?
In order to perform load balancing, the load balancer must decrypt the traffic and read its content. This is done by placing your certificate on the load balancer.
If you consider the network between your LB and your servers to be secure, you should avoid re-encrypting the traffic and keep it in the clear.

How Can I Make Sure Traffic is Actually Secured?
In some cases, you want all your users to use HTTPS as an encrypted channel in order to keep your users' privacy and avoid eavesdropping and injection.
In these cases you want to catch traffic that did not use HTTPS before it was terminated at the LB and redirect it to HTTPS. This can be done by evaluating the X-Forwarded-Proto header in your .htaccess or Apache configuration:
RewriteEngine On
RewriteCond %{HTTP:X-Forwarded-Proto} !https [NC]
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

Bottom Line
A careful design can help you get more out of your web servers.

Keep Performing,
Moshe Kaplan

Nov 29, 2014

MySQL Replication Over Slow Links/High Latency

MySQL replication is considered to be efficient, and changes in the master server are usually applied on the slaves within a single second.
However, if your replication fails to close the gap, there are two main reasons for it:

  1. Slave Disk Issue: since replication is single threaded per database, the slave usually lags behind due to disk latency when applying the changes. In this case you should consider using SSD to accelerate the process.
  2. Low Bandwidth/High Latency Networking: when the two servers are located at remote sites (the high latency case) or there is low bandwidth between them, you should focus on minimizing the traffic between the servers using one (or both) of the following methods:
    1. Using statement based replication: row based replication creates a SQL statement for each changed row in the database, while statement based replication records the actual SQL statement sent by the application. Statement based replication is usually much more efficient in terms of log size. However, be aware that it may not replicate correctly for non-deterministic statements such as UPDATE ... LIMIT 1 (see the sketch after this list).
    2. Compressing the traffic: MySQL supports replication log compression using the slave_compressed_protocol parameter. This method can reduce the traffic between the servers by up to 80%. However, compression is compute intensive, so you should expect some extra CPU utilization (usually not an issue in databases). This parameter should be enabled on both servers:
      1. Dynamically from the MySQL command line:
        SET GLOBAL slave_compressed_protocol = 1;
      2. In the MySQL configuration file:
        # compress master-slave communication
        slave_compressed_protocol = 1
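A minimal sketch of applying the first method and verifying the lag (standard MySQL commands; run the first on the master and the second on the slave):
-- on the master: log statements instead of rows (affects new sessions)
SET GLOBAL binlog_format = 'STATEMENT';
-- on the slave: check the Seconds_Behind_Master field
SHOW SLAVE STATUS\G
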
Bottom Line
Understand why your replication lags behind and use the right method to solve it. Yes, it is that easy.

Keep Performing,
Moshe Kaplan

Nov 22, 2014

Analyzing Twitter Streams in Real Time using Spark

One of the most interesting (and, some people say, not quite working) features in Apache Spark is the ability to analyze the Twitter stream in real time.

DZone released the new Spark RefCard earlier this week. Since I was one of its peer reviewers, it is a good time to discuss this topic.

Tuning an Out of the Box Solution
Spark provides an out of the box example that, with some tuning, will show you the top ten trending Twitter hashtags every 10 and 60 seconds.

  1. Create a new Twitter App or use your existing app credentials at Twitter Apps.
  2. Download and install Java, Scala and Spark
  3. Adjust the environment variables:
    1. Scala home
      export SCALA_HOME=/usr/lib/scala
    2. PATH to run scala
      export PATH=$PATH:$SCALA_HOME/bin
    3. Add the location of spark-streaming-twitter_2.10-1.0.0.jar, twitter4j-core-3.0.3.jar and twitter4j-stream-3.0.3.jar to CLASSPATH
      export CLASSPATH=$CLASSPATH:/root/spark/lib/twitter4j/
  4. Run the code after tuning some parameters:
    1. Get into the Spark folder:
      cd /var/lib/spark/spark-1.0.1/
    2. If you are running on a single core machine (or you just want to make sure you will get results and not just "WARN BlockManager: Block input-0-XXXXXXXX already exists on this machine; not re-adding it"), change the ./bin/run-example code:
      sudo sed -i 's/local\[\*\]/local\[2\]/g' *.txt
    3. Run the example (remember that you should write the class name, including the streaming. prefix, and avoid adding the path, the .scala extension or any other fancy stuff):
      sudo ./bin/run-example streaming.TwitterPopularTags
  5. The result will be shown after several seconds:
    Popular topics in last 60 seconds (194 total):
    #MTVStars (42 tweets)
    #NashsNewVideo (9 tweets)
    #IShipKarma (6 tweets)
    #SledgehammerSaturday (6 tweets)
    #NoKiam (5 tweets)
    #mufc's (3 tweets)
    #gameinsight (3 tweets)

Bottom Line
Spark is an amazing platform, and with some small adjustments you will be able to enjoy it in a few minutes.

Keep Performing,
Moshe Kaplan

Aug 25, 2014

MySQL Installation on Ubuntu

It seems that MySQL was created to be installed on RHEL/CentOS/Fedora...
But what if you are actually using Ubuntu?
Well, this is the purpose of the following tutorial:

  1. Trace and remove all old MySQL installations on the server (yes, some vendors deliver baseline images with old MySQL versions):
    sudo dpkg --get-selections | grep -v deinstall | grep -i mysql
    sudo apt-get --purge remove libdbd-mysql-perl libmysqlclient18 mysql-client mysql-client-5.5 mysql-client-core-5.5 mysql-common mysql-server mysql-server-5.5 mysql-server-core-5.5 php5-mysql
  2. Get the MySQL repository
    sudo wget -O mysql.deb http://dev.mysql.com/get/mysql-apt-config_0.2.1-1ubuntu14.04_all.deb
  3. Install the repository
    sudo dpkg -i mysql.deb
    sudo apt-get update
  4. Install the mysql package (you may need to select the right version and select a root password)
    sudo apt-get -y install mysql-server mysql-client
  5. If you have a pre-configured my.cnf, fetch it, replace the current my.cnf file and restart MySQL. I would recommend keeping your my.cnf in a code repository like git to manage its versions:
    sudo apt-get -y install git
    sudo git clone wherever tmp && mv tmp/.git . && rm -rf tmp
    sudo service mysql restart
  6. Now it's time to recover your database from your backup (if you have one) and get back to business, as sketched below.
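A minimal recovery sketch, assuming a mysqldump backup in a hypothetical file named backup.sql:
    # verify the server is up
    sudo service mysql status
    # load the dump back into the server
    mysql -u root -p < backup.sql
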
Keep Performing,
Moshe Kaplan

Aug 19, 2014

Bringing Redis to an Enterprise Level Security

Little Security. Big Problems.
Redis was designed (like most early NoSQL products) as a product to be used in a secured environment, meaning that few security measures are built in.

The only security measure Redis supports is authentication, and the password is passed as clear text (definitely not a security best practice).
UPDATE: Itamar Haber from RedisLabs referred me to SSL communication encryption using stunnel or spiped (see the sketch below).
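A minimal server side sketch for stunnel.conf (the port number and certificate path are hypothetical examples): stunnel terminates SSL on port 6380 and forwards the decrypted traffic to the local Redis on 6379.
[redis-server]
accept = 6380
connect = 127.0.0.1:6379
cert = /etc/stunnel/redis.pem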


Big Problems. Great Solutions
If you love Redis (and with 60K Set and Get ops/sec on commodity hardware there is no reason you won't love it), and you must have an enterprise grade solution, you can take one of the following approaches:
  1. Implement a web layer in front of it (one that can support SSL, encrypted authentication, logging and all the other fancy stuff). A great example of this is Webdis, which provides a built-in solution. Of course, there is a performance penalty attached to the extra layer (and some issues remain, as Redis can still be accessed directly).
  2. Save the data encrypted in Redis: encrypt the data before the SET operation in the app servers, and decrypt it after the GET operation in the consumers (see the sketch after this list). This way the communication does not need to be encrypted, and hackers or malicious users can do little harm even if they access Redis directly, as the stored data is encrypted.
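A minimal toy sketch of the second approach from the command line (redis-cli and openssl are standard tools; the key name user:42, the passphrase mysecret and the value are hypothetical, and a real application would do this in code with proper key management):
# encrypt the value before storing it
redis-cli SET user:42 "$(echo -n 'sensitive value' | openssl enc -aes-256-cbc -a -pass pass:mysecret)"
# read it back and decrypt it
redis-cli --raw GET user:42 | openssl enc -d -aes-256-cbc -a -pass pass:mysecret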

Bottom Line
Even sensitive products that lack basic security measures can be brought to an enterprise level with the right design in mind.

Keep Performing,
Moshe Kaplan

Jul 26, 2014

Validating Your Disk Performance

Databases and data stores in general (yes, including the leading NoSQL products such as MongoDB and Cassandra) love IO. The more RAM and SSD they have, the better performance you will get...

If you are short of RAM and getting poor performance, it may be a good time to verify your disks' performance and see if they match your needs.

How much IO do you need?
To analyze your needs you should use iostat. This utility shows, for each mounted disk, the actual number of blocks read from and written to the disk.
If the numbers are constantly higher than 1,000 blocks/sec, you should consider improving your application design or moving from magnetic disks to SSD. In the case below, for example, the system has a write issue.

> iostat 2
Linux 2.6.32-358.11.1.el6.x86_64 (example.com)         07/26/2014      _x86_64_        (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           2.02    0.00    0.92    0.01    0.00   97.04

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda              92.88       599.18      3920.00  547971262 3584968522
sdb               0.00         0.00         0.00       3104          0

What are Your Disks Limits?
If you are not sure how much you can get from your disks, Bonnie++ is a great tool to explore your current limits.

Installing Bonnie++
> wget http://www.coker.com.au/bonnie++/bonnie++-1.03e.tgz
> tar xvfz bonnie++-1.03e.tgz
> cd bonnie++-1.03e
> make
> make install

Running Bonnie++
Select your target disk (the -d flag), the size of the written data in MB (the -s flag), the number of files for the file creation tests (the -n flag) and the server RAM size in MB (the -r flag). Please notice that you should define a write-to-disk size that is larger than the actual RAM size in order to avoid false results:
> sudo ./bonnie++ -u root -d /tmp -s 1500 -n 3 -r 750

Detecting Issues
As you can see in the following results, Bonnie++ is very useful in detecting disk performance issues and failed disks (see Server C, which has very poor write performance).
Server A (SAS Disks) Results
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
serverA        100M 52626  99 +++++ +++ +++++ +++ 61374  99 +++++ +++ +++++ +++
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP

Server B (SSD Disks) Results
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
serverB        100M 61864  99 +++++ +++ +++++ +++ 58338  99 +++++ +++ +++++ +++
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  3 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++

Server C (Failed SSD Disks) Results
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
serverC        100M  3892  99 104779  99 98847  99  3865  99 +++++ +++ +++++ +++
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  3 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
responder5.spd.co.il,100M,3892,99,104779,99,98847,99,3865,99,+++++,+++,+++++,+++,3,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

Bottom Line
When you plan your next move, you should verify that your servers can handle it. These tools will help you verify that at least your disks can take care of your future tasks.

Keep Performing,
Moshe Kaplan
