Nov 22, 2014

Analyzing Twitter Streams in Real Time using Spark

One of the most interesting (and, some people say, not quite working) features in Apache Spark is the ability to analyze the Twitter stream in real time.

Earlier this week DZone released the new Spark RefCard. Since I peer reviewed it, this is a good time to discuss the topic.

Tuning an Out-of-the-Box Solution
Spark ships with an out-of-the-box example that, with some tuning, will show you the top ten trending Twitter hashtags every 10 and 60 seconds.

  1. Create a new Twitter app, or use your existing app credentials, at Twitter Apps (see the credentials sketch after the sample output below).
  2. Download and install Java, Scala and Spark
  3. Adjust the environment variables:
    1. Scala home
      export SCALA_HOME=/usr/lib/scala
    2. PATH to run scala
      export PATH=$PATH:$SCALA_HOME/bin
    3. Add the location of spark-streaming-twitter_2.10-1.0.0.jar,  twitter4j-core-3.0.3.jar and twitter4j-stream-3.0.3.jar to CLASSPATH
      export CLASSPATH=$CLASSPATH:/root/spark/lib/twitter4j/
  4. Run the code after tuning some parameters:
    1. Get into the Spark folder
      cd /var/lib/spark/spark-1.0.1/
    2. If you are running on a single-core machine (or you just want to make sure you get results rather than "WARN BlockManager: Block input-0-XXXXXXXX already exists on this machine; not re-adding it"), change the default master in the ./bin/run-example script:
      sudo sed -i 's/local\[\*\]/local\[2\]/g' ./bin/run-example
    3. Run the example (remember that you should pass the class name, including the streaming. prefix, and avoid adding the path, the .scala extension or any other fancy stuff):
      sudo ./bin/run-example streaming.TwitterPopularTags
  5. The result will be shown after several seconds:
    Popular topics in last 60 seconds (194 total):
    #MTVStars (42 tweets)
    #NashsNewVideo (9 tweets)
    #IShipKarma (6 tweets)
    #SledgehammerSaturday (6 tweets)
    #NoKiam (5 tweets)
    #mufc's (3 tweets)
    #gameinsight (3 tweets)
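
A note on credentials: depending on your Spark and twitter4j versions, the TwitterPopularTags example picks up the Twitter OAuth credentials either from command-line arguments or from twitter4j system properties. A minimal sketch (the key names follow the standard twitter4j.properties convention; the values are placeholders for the ones you generated at Twitter Apps) is to drop a twitter4j.properties file in the directory you run the example from:
    # twitter4j.properties - replace the placeholders with your app's values
    oauth.consumerKey=YOUR_CONSUMER_KEY
    oauth.consumerSecret=YOUR_CONSUMER_SECRET
    oauth.accessToken=YOUR_ACCESS_TOKEN
    oauth.accessTokenSecret=YOUR_ACCESS_TOKEN_SECRET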

Bottom Line
Spark is an amazing platform; with a few small adjustments you will be able to enjoy it within minutes.

Keep Performing,
Moshe Kaplan

Aug 25, 2014

MySQL Installation on Ubuntu

It seems that MySQL was created to be installed on RHEL/CentOS/Fedora...
But what if you are actually using Ubuntu?
Well, that is the purpose of the following tutorial.

  1. Trace and remove all old MySQL installations on the server (yes, some vendors deliver baseline images with old MySQL versions preinstalled)
    sudo dpkg --get-selections | grep -v deinstall | grep -i mysql
    sudo apt-get --purge remove libdbd-mysql-perl libmysqlclient18 mysql-client mysql-client-5.5 mysql-client-core-5.5 mysql-common mysql-server mysql-server-5.5 mysql-server-core-5.5 php5-mysql
  2. Get the MySQL repository
    sudo wget -O mysql.deb http://dev.mysql.com/get/mysql-apt-config_0.2.1-1ubuntu14.04_all.deb
  3. Install the repository
    sudo dpkg -i mysql.deb
    sudo apt-get update
  4. Install the MySQL packages (you may need to choose the right version and set a root password)
    sudo apt-get -y install mysql-server mysql-client
  5. If you have a pre-configured my.cnf, get it and replace the current my.cnf file, then restart MySQL. I would recommend keeping your my.cnf in a code repository such as git to manage its versions:
    sudo apt-get -y install git
    sudo git clone wherever tmp && mv tmp/.git . && rm -rf tmp
    sudo service mysql restart
  6. Now it is time to recover your database from your backup (if you have one) and get back to business. A quick sanity check is sketched below.
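
To double-check the installation (a minimal sanity check, not part of the original procedure), verify the client and server versions and that the service responds:
    mysql --version
    sudo service mysql status
    mysql -u root -p -e "SELECT VERSION();"
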
Keep Performing,

Aug 19, 2014

Bringing Redis to Enterprise-Level Security

Little Security. Big Problems.
Redis, like most early NoSQL products, is designed to be used in a secured environment (meaning few security measures are built in).

The only security measure Redis supports is authentication, and the password is passed as clear text (definitely not a security best practice).
UPDATE: Itamar Haber from RedisLabs referred me to SSL communication encryption using stunnel or spiped.
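
A minimal stunnel sketch for the server side (the port numbers and certificate path are examples, not part of Itamar's reference): terminate SSL on port 6380 and forward the decrypted traffic to the local Redis on 6379; clients would run their own stunnel in client mode pointing at port 6380.
  # /etc/stunnel/redis-server.conf (example)
  [redis]
  accept = 6380
  connect = 127.0.0.1:6379
  cert = /etc/stunnel/redis.pem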


Big Problems. Great Solutions.
If you love Redis (and with 60K SET and GET ops/sec on commodity hardware there is no reason you won't love it), and you must have an enterprise-grade solution, you can take one of the following approaches:
  1. Implement a web layer in front of it (one that can support SSL, encrypted authentication, logging and all the other fancy stuff). A great example of this is Webdis, which offers a built-in solution. Of course, there is a performance penalty attached to the extra layer (and some issues remain, as Redis can still be accessed directly).
  2. Save the data encrypted in Redis (encrypt the data before the SET operation in the app servers, and decrypt it after the GET operation in the consumers). This way the communication itself does not need to be encrypted, and hackers or malicious users can do little harm even if they access Redis directly, since the data stored in Redis is encrypted. A sketch of this approach follows.
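
A minimal command-line sketch of the second approach (the key name, value and passphrase are placeholders; a real application would do the same with its own crypto library before SET and after GET):
  # encrypt the value before storing it
  ENC=$(echo -n "my secret payload" | openssl enc -aes-256-cbc -a -salt -pass pass:MY_APP_KEY)
  redis-cli SET user:42:token "$ENC"
  # decrypt it after reading it back
  redis-cli GET user:42:token | openssl enc -aes-256-cbc -a -d -pass pass:MY_APP_KEY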

Bottom Line
Even sensitive products that lack basic security measures can be brought to an enterprise level with the right design in mind.

Keep Performing,
Moshe Kaplan

Jul 26, 2014

Validating Your Disk Performance

Databases and data stores in general (yes, including the leading NoSQL products such as MongoDB and Cassandra) love IO. The more RAM and SSD you give them, the better performance you will get...

If you are short on RAM and getting poor performance, it may be a good time to verify your disks' performance and see whether they match your needs.

How much IO do you need?
To analyze your needs you should use iostat. This utility shows, for each mounted disk, the actual number of blocks read from and written to disk.
If the numbers are constantly higher than 1,000 blocks/sec, you should consider improving your application design or moving from magnetic disks to SSD. In the case below, for example, the system has a write issue (an extended view is shown after the sample output).

> iostat 2
Linux 2.6.32-358.11.1.el6.x86_64 (example.com)         07/26/2014      _x86_64_        (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           2.02    0.00    0.92    0.01    0.00   97.04

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda              92.88       599.18      3920.00  547971262 3584968522
sdb               0.00         0.00         0.00       3104          0
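
If your iostat build supports extended statistics (the -x flag in the sysstat package), the per-device latency and utilization columns make saturation easier to spot than raw block counts; await shows the average time in milliseconds each request waits, and %util shows how busy the device is:
> iostat -x 2 5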

What are Your Disks Limits?
If you are not sure how much you can get from your disks, Bonnie++ is a great tool to explore your current limits.

Installing Bonnie++
> wget http://www.coker.com.au/bonnie++/bonnie++-1.03e.tgz
> tar xvfz bonnie++-1.03e.tgz
> cd bonnie++-1.03e
> make
> make install

Running Bonnie++
Select your target directory (the -d flag), the size of the written data in MB (the -s flag), the number of files for the small-file tests (the -n flag) and the server RAM size in MB (the -r flag). Please note that you should define a write-to-disk value that is larger than the actual RAM size in order to avoid false results:
> sudo ./bonnie++ -u root -d /tmp -s 1500 -n 3 -r 750
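
A hedged variation (the path and sizes below are just examples): to measure the disk that actually holds your data rather than /tmp, point the -d flag at that mount and write roughly twice the RAM size, e.g. on a server with 8GB of RAM:
> sudo ./bonnie++ -u root -d /mnt/data -s 16000 -n 3 -r 8000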

Detecting Issues
As you can see in the following results, Bonnie++ is very useful for detecting disk performance issues and failed disks (see Server C, which shows very poor write performance).
Server A (SAS Disks) Results
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
serverA        100M 52626  99 +++++ +++ +++++ +++ 61374  99 +++++ +++ +++++ +++
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP

Server B (SSD Disks) Results
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
serverB        100M 61864  99 +++++ +++ +++++ +++ 58338  99 +++++ +++ +++++ +++
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  3 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++

Server C (Failed SSD Disks) Results
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
serverC        100M  3892  99 104779  99 98847  99  3865  99 +++++ +++ +++++ +++
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  3 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
responder5.spd.co.il,100M,3892,99,104779,99,98847,99,3865,99,+++++,+++,+++++,+++,3,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

Bottom Line
When you plan your next move, you should verify that your servers can handle it. These tools will help you verify that at least your disks can take care of your future tasks.

Keep Performing,
Moshe Kaplan

Jul 24, 2014

9 Steps to Implement Offline Backup on Azure IaaS: Back Up MySQL to Azure Storage

A good backup is probably something you will be thankful for when s%$t hits the fan.
If you chose MySQL as your data infrastructure and Microsoft Azure as your cloud infrastructure, you will probably find this procedure useful (I actually turned it into a script).

Chosen Products
Our first task is choosing the backup and automation products. I selected the following two:
  1. Percona XtraBackup: a leading backup product by Percona. The product creates a hot backup of the database that is equivalent to a disk copy. This method is much faster to back up and recover than mysqldump, and it also supports incremental backups.
  2. Azure SDK Tools Xplat: a node.js-based SDK that provides a command-line interface to the various Azure services, including Azure Storage.
Backup Implementation Guide
  1. Install Percona XtraBackup
    sudo wget http://www.percona.com/redir/downloads/XtraBackup/XtraBackup-2.2.3/binary/debian/precise/x86_64/percona-xtrabackup_2.2.3-4982-1.precise_amd64.deb
    sudo dpkg -i percona-xtrabackup_2.2.3-4982-1.precise_amd64.deb
    sudo apt-get update
    sudo apt-get install -f
    sudo dpkg -i percona-xtrabackup_2.2.3-4982-1.precise_amd64.deb

    The CentOS way:
    sudo rpm -Uhv http://www.percona.com/downloads/percona-release/percona-release-0.0-1.x86_64.rpm
    sudo yum -y install percona-xtrabackup.x86_64
  2. Install Azure SDK Tools Xplat
    sudo apt-get update
    sudo apt-get -y install nodejs python-software-properties
    sudo add-apt-repository ppa:chris-lea/node.js
    sudo wget -O - https://npmjs.org/install.sh --no-check-certificate | sudo sh
    sudo apt-get install npm
    sudo npm config set registry http://registry.npmjs.org/
    sudo npm install -g azure-cli
    sudo ln -s /usr/bin/nodejs /usr/bin/node
  3. Install a backup procedure
    1. Get a publish settings file from Azure (can be done from the console).
    2. Get account name and the matching base64 key from the Azure console.
    3. Import the publish settings file
      sudo azure account import /opt/mysqlbackup/mysqlbackup.publishsettings
    4. Create a Storage Container
      sudo azure storage container create --container container_name -a account_name -k base64_account_key
  4. Run the backup
    1. The xtrabackup way
      1. Run a full backup and prepare it twice to make it ready for recovery
        sudo xtrabackup --backup
        sudo xtrabackup --prepare
        sudo xtrabackup --prepare
      2. Add the frm files and mysql database to your backup
        sudo chmod -R +r /var/lib/mysql/
        sudo cp -R /var/lib/mysql/mysql/* /mnt/backup/mysql/
        sudo cp -R /var/lib/mysql/yourdb/*.frm /mnt/backup/yourdb/
    2. The innobackupex way:
      1. Run full backup and prepare (adjust the memory usage for your case):
        sudo rm -rf /mnt/datadrive/mysqlbackup/*
        sudo innobackupex --user=DBUSER --password=DBUSERPASS /mnt/datadrive/mysqlbackup/
        sudo innobackupex --apply-log /mnt/datadrive/mysqlbackup/ --use-memory=500M
  5. Tar the files into a unique daily name
    _now=$(date +"%Y_%m_%d")
    _file="/mnt/backup/mysqlbackup$_now.tar.gz"
    tar cvzf "$_file" /mnt/datadrive/mysqlbackup/
  6. Copy the file to Azure Storage using the Azure CLI
    azure storage blob upload -f "$_file" -q --container container_name -a account_name -k base64_account_key
  7. Create a cron job that will run the whole procedure daily (a sketch of such a daily.backup.sh script follows this list):
    > sudo crontab -e
    0 0 * * * /opt/mysqlbackup/daily.backup.sh >/dev/null 2>&1
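
A minimal sketch of such a daily.backup.sh, based on the innobackupex flow above (the paths, credentials and container names are the same placeholders used throughout this post; adjust them to your environment):
    #!/bin/bash
    # daily MySQL hot backup, compressed and uploaded to Azure Storage (sketch)
    _now=$(date +"%Y_%m_%d")
    _file="/mnt/backup/mysqlbackup$_now.tar.gz"
    # full hot backup + prepare
    rm -rf /mnt/datadrive/mysqlbackup/*
    innobackupex --user=DBUSER --password=DBUSERPASS /mnt/datadrive/mysqlbackup/
    innobackupex --apply-log /mnt/datadrive/mysqlbackup/ --use-memory=500M
    # compress and upload
    tar cvzf "$_file" /mnt/datadrive/mysqlbackup/
    azure storage blob upload -f "$_file" -q --container container_name -a account_name -k base64_account_key
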
Recovery Guide
  1. Bring back the files from the Azure storage to /mnt/backup/
    cd /mnt/backup
    sudo azure storage blob download --container container_name -a account_name -k base64_account_key -b $file_name
  2. Uncompress the files
    sudo tar xvfz $file_name
  3. Copy the files to your data folder (/var/lib/mysql) after shutting down the MySQL
    1. The xtrabackup way:
      sudo service mysql stop
      sudo rsync -avrP /mnt/backup/ /var/lib/mysql/
    2. The innobackupex way:
      sudo service mysql stop
      sudo rm -rf /var/lib/mysql/*
      sudo innobackupex --copy-back /mnt/datadrive/mysqlbackup/
      sudo chown -R mysql:mysql /var/lib/mysql
  4. Verify the folder permissions
    sudo chown -R mysql:mysql /var/lib/mysql
  5. Restart the MySQL and verify everything is working.
    sudo service mysql start
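
A quick post-recovery sanity check (a minimal sketch, not part of the original procedure; the error log path is the default Ubuntu location):
    mysql -u root -p -e "SHOW DATABASES;"
    sudo tail -n 50 /var/log/mysql/error.log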
Bottom Line 
It may be tough. It may be resource and time consuming. However, you must have a good recovery process to cover your a$$...

Keep Performing,

Jul 17, 2014

How to Disable MySQL Binlog

@ MySQL, clean the old binlog files
SQL> SHOW MASTER STATUS\G
Take the file name from the output
SQL> PURGE BINARY LOGS TO 'file';

@ The /etc/my.cnf
Comment out log-bin
> #log-bin = ....

Restart MySQL
> sudo service mysql restart
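
To verify binary logging is actually off after the restart (log_bin is the standard server variable), a quick check:
> mysql -u root -p -e "SHOW VARIABLES LIKE 'log_bin';"
It should report OFF.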

Keep Performing,

Jul 14, 2014

Keep Your Website Up and Running During an Upgrade

All of us want to have our website up and running.
However, upgrading a software version requires downtime (even if it is minimal).
What should we do?

The answer is simple: follow the process below and gain a highly available system based on load balancing and session offloading, as well as a no-downtime procedure for upgrading your system.

Keep Your Site Up and Running @ AWS

  1. Place more than a single web instance behind a load balancer. The Elastic Load Balancer (ELB) will be fine for that.
  2. Make sure you are using session offloading in your system implementation.
  3. Install elbcli on your machine
    sudo apt-get -y install elbcli
  4. Create a credentials file /var/keys/aws_credential_file with the following contents, and secure it with sudo chmod 600 /var/keys/aws_credential_file:
    AWSAccessKeyId=AKIAblablabla
    AWSSecretKey=blablabla
  5. Upgrade the servers one after another (a sketch of the full loop follows this list).
  6. Before upgrading a server, take it out of the ELB
    sudo elb-deregister-instances-from-lb MyLoadBalancer --instances i-4e05f721  --aws-credential-file=/var/keys/aws_credential_file
  7. After completing the server upgrade, take it back in:
    sudo elb-register-instances-with-lb MyLoadBalancer --instances i-4e05f721  --aws-credential-file=/var/keys/aws_credential_file
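
A minimal sketch of the rolling-upgrade loop (the second instance ID and the upgrade step are placeholders for your own environment):
  for INSTANCE in i-4e05f721 i-xxxxxxxx; do
    # take the instance out of the load balancer
    sudo elb-deregister-instances-from-lb MyLoadBalancer --instances $INSTANCE --aws-credential-file=/var/keys/aws_credential_file
    sleep 60   # let in-flight requests drain
    # ... run your upgrade steps here ...
    # put the upgraded instance back behind the load balancer
    sudo elb-register-instances-with-lb MyLoadBalancer --instances $INSTANCE --aws-credential-file=/var/keys/aws_credential_file
  done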
Bottom Line
A simple process and careful design will keep your system up and running.

Keep Performing,
