Dec 23, 2013

Can OpenStack Object Store be a Base for a Video CDN?

Video CDNs are fascinating from a technical perspective, as we are talking about high scale systems with some unique business cases.
I would like to share with you some design aspects of these systems.

The Video CDN Case Studies
Video CDN includes two main case studies:
  1. VOD Case Study: This is a long tail/high throughput scenario where you need high capacity disks, of which only a small portion will be used extensively. In order to create a cost effective solution you should have:
    1. High capacity storage system with low IOPS needs. Servers with 24-36 2-3TB SATA disks will provide up to 100TB of raw storage at a price tag of about $15K.
    2. Replication and auto failover mechanism that can distribute content between several servers and can save us from using expensive RAID and Cluster solutions.
    3. Caching/Proxy mechanism that will serve the head of the long tail from the memory.
  2. Live Broadcast Case Study: This is a no storage/high throughput scenario where you don't actually need any persistent storage (if a server fails, by the time it gets back up again, the data will no longer be relevant). In order to create a cost effective solution you should have:
    1. No significant storage.
    2. High capacity RAM that should be sized according to:
      1. The number of channels you are going to serve.
      2. The number of resolutions you're going to support (most relevant when you plan to support handhelds and not just widescreens).
      3. The amount of time you are going to store (no more than 5 minutes are needed in case of live, and no more than 4 hours in case of start over). See the back-of-the-envelope sizing sketch after this list.
    3. In memory rapid replication (or ramdisk based) mechanism, that will replicate the incoming video to several machines.
    4. HTTP interface to serve video chunks to end users.
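As a rough illustration of the RAM sizing in the live broadcast case above, here is a back-of-the-envelope sketch; the channel count, rendition count, bitrate and buffer window are hypothetical examples, not recommendations:

#!/bin/bash
# Rough RAM estimate for a live broadcast edge server (all numbers are examples)
CHANNELS=50            # live channels served
RESOLUTIONS=4          # renditions per channel (e.g. 1080p/720p/480p/mobile)
AVG_BITRATE_MBPS=3     # average bitrate per rendition, in Mbps
BUFFER_SECONDS=300     # ~5 minutes for plain live
# RAM (GB) ~= channels * renditions * bitrate(Mb/s) * seconds / 8 / 1024
RAM_GB=$(( CHANNELS * RESOLUTIONS * AVG_BITRATE_MBPS * BUFFER_SECONDS / 8 / 1024 ))
echo "Estimated RAM for chunk buffering: ~${RAM_GB}GB (add headroom for the OS and proxy)"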
Serving Static Content rather than Dynamic
Modern video encoding systems (such as Google's Widevine) support "Encrypt Once, Use Many", where the content is encrypted once, and decryption keys are distributed to secured clients on a need to know basis.

Why OpenStack Swift/Object Store?

In short, OpenStack Swift is the OSS equivalent of Amazon's proprietary AWS S3: "Simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web."

OpenStack Swift/Object Store Benefits
  1. OpenStack is OSS and therefore easy to evaluate.
  2. Active large scale deployments including RackSpace and Comcast.
  3. Built in content distribution method, based on rsync, that lets you distribute the load between multiple servers.
  4. High availability based on data replication between multiple instances. On a server failure, you just need to take it out of the array and replace it with a new one, while the other servers keep serving users.
  5. This mechanism helps you avoid premium hardware such as IO controllers and RAID.
  6. Built in HA and DRP based on 5 independent zones.
  7. Built in web server that enables you to serve static content as well as HTTP based video streams from the server itself, rather than implementing a high end SAN.
  8. Built in reverse proxy service that minimizes IO and maximizes throughput, based on Python and Memcache.
  9. Built in authentication service
  10. Target pricing of $0.4 per 1M servings and $0.055/GB per month if we take AWS as a benchmark.
OpenStack Object Store Architecture
The OpenStack Object Store architecture is well described in two layers:
  1. The logical: Accounts (paying customers), Containers (folders) and Objects (blobs).
  2. The physical: Zones (Independent clusters), Partitions (of data items), Rings (mapping between URLs and partitions and locations on disks) and Proxies.
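To make the ring concept concrete, here is a minimal sketch of building an object ring with swift-ring-builder; the partition power, zones, IPs, device names and weights are illustrative assumptions, not a production layout:

# Create an object ring: 2^18 partitions, 3 replicas, 1 hour minimum between partition moves
swift-ring-builder object.builder create 18 3 1
# Add one device per zone (IP, port, device name and weight are examples)
swift-ring-builder object.builder add z1-10.0.0.1:6000/sdb1 100
swift-ring-builder object.builder add z2-10.0.0.2:6000/sdb1 100
# Distribute the partitions across the added devices
swift-ring-builder object.builder rebalance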
Key Concepts
  1. An account is actually an independent tenant, as it has its own data store (implemented with SQLite).
  2. Replication is done based on large blocks, quorum and MD5.
  3. Write to Disk before Exposing to Users: when files are uploaded, they are first committed to disk in at least two zones, and only then is the database updated for availability (so don't expect a sub second response for a write).
System Sizing 
Before testing the system you should be familiar with some key sizing metrics obtained by Korea Telecom:

  1. Object Store Server Sizing: High capacity storage (36-48 2-3TB SATA drives that will provide up to 100TB per server), memory to cache the head of the long tail (24-48GB RAM), 2x1Gbps Eth to support ~500 concurrent long tail requests. A single high end CPU should do the work.
  2. Proxy Server Sizing: Little storage (a 500GB SATA disk will do the work), memory to cache the head of the long tail (24GB RAM), 2x10Gbps Eth to support ~5000 concurrent requests for the head of the long tail.
  3. Switches: you will need a nice backbone for this system. In order to avoid a backbone that is too large, splitting the system into several clusters is recommended.
  4. Load Balancing: in order to avoid a high end LB, you should use DNS LB, where the frequent calls to the DNS are negligible relative to the media streaming traffic.
The Fast Lane: How to Start?
OpenStack Object Store is probably cost effective mainly for large installations, as you may need at least 5 physical servers for the object and container stores and another 2 for proxies.
However, you can check the solution based on a single server installation (SAIO):

If AWS is your fast lane to a POC, feel free to use the following tips:

  1. In the initial installation, some packages will be missing from yum, so install them with easy_install:
    1. sudo easy_install eventlet
    2. sudo easy_install dnspython
    3. sudo easy_install netifaces
    4. sudo easy_install pastedeploy
  2. No need to start the rsync service (just reboot the machine).
  3. Start the service using sudo ./bin/startmain
  4. Test the service using the supload bash script to simulate a client.
Working with the Web Services
There are 3 main ways to work with the Swift web services:
  1. AWS tools, as Swift is compatible with the AWS S3 API.
  2. Direct HTTP calls, as everything is HTTP based.
  3. The Swift client, which covers most of your needs.
Working with Swift Client
In the following example we assume a given user test:tester was defined with the password testing, and data is served over the proxy at port 8080.

Get statistics:
swift -A http://test.example.com:8080/auth/v1.0 -U test:tester -K testing stat

Upload a file to the videos container:
sudo swift -A http://test.example.com:8080/auth/v1.0 -U test:tester -K testing upload videos ./demo.wvm

Provide read only permissions (please notice that .r: stands for referrer, so specifying a specific domain instead of * can help you save bandwidth and minimize content stealing):
sudo swift -A http://test.example.com:8080/auth/v1.0 -U test:tester -K testing post videos  -r '.r:*'

Download the file (where AUTH_test is the user account, videos is the container and anonymous access was provided as detailed below):
curl http://test.example.com:8080/v1.0/AUTH_test/videos/demo.wvm

Public Access
In order to implement read only public access you will need to take care of the following items:
  1. User Management
  2. Define anonymous access
  3. Configure folder ACL
  4. Enable delayed authentication at the proxy configuration (/etc/swift/proxy-server.conf):
[filter:authtoken]
paste.filter_factory = keystone.middleware.auth_token:filter_factory
# Delaying the auth decision is required to support token-less
# usage for anonymous referrers ('.r:*').
delay_auth_decision = 1

Working with Direct HTTP Calls
Get user/pwd
curl -v -H 'X-Storage-User: test:tester' -H 'X-Storage-Pass: testing' http://test.example.com:8080/auth/v1.0

> X-Storage-Url: http://127.0.0.1:8080/v1/AUTH_test
> X-Auth-Token: AUTH_tk551e69a150f4439abf6789409f98a047
> Content-Type: text/html; charset=UTF-8
> X-Storage-Token: AUTH_tk551e69a150f4439abf6789409f98a047

Upload file
curl -X PUT -i \
    -H "X-Auth-Token: AUTH_tk26748f1d294343eab28d882a61395f2d" \
    -T /tmp/a.txt \
    https://storage.swiftdrive.com/v1/CF_xer7_343/dogs/JingleRocky.jpg

Bottom Line
OpenStack Object Store (Swift) is an exciting tool for anyone who is working with large scale systems, especially when talking about CDNs.

Keep Performing,
Moshe Kaplan

Dec 14, 2013

Do You Really Need NoSQL and Big Data Solutions?

Big Data and NoSQL are the biggest buzz around...
Yet, are they the right solutions for your project?

Are You Eligible for Big Data?
As a rule of thumb, if your database is smaller than 100GB and your biggest table is less than 100M rows, you should avoid seeking Big Data solutions. In that case, make sure you well utilize your current RDBMS investments.

When Should You Choose RDBMS?
There are several other reasons to stick with RDBMS (yes, we are talking about MySQL, SQL Server, Oracle and others):
  1. You must meet compliance and security procedures (cases: PCI compliance).
  2. You need complex reporting based on joins between several tables.
  3. You need transactions (cases: financial transactions).
  4. You have an established data analyst group that cannot be retrained to a new syntax.
When Do NoSQL Solutions Fit?
There are several good reasons to select a NoSQL solution. Check if your case is eligible for it:

  1. You are a full stack developer and just looking for persistent storage (cases: blogging systems, multiple choice exams).
  2. You need a quick response from your storage solution based on a key value store (cases: algo trading and online stock exchange bidding: DSP).
  3. You must always return an answer, even if it's not the most updated one (cases: social networks, content management).
  4. You need to provide a good enough answer rather than the most accurate one (cases: search engines).
  5. Your data size is too big to be transferred over the network (even over 80Gbps InfiniBand). In these cases a better approach is distributing the computation (cases: analytics, statistics).




Bottom Line
NoSQL solutions have expanded your toolbox. Now, you need to focus on selecting the right tool for your business case.

Keep Performing,

Moshe Kaplan

Dec 9, 2013

JAVA Production Systems Profiling Done Right!

If you are facing a Java system performance issue in production, and JProfiler is not the right tool for it, probably JMX monitoring using the VisualVM will do the work for you.

Technical
JMX usage from a remote machine can be frustrating. Therefore, please make sure that:
  1. Your hostname is included in /etc/hosts:
    1. Get the host name using hostname
    2. Add the host name after 127.0.0.1 in /etc/hosts
  2. JMX is bound to the external IP:
    1. Verify that 127.0.0.1 is not present in the output of: netstat -na | grep 1099
    2. If it is, add to your java command: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Djava.rmi.server.hostname=
If everything is Okay, you will be able to run VisualVM from a remote machine and connect to the remote server.
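For example, a full invocation might look like the following; the jar name and IP address are placeholders for your own application and server:

# Hypothetical example: replace myapp.jar and the IP with your own values
java -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=1099 \
     -Dcom.sun.management.jmxremote.ssl=false \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -Djava.rmi.server.hostname=192.168.1.10 \
     -jar myapp.jar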

VisualVM
Now that you have your VisualVM up and running, there are some items you should take a look at:
  1. General CPU and memory graphs.
  2. The sampler, which enables you to take snapshots.
  3. Snapshot analysis, which gives you a hotspot presentation as well as a deeper drill down.
Bottom Line
My recommendation is to take a snapshot of the process and then look at the hotspots tab for major calls with long actual CPU time. You should focus on these items.

Keep Performing,

Dec 3, 2013

Is There a (Good) Solution for SQL Server HA @ Azure?

Comment: I do not see Windows Azure SQL Database as a feasible solution for a firm that expects its business to scale. The reason is simple: you cannot use a component in your system whose replacement will require long downtime (yes, we are talking about hours if you have a significant database size). The only way to migrate away from Windows Azure SQL Database is to export its data and import it on a regular instance, and that is not acceptable when you have significant traffic.

High Availability
The requirement for high availability is common: you don't want downtime, as downtime means less business and hurts your business image.

The Azure Catch
An Azure SQL Server VM is just like having SQL Server on a regular VM.
VM maintenance includes two layers: 1) maintaining the VM (installing patches, hardening...) and 2) doing the same to the host underneath.
A common large scale private or public cloud operation usually includes automatic failover, so when a host undergoes maintenance or unfortunately fails, the system automatically migrates the running VMs to other host(s) without stopping them. You can find this behavior in VMware vMotion and at Amazon EC2, which runs over Xen.
Well... this is not the case at Microsoft Azure. When Microsoft updates its hosts, don't expect your instances to be available (and yes, the downtime may take dozens of minutes and is not controlled by you). This is an acceptable practice when dealing with web and application servers (place several instances behind a LB and use a queue mechanism to deal with it). However, when you deal with databases it can be a major issue.

The Solution: Have a Master-Master Configuration
As you may have understood, a master-slave solution is not acceptable in this case, and therefore you will need to avoid Log Shipping (although it can be used in various other scenarios).
That left us with two solutions:

Mirroring:
This was a solution for HA architectures.
However "This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature. Use AlwaysOn Availability Groups instead."

http://technet.microsoft.com/en-us/library/ms189852.aspx

AlwaysOn Availability Groups
This solution is described by MS as the "enterprise-level alternative to database mirroring. Introduced in SQL Server 2012".
However, "The non-RFC-compliant DHCP service in Windows Azure can cause the creation of certain WSFC cluster configurations to fail, due to the cluster network name being assigned a duplicate IP address (the same IP address as one of the cluster nodes). This is an issue when you implement AlwaysOn Availability Groups, which depends on the WSFC feature."
http://technet.microsoft.com/en-us/library/hh510230.aspx

The Second Catch
According to our analysis it seems that both Mirroring (end of life) and AlwaysOn (severe bugs due to DHCP) are not recommended, so we are actually left without a good HA solution, and therefore with no good MS data store solution for the Azure environment.
We tried to get answers from Microsoft staff, but we did not get good ones.

Bottom Line
When evaluating Azure as a cloud platform for your needs, you should consider your data solution and how it fits your needs. In this case you may need to consider open source solutions such as MySQL, Cassandra and MongoDB on a Linux VM, instead of going with the default MS stack.

Keep Performing,
Moshe Kaplan

Nov 9, 2013

Azure Monitoring. What to Choose?

You are going live with your Azure system and need to know what is happening there?
Cannot sleep at night as your system crashes from time to time?
Need to know whether your fast growing system will run out of resources in the near future?
If so, this post is for you!

Decisions

In general you have 3 main options:
  1. Open Source Software that is installed by you and managed by you.
  2. Azure built in monitoring, which may incur some extra charges and require tailoring to your business case.
  3. 3rd party SaaS offerings that provide you with an end to end solution.
The Open Source Way
  1. Major OSS monitoring tools are available and widely used in the industry. Some of the leading solutions are Zabbix, Nagios and Ganglia.
  2. The software is available with no additional cost.
  3. You will need to install the servers yourself (and of course pay for the instances and manage the data).
  4. Implementing end to end monitoring (inc. user experience) may require installing extra servers in another data center or using a 3rd party service.
  5. Other downsides are well described by CopperEgg.
The Azure Monitoring Way
  1. If you are already invested in Azure, it's built in, so why not use it?
  2. The service provides nice utilization graphs and a minimal alerting system (limited to 10 rules). However, there are no pre configured templates, and some metrics incur additional charges.
  3. Basic website availability is provided using web site monitoring.
  4. Yet, there are several gaps, such as VM monitoring, SQL Azure and Azure Storage.
  5. The overall impression is that many features are available, but it's far from being a monitoring product, and you will have to do a lot of integration work by yourself.
3rd Party Monitoring as a Service
  1. CopperEgg: Nice company, though their website could look better. More important, their features look pretty similar to the OSS solutions, with an integrated website response time and availability service. Pricing: $7-9/server|service/month.
  2. AzureWatch: a very similar product to CopperEgg both in features and pricing (although their landing page looks much better). Pricing: $5-8/server|service/month.
  3. New Relic: Probably the most interesting product around; not only does it provide end to end monitoring, it also analyzes your performance bottlenecks. Their product integrates into your code, provides production profiling and alerts you on any exception. Pricing: the standard version, which covers what the other providers offer, is free. Code level diagnostics and inner level profiling will cost you $25-200/server|service/month.
Bottom Line
If you are ready to invest time in getting your own solution, the OSS products with Azure integration will probably be your preferred solution. Otherwise, I recommend choosing New Relic, which seems to provide the best value in the market for your needs.

Keep Performing

Nov 5, 2013

Apache Session Persistence Using MongoDB

Your app is booming, you need more web servers and you need to serve users while keeping their user experience. When you had a single server you just used the session for that, but now, how do you keep sessions across multiple web servers?
Using a session stickiness load balancer is a good solution, but what happens when a web server needs a restart?

Session Off Loading
Many of you are familiar with the concept of session off loading. The common configuration for it is using Memcached to store the session object, so it is accessible from all web servers. However, this solution lacks high availability features such as replication and persistence.

The Next Step: High Availability
Offloading web server sessions to MongoDB looks like a great solution: a key-value store, lazy persistence and built in replication with automatic master failover.

Step by Step Solution for LAMP Session Off Loading using MongoDB
  1. Install MongoDB (preferably in a replica set with at least 3 nodes)
  2. Install the MongoDB driver for PHP (see the install sketch below).
  3. Configure the php.ini: extension=mongo.so
  4. Download and integrate Apache session handler implementation using MongoDB.
  5. Configure your MongoDB setting in the module:
    1. Set the list of nodes in the connectionString setting: 'connectionString' => 'mongodb://SERVER1:27017, mongodb://SERVER2:27017',
    2. Do the same to the list of servers (see at the bottom of this post)
    3. Configure the cookie_domain (or just place a null string there to support all): 'cookie_domain' => ''
    4. Don't forget to enable replica set if you are using one: 'replicaSet' => true,
  6. Add the MongoSession.php to your server and require it: require_once('MongoSession.php');
  7. Replace the session_start() function with $session = new MongoSession();
And that's all. Now you can continue using the session object as you are used to and create great new features:
if (!isset($_SESSION['views'])) $_SESSION['views'] = 0;
$_SESSION['views'] = $_SESSION['views'] + 1;
echo $_SESSION['views'];
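For step 2 above (installing the MongoDB driver for PHP), here is a minimal install sketch, assuming the legacy pecl "mongo" driver on a CentOS style box; package names may differ on your distribution:

# Install the legacy PHP MongoDB driver via PECL (assumes a yum based distribution)
sudo yum -y install php-devel php-pear gcc
sudo pecl install mongo
# Enable the extension and restart Apache
echo "extension=mongo.so" | sudo tee /etc/php.d/mongo.ini
sudo service httpd restart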


Bottom Line
Smart work can tackle every scale issue you are facing...

Keep Performing,

Appendix: 
'servers' => array(
    array(
        'host' => 'SERVER1',
        'port' => Mongo::DEFAULT_PORT,
        'username' => null,
        'password' => null,
        'persistent' => false
    ),
    array(
        'host' => 'SERVER2',
        'port' => Mongo::DEFAULT_PORT,
        'username' => null,
        'password' => null,
        'persistent' => false
    )
)

Oct 13, 2013

MongoDB Index Tuning and Dex

One of the first things to do when your system is rolled into production (or better, when you run your acceptance tests) is checking the system performance and tuning it.
When we talk about databases, we usually talk about index tuning, and MongoDB is no different.

How to Detect the Slow Queries?
First, enable your MongoDB slow query log. It's easy, just run db.setProfilingLevel(1, 1) from the command line.

How to Analyze the Slow Queries?
Well, you can use the built in queries (after all, the profiling data is saved in a MongoDB collection). However, Dex, a new tool from MongoLab, can help you shorten the time to index...
It's a Python based product that can be easily installed and run, and it provides a list of recommended indexes based on MongoLab's best practices.
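If you prefer the built in route, here is a hedged example of pulling the slowest operations straight from the profiling collection (the database name is a placeholder):

# List the 5 slowest profiled operations in the 'mydb' database
mongo mydb --eval 'db.system.profile.find().sort({millis: -1}).limit(5).forEach(printjson)'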

How to Install and Run?
  1. Install python: yum -y install python
  2. Install pip
  3. Install Dex using pip: pip install dex
  4. Create a log file: touch mongodb.log
  5. Run it: sudo dex -p -v -f mongodb.log mongodb://user:password@serverDNS:serverPort/database. Don't forget to use the various flags to get errors in case of wrong configuration.

Should We Just Copy and Paste the New Indexes?
As you may know, every added index results in performance degradation when INSERT, DELETE and UPDATE statements are performed. Therefore, we should carefully select the indexes to be added:

  1. We should check that the new indexes do not overlap existing indexes. If there is an overlap, we should remove/unify the index definitions (use the db.collection.getIndexes() method to retrieve the data).
  2. We should check that the queries are frequently used, in order to avoid saving a few seconds once a month on a SELECT while paying for it on every INSERT at peak times.
Bottom Line
MongoDB queries tuning is easier than ever: Just Tune It!

Keep Performing,
Moshe Kaplan

Sep 15, 2013

Puppet, Configuration Management and How to Get into Production Faster...


Puppet or Chef?
You should take a look at the various comparisons between the tools, but we actually chose Puppet since it has become an industry choice (we verified it by searching LinkedIn).

Which Puppet Version to Choose?
I always like to explore the community editions, but PuppetLabs has a very nice offer for their Enterprise Edition (10 free nodes). Feel free to decide by yourself.

How to Install?
You can compile the source, but repositories are probably the easiest way around.
On CentOS 6.X the easiest way is: $ sudo rpm -ivh http://yum.puppetlabs.com/el/6/products/i386/puppetlabs-release-6-7.noarch.rpm
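Once the repository is in place, a minimal sketch of installing the master and an agent; the package and service names assume the open source Puppet 3.x packages from that repository, and the master hostname is a placeholder:

# On the master
sudo yum -y install puppet-server
sudo service puppetmaster start
# On each node: install the agent and point it at the master
sudo yum -y install puppet
sudo puppet agent --test --server=puppet.example.com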

How does It Actually Work?
  1. Central Server/Master Repository: the place where definitions are defined. Please notice that some best practices recommend defining the central server as just another node in the system.
  2. Nodes: these are your operational servers that get configuration procedures from the central server.
  3. Providers: perform the process on each machine and abstract the system differences (different OS distros, for example) from the user.
  4. Puppet Modules: these are product configuration definitions that can be installed on your server. For example, the definition of Riverbed Stingray that we'll discuss later.
  5. Server Configuration Files: these are the configurations of each server (for example your web server). These files include the selection of products that will be installed and the configuration decisions that were taken. For example, if you install a load balancer on your server: what is the LB algorithm and which nodes are behind the LB.
How to Add a New Product to your Server?
Just add the product definition (class {'stingray':... in this case) to the node ('testServer' in this case) and define the various configuration options (stingray::new_cluster in this case):
node 'testServer' {
    class {'stingray':
        accept_license => 'accept'
    }
    stingray::new_cluster{'new_cluster':
    }
    stingray::pool{'Northern Lights':
        nodes       => ['192.168.22.121:80', '192.168.22.122:80'],
        algorithm   => 'Least Connections',
        persistence => 'NL Persistence',
        monitors    => 'NL Monitor'
    }
}

How to implement Elastic Hosts (and server groups w/ the same function)?
If you want to implement elastic hosts that scale with traffic, you should use MCollective to define the server groups.

Do you need more information?
I found the following PuppetLabs presentation very useful:


Bottom Line
Deployment of large systems is easier today, and will be even easier in the near future. You just need to choose the right tools.

Keep Performing,
Moshe Kaplan

Sep 12, 2013

MySQL Multi Master Replication

Why Multi Master Replication?
Master-Slave configuration is a great solution when your system is read bound. But what happens if you want to support high write loads, while keeping the option to select the data using a single SQL statement?

Multi Master replication is the solution for that. However, there are several methods on the market, and when choosing the right one you should carefully define your needs: a synchronous or asynchronous method, row based or statement based (traffic between servers), and whether time delayed replication is required (for quick recovery from a DELETE, for example).

The Multi Master Replication Options
I just kicked off the first Israeli MySQL User Group in association with Wix this week, and I wanted to share with you Michael Naumov's presentation from our first event.

In his talk, Michael presented four methods to support Multi Master replication: MySQL 5.6 Native, NDB, Tungsten and Galera.



Bottom Line
Careful selection of the right MySQL Multi Master replication method can save you production issues and can boost your system. Now all that is left is to define what is needed and what the matching solution is.

Keep Performing,
Moshe Kaplan

Aug 28, 2013

Your Storage Probably Affects Your System Performance

One of my clients started getting performance alerts in their monitoring system: "IO is too high".
This is probably not something you will be glad to see.

A quick analysis found that the alerts and the high IO originated from servers that were installed in a new data center.
While the actual CPU utilization devoted to IO wait in the old data center was around 25%, in the new data center it was about 75%.

Who is to Blame?
In the new data center a NetApp 2240c was chosen as the storage appliance, while in the old one an IBM V7000 Unified was used. Both systems had SAS disks, so we didn't expect a major difference between the two. Yet, it was worth exploring.

Measurement 
In order to verify the source, we ran a read/write performance benchmark on both systems using the following commands:
  1. Write: dd if=/dev/zero of=/tmp/outfile count=512 bs=1024k
  2. Read: dd if=/tmp/outfile of=/dev/null bs=4096k
UPDATE I: You should also try btest or IOMeter, as suggested by Yuval Kashtan.
UPDATE II: when using dd, you should rather use dd if=/dev/urandom of=/tmp/outfile.txt bs=2048000 count=100, which actually uses random input, instead of /dev/zero, which just fills the space with nulls.
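Here is a slightly more careful sketch that keeps the page cache from skewing the read result (run as root; paths and sizes are examples):

# Write test: force the data to disk before reporting the rate
dd if=/dev/urandom of=/tmp/outfile.txt bs=2048000 count=100 conv=fdatasync
# Drop the page cache so the read test hits the storage, not RAM
sync
echo 3 > /proc/sys/vm/drop_caches
# Read test
dd if=/tmp/outfile.txt of=/dev/null bs=4096k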

Results
On the NetApp 2240 we got 0.62GB/s write rate and 2.0GB/s read rate (in site #2)
On the IBM V7000 unified we got 0.57GB/s write rate and 2.1GB/s read rate (in site #2)
On the IBM V7000 unified we got 1.1GB/s write rate and 3.4GB/s read rate  (in site #1)
That is almost a 100% boost when we used the IBM system!

Bottom Line
When selecting and migrating between storage appliances, pay attention to their performance. Otherwise, you may face these differences in production. However, differences should be inspected in the same environment. In our case, something that seemed like a storage issue turned out to be a VM/OS configuration or network issue (the exact cause is still under investigation).

Keep Performing,
Moshe Kaplan

Aug 16, 2013

Azure Production Issues, Not a nice thing to share with your friends...

Last night I got a call from a close friend.
"Our production SQL Server VM at Azure is down, and we cannot provide service to clients".
A short analysis showed that we were in deep trouble:

  1. The server status was: Stopped (Failed to start).
  2. Repeated tries to start the server resulted in Starting... and back to the fail message.
  3. Changing the instance configuration as proposed in various forums and blogs resulted in the same failure message.
  4. Microsoft claims that everything is Okay with its data centers.
  5. Checking the Azure storage container showed that the specific VHD disk had not been updated since the server failure.
Since going back to a cold backup would mean losing too much data, we had to somehow restore the failed server.

The chances were against us. Yet, luckily, we could do it by restoring the database files from the VHD (VM disk) file that was available in Azure storage.

How to Recover from the Stopped (Failed to start) VM Machine?
  1. Start a new instance in the same availability set (that way you can continue using the same DNS name, instead of also deploying a new version of the app servers).
  2. Attach a new large disk to the instance (the failed server disk was 127GB, make sure the allocated disk is larger).
  3. Start the new machine.
  4. Format the disk as a new drive.
  5. Get to your Azure account and download the VHD file from Azure storage. Make sure you download it to the right disk. We found out that the download process takes several hours even when the blob storage is in the same data center as the VM.
  6. Mount the VHD file you downloaded as a new disk.
  7. Extract the database and log files from the new disk and attach them to the new SQL Server instance.
Other recommendations:
  1. Keep your backup files updated and in a safe place.
  2. Keep your database data and log files out of the system disk, so you could easily attach them to other servers.
Bottom Line
When the going gets tough, the tough get going

Keep Performing,

Jul 16, 2013

Detecting Performance Bottlenecks in Apache httpd Server

“a problem well put is half solved.” ― John Dewey

One of the most important things in detecting performance issues in a LAMP system is understanding the actual processes that are causing them.

What should we do?
One way to start is to properly configure the Apache logging so that we get fine details about the performance of every request.

Key Apache httpd Parameters
%B Size of response in bytes, excluding HTTP headers.
%D The time taken to serve the request, in microseconds.
%{VARNAME}e The contents of the environment variable VARNAME.
%T The time taken to serve the request, in seconds.
%X Connection status when response is completed

How to Configure?
Modify your httpd.conf file according to the following example:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" proxy

How do we Analyze it?
Apache log files can be huge (a common case in systems that have some performance challenges).
A useful tool for analyzing the Apache logs, which Cnaan Aviv introduced to me, is GoAccess.
This tool generates reports, statistics and detects errors.


Bottom Line
When done right, solving performance issues is not black magic. It just needs to be done using a well defined method and the right tools.

Keep Performing,
Moshe Kaplan

Jun 28, 2013

Changing MySQL Binary Log Files Location to Another Directory

What is the Default?
Usually in most installations, binary log files are located in the MySQL default directory (/var/lib/mysql) just next to the data files.

Why Should I Move the Binary Logs to Another Directory?
Each data modification (INSERT, UPDATE, DELETE...) and data definition (ALTER, ADD, DROP...) statement that you perform on your server is recorded in the binary log files.
Therefore, each time you run any of these statements, you actually update both your data files and your log files. The result is high IO utilization that is focused on a specific disk area.
A common recommendation in the database field is to separate these files onto two different disks in order to get better performance.

How to Perform it?

  1. Change the log-bin variable in my.cnf to log-bin=/path/to/new/directory/mysql-bin
  2. Purge as many files as you can (PURGE BINARY LOGS...) in order to minimize the number of moved files (see step 4).
  3. Stop the master (service mysql stop).
  4. Move the files to the new directory: mv /var/lib/mysql/mysql-bin.* /path/to/new/directory
  5. Start the master again (service mysql start).
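Once the server is back up, a quick sanity check (a sketch; adjust the path to your new directory):

# Verify that binary logging resumed and the files landed in the new location
mysql -e "SHOW BINARY LOGS;"
ls -lh /path/to/new/directory/mysql-bin.*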

Bottom Line
A few steps and your server is ready for more traffic and data.

Keep Performing,
Moshe Kaplan

Jun 25, 2013

MongoDB and Java

You can find below some hints for initial Java and MongoDB integration

Take a Look at the Requirements
  1. MongoDB
  2. MongoDB-Java-Driver
  3. JDK. If you do not use JDK 1.6 or newer, you may get an error like this one: "DBObject cannot be resolved to a type"
Installing Java
yum -y install java-1.7.0-openjdk
yum -y install java-1.7.0-openjdk-devel

Getting MongoDB driver
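A minimal way to fetch the driver jar from Maven Central; the version number is an example, pick the current 2.x release:

# Download the MongoDB Java driver jar (version is an example)
wget https://repo1.maven.org/maven2/org/mongodb/mongo-java-driver/2.11.3/mongo-java-driver-2.11.3.jar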

Compiling Java
javac -d . *.java
java -cp . com/example/mbeans/Main
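Note that with the MongoDB imports listed below, the driver jar has to be on the classpath as well; a hedged variant, using the jar name from the download example above:

javac -cp .:mongo-java-driver-2.11.3.jar -d . *.java
java -cp .:mongo-java-driver-2.11.3.jar com.example.mbeans.Main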

Define MongoDB Headers in the Code
import com.mongodb.MongoClient;
import com.mongodb.MongoException;
import com.mongodb.WriteConcern;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import com.mongodb.DBCursor;
import com.mongodb.ServerAddress;
import com.mongodb.*;

Connect to MongoDB
DB _db;

public void init() {
    try {
        System.out.println("Connecting to mongo...");
        MongoClient mongoClient = new MongoClient("127.0.0.1", 27017);
        _db = mongoClient.getDB("display");
        System.out.println("Connected to mongo...");
    } catch (Exception e) {
        System.out.println("Failed Connecting Mongo...");
    }
}

Query the Database (Get the Number of Connections)
CommandResult stats = _db.command("serverStatus");
return Integer.valueOf((((DBObject)(stats.get("connections"))).get("current")).toString());

Bottom Line
Java and MongoDB integration is not too difficult, you just need to do the right things right...

Keep Performing,

Jun 7, 2013

DZone's Definitive Guide to Cloud Providers is here!

I was a part of the team that was contributing to the DZone's  Definitive Guide to Cloud Providers, and it's finally here!
If you consider what cloud solution is best for you, take a look at this guide.

Keep Performing,
Moshe Kaplan

May 26, 2013

How Much RAM Should You Have for MongoDB?

Why RAM is So Important?
RAM is much faster than hard disks. For example, we gained a 7X performance boost when we used the MySQL In Memory engine rather than the InnoDB engine.

Is It Only For Query Caching?
Most of us are familiar with query caching, where the results of a query are saved in cache in case it is called again.
However, RAM can be used to store a copy of your entire database as well. This enables you to get the best performance even for queries that were not run before.

So How Much RAM Do I Need?
My recommendation is that your system RAM should be larger than the MongoDB data files + index files: SizeOf(RAM) > SizeOf(Data Files) + SizeOf(Index Files).
That way, your files will always be in memory and you will avoid swapping.
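A quick way to check where you stand (a sketch; the database name is a placeholder, and dataSize/indexSize are reported in bytes):

# Compare dataSize + indexSize against the RAM on the box
mongo mydb --eval 'var s = db.stats(); print("data+index GB: " + ((s.dataSize + s.indexSize)/1024/1024/1024).toFixed(2))'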

What If My Data is Too Big?
In this case you should choose one of the following strategies:
  1. Shard your MongoDB between several servers.
  2. Tune your queries (this is a good idea in general).
  3. Design your databases and collections to support "Data Hierarchy".
  4. Choose SSD as your hard disk solution.
  5. Consider vSMP Foundation for Memory Expansion from ScaleMP to increase RAM up to 256TB, as our readers suggested (P.S. don't forget to ask them for the special price :-)
Bottom Line
Careful design will do miracles to your production environment.

Keep Performing,
Moshe Kaplan

May 20, 2013

mongoDB Performance Tuning (Full Presentation from Israel MongoDB User Group)

I gave a lecture today at the Israel MongoDB User Group regarding MongoDB performance tuning and how to scale with it.
It was a great event that was organized by Wix, 10gen and Trainologic.

Did you miss the event? Well, don't miss the lesson. You can find below the full presentation:




Bottom Line
Boost your MongoDB...

Keep Performing,
Moshe Kaplan

Apr 15, 2013

MongoDB, Users and Permissions

NoSQL and Enterprise Security?
That is not the first thing that comes to mind when you consider using NoSQL. It is not a big surprise, as the early adopters of NoSQL were Internet companies.
Evidence of that can be found in MongoDB, where authentication is disabled by default.

How to Enable MongoDB Authentication?
  1. Create an admin user (otherwise you will have issues connecting to your server) from the local console:
    1. use admin;
    2. db.addUser({ user: "<admin user>", pwd: "<password>", roles: [ "userAdminAnyDatabase" ]})
  2. Enable authentication in /etc/mongo.conf: auth=true
  3. Restart the mongod instance to enable authentication.
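Once authentication is on, connections must supply credentials; for example (the user name and password are placeholders):

# Connect to the admin database with the admin user created above
mongo admin -u adminUser -p adminPassword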
How to Add Additional users?
Select the database that you want to add the user to:
use <database>
db.addUser( { user: "<user>", pwd: "<password>", roles: [ "<role1>", "<role2>" ]})
And select a user role from the permissions list (e.g. read, readWrite, dbAdmin, userAdmin).
How to Provide Permissions to Other Databases?
This is done with a "copy" like method, where userSource defines the database that the user definition should be copied from:
use <database>
db.addUser( { user: "<user>", userSource: "<source database>", roles: [ "<role>" ] } )

In case you want to provide read permissions to all databases, you may use the readAnyDatabase role.

Bottom Line
Not very complex, but more secure. 

Keep Performing,

