Feb 25, 2009

Updated Knol: Microsoft Velocity


I just updated my knol on Microsoft Velocity, Microsoft's in-memory database (IMDB), cache and grid product.
Have fun reading and commenting,

Moshe Kaplan. RockeTier. The Performance Experts.

Feb 23, 2009

SQL Server Partitioning: The bad, the good and the evil

Horizontal Sharding is used to separate rows between several tables based on application logic. In MS SQL Server 2005, Microsoft introduced a built-in mechanism named Partitioning to support this need without extra code in the business logic itself. This mechanism lets you decide in which filegroup each row will be placed, while still supporting regular queries to retrieve and update the data, and boosting performance.
Some syntax and code examples are available if you want to master this feature.

What can be done with this great feature?
  1. You can break large tables into smaller chunks that fit the application logic (e.g., partitioning a client's data by client id, or data by date).
  2. You can put heavily accessed parts of the table on fast storage, and less-accessed data on slower, cheaper storage.
  3. You can shorten backup time when partitioning by date, since static historical partitions do not need to be backed up again and again.
  4. You can boost many queries, including DELETE, SELECT and UPDATE, given the right design.
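For instance, the date-based approach above can be sketched in T-SQL. This is a hedged sketch: the object, column, and filegroup names are illustrative, not taken from the original post.

```sql
-- Partition function: rows are split by OrderDate into three ranges.
CREATE PARTITION FUNCTION pfOrderDate (datetime)
AS RANGE RIGHT FOR VALUES ('2008-01-01', '2009-01-01');

-- Partition scheme: map each range to a filegroup, so old data can
-- sit on slower, cheaper storage.
CREATE PARTITION SCHEME psOrderDate
AS PARTITION pfOrderDate
TO (fgArchive, fg2008, fgCurrent);

-- The table is created on the scheme; SQL Server routes each row to
-- the right filegroup by OrderDate, with no application code at all.
CREATE TABLE Orders (
    OrderId   int      NOT NULL,
    OrderDate datetime NOT NULL
) ON psOrderDate (OrderDate);
```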

Pros (why use SQL Server Partitioning rather than hand-rolled Horizontal Sharding):
  1. Horizontal sharding out of the box.
  2. A lot of thought and effort was invested in this feature, so it works with just a few lines of code.
Cons:
  1. Relatively new (well, not so new, since it was introduced in SQL Server 2005), and many issues were fixed only in SQL Server 2008.
  2. Requires Enterprise Edition (about 10X the licensing cost of Standard Edition, or in other words: $25K per CPU).
  3. Relatively complex (no support in Enterprise Manager, and few DBAs will be able to support it), so you should probably master this white paper before taking it into production.
  4. Binds you to SQL Server (Enterprise Edition).
Industry Opinions:
Brent Ozar: "outside of data warehouses, I like to think of partitioning as the nuclear bomb option. When things are going out of control way faster than you can handle with any other strategy, then partitioning works really well. However, it’s expensive to implement (Enterprise Edition plus a SAN) and you don’t want to see it in the hands of people you don’t trust."

Bottom line:
Ask your DBA, and your clients' DBAs, whether they feel safe with this feature. If the answer is no, consider choosing another solution.

I hope you now have all the information to make the decision yourself. Otherwise, post your comments and we'll be glad to help,

Keep Performing,
Moshe Kaplan. RockeTier. The Performance Experts.

UPDATE 2: IGT Hosting Amazon AWS Hands-on workshop will start at 18:00


Great News,
After all the Amazon meetups, IGT is going to host an Amazon AWS hands-on workshop.
It's a great opportunity to meet Simone Brunozzi, Amazon Web Services Evangelist - Europe, get real-life, hands-on experience, and ask questions.

Date: Mar 3, 2009, 18:00 - 21:00
Location: SUN Offices, Manofim 9, 8th Floor, Herzliya, Israel
Organizer: IGT

The preliminary agenda:
18:00 - 18:30 Reception
18:30 - 19:00 Intro to Amazon AWS
19:00 - 20:30 Hands-on AWS Workshop
Account management
S3 – details and examples, using Firefox S3 organizer
EC2 – details and examples, Linux and Windows, using the AWS Console
Cloudfront – details and examples, using Firefox S3 organizer and/or other tools
20:30 - 21:00 AWS Q&A

Moshe Kaplan. RockeTier. The Performance Experts.

Feb 21, 2009

Case Study: Handle 1 Billion Events Per Day Using a Memory Grid


We just published our case study on boosting the performance of an affiliate marketing billing system, and it received a great write-up from Todd Hoff of HighScalability.com.

The case study's main highlights:
1. How to grow from a 1 million events per day system to a 1 billion events per day system
2. How to keep costs low and avoid millions of dollars in equity investment
3. How to grow fast while keeping the road map aligned with the business objectives

Please feel free to read our performance boosting case study and comment regarding it in this blog.

Moshe Kaplan
RockeTier. The Performance Experts.

Update #1: Answers to questions I received by email:

How do we provide HA?

We usually deploy the systems in an active/active configuration.

What about crash recovery? If counters are kept in memory only, there is a window of time in which a crash will lose the updated counters. Is the client OK with losing some updates, or do you address it somehow?
First of all, many of our clients prefer to accept the risk of losing the data held in memory rather than pay to prevent it. This is based on simple arithmetic: losing 1 minute of business operation on a single server that handles about 400 events/second means losing roughly 24,000 events, and if every 1,000 events generates a few dozen cents of revenue, the direct loss is a few dollars. Most clients would rather lose that than replicate every part of the system.
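The back-of-the-envelope calculation can be sketched as follows. The revenue-per-1,000-events figure is an illustrative assumption, not a number from the case study.

```python
# Estimated revenue lost when one server's in-memory counters are
# wiped by a one-minute outage.
EVENTS_PER_SECOND = 400
OUTAGE_SECONDS = 60
REVENUE_PER_1000_EVENTS = 0.30  # "a few dozen cents" (assumed figure)

lost_events = EVENTS_PER_SECOND * OUTAGE_SECONDS            # 24,000 events
lost_revenue = lost_events / 1000 * REVENUE_PER_1000_EVENTS

print(f"events lost: {lost_events}, revenue lost: ${lost_revenue:.2f}")
# events lost: 24000, revenue lost: $7.20
```

Against the licensing, hardware and complexity cost of full replication, a few dollars per incident is an easy trade for many businesses.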

How do you make sure that other requests to the failed server still get answered?
The load balancer is smart enough to detect the server failure, change its rotation algorithm, and make sure an alternative server takes over the processing.

How do we support multi-datacenter HA?
Multi-datacenter HA can be achieved using geo-clustering.

What about customers that require zero data loss?
Customers that require zero data loss get an answer using 1) GigaSpaces XAP, which supports on-the-fly data synchronization between two servers, keeping them synchronized up to the last operation, and 2) RDMA.

Feb 19, 2009

Lecture: MySQL Sharding


At the next Israel MySQL User Group meeting, RockeTier will present:

"How Sharding turned MySQL into the Internet de-facto Database Standard?"
A common belief in the enterprise software world is that MySQL cannot scale to large database sizes. The Internet industry proved it can be done: these days many of the Internet giants, processing billions of events every day, are based on MySQL. Most of these giants turned MySQL into a mighty database machine by implementing sharding.
What is sharding? What kinds of sharding can you implement? What are the best practices? All these issues will be addressed in this lecture by Moshe Kaplan of RockeTier, a performance expert and scale-out architect.
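As a minimal illustration of the idea (the shard names and the modulo scheme are assumptions for the example, not details from the lecture), hash-based sharding routes every row to one database by its key:

```python
# Minimal hash-based sharding: each user id maps to one of N MySQL
# shards, so every shard holds only a slice of the rows.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]

def shard_for(user_id: int) -> str:
    """Route a user id to its shard with a simple modulo scheme."""
    return SHARDS[user_id % len(SHARDS)]

# All queries for a given user always hit the same, smaller database.
print(shard_for(7))     # mysql-shard-3
print(shard_for(1024))  # mysql-shard-0
```

The catch, which is why sharding is a design decision rather than a library call, is that cross-shard queries and rebalancing when you add shards must be handled by the application.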

When: Wed, March 4th
Where: InterBit, 6 Ha`chilazon St., Ramat Gan, Israel, 03-7529922

Feb 3, 2009

Google Performance Lags Behind


We always tend to think that the mighty internet giants are free of bottlenecks. Well, these guys spend a lot of money and have the best people out there (except for us, of course :-).

However, if you use Google Analytics on your site (I admit, we at RockeTier measure everything as anonymous profilers) or just happen to use a site that uses it, you have probably noticed that it takes a pretty long time to connect and download the JS files that measure your stay on the site.

Well, now it's scientific: Blogoscoped found that the Google Analytics scripts load 27% slower at peak hours in North America, and 97% slower at peak hours in Europe!
This behavior is a real headache if you base your online marketing decisions on Google Analytics data.

A nice solution is to keep a copy of the script on your own servers, making sure you do not depend on Google's performance,
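A minimal sketch of that idea (the URL, file name and fetch hook are illustrative assumptions): periodically refresh a local copy of the tracking script, e.g. from a cron job, so your pages reference your own server instead of Google's.

```python
import urllib.request
from pathlib import Path

def refresh_local_copy(url: str, dest: Path,
                       fetch=urllib.request.urlopen) -> int:
    """Download the tracking script and save it locally; pages then
    load the local copy instead of waiting on Google's servers.
    Returns the number of bytes written."""
    data = fetch(url).read()
    dest.write_bytes(data)
    return len(data)

# Example cron-driven usage (hypothetical paths):
#   refresh_local_copy("http://www.google-analytics.com/ga.js",
#                      Path("/var/www/static/ga.js"))
```

The trade-off is that you must refresh the copy regularly, or you will serve a stale script when Google updates it.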

Moshe. RockeTier. The Performance Experts.

IGT Hosting Amazon AWS Hands-on workshop


Great News,
After all the Amazon meetups, IGT is going to host an Amazon AWS hands-on workshop.
It's a great opportunity to meet Simone Brunozzi, Amazon Web Services Evangelist - Europe, get real-life, hands-on experience, and ask questions.

Date: Mar 3, 2009, 10:00 - 13:00
Location: IGT Office, Maskit 4, 5th Floor, Herzliya
Organizer: IGT

The preliminary agenda:
10:00 - 10:30 Intro to Amazon AWS
10:30 - 12:00 Hands-on AWS Workshop
12:00 - 13:00 AWS Q&A

Moshe Kaplan. RockeTier. The Performance Experts.

Feb 2, 2009

The Mystery of System Calls


It is always a pleasure to have real-life contributions from colleagues in the industry. This time, Rubi Dagan, a system architect and senior team leader at Metacafe, one of the world's largest video sites, shares with us "the mystery of system calls".

Metacafe's software made many calls to time(), and under stress the effect was felt much more strongly. For example, in Figure 1 you can see that 35% of the syscall time was spent in time().

However, a look at several other servers showed that all requests were being processed without any calls to time() at all (see Figure 2). Hint: use the following to collect this information:

strace -c -f $(ps ax | grep '[h]ttpd' | awk '{ printf " -p %s", $1 }')

Solving the mystery...

The solution lies in a BIOS option named HPET (High Precision Event Timer). When HPET is enabled, the kernel can perform fast time lookups without the cost of the time() syscall: the hardware timer tracks the time instead of the kernel. Note that HPET support must also be enabled in the kernel.

That's it: instead of Apache or another program paying for every timestamp, the HPET mechanism does the work, and the bottom line is fewer expensive time system calls. See also the thread on Stack Overflow.
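A small way to check which time source the kernel is actually using (the path is the standard Linux sysfs location; on systems without it, this helper simply reports so):

```python
from pathlib import Path

# Standard Linux sysfs entry exposing the kernel's active clocksource.
CLOCKSOURCE = Path(
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"
)

def current_clocksource() -> str:
    """Return the kernel's active clocksource (e.g. 'hpet' or 'tsc'),
    or 'unknown' where the sysfs entry is not available."""
    if CLOCKSOURCE.exists():
        return CLOCKSOURCE.read_text().strip()
    return "unknown"

print("active clocksource:", current_clocksource())
```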

Bottom Line
This new configuration reduced syscall time by ~30% or more, which translates into a great performance improvement on Metacafe's servers.

P.S. We'll be glad to feature other cases from the industry here. Don't be shy: submit your case and contribute to the community.

Best Regards,
Moshe Kaplan. RockeTier. The Performance Experts

