Nov 16, 2010

Cloud Computing Design and Best Practices

Today I presented "Cloud Computing Design and Best Practices" at CloudCon, one of the largest cloud events in Israel.

It was a great lecture and I would like to share with you some of the lecture's insights:
  1. Assumptions: Don't treat cloud computing as a brave new world. It is one, but when you design your cloud system, don't forget the basic rules you followed before.
  2. Create A Road Map: You probably will not be able to turn your system into a 1 billion users system on day one. Therefore, design your road map and understand how you will reach your target. To get there, define the various parts of the system and understand how each of them will be scaled out, removed or replaced in the future to meet your road map.
  3. Start Fast: People love success. Your investors love it, your marketing guys love it, your customers love it, and even your development guys do. Therefore, start safe and fast. If your development team is great at C#, start with it. If you have an existing software product, start with it as well. It is always better to start with a model based on technologies and products that you are good at than to get into an adventure whose risks you cannot control.
  4. Minimize Costs: After you have decided to take the fast track, you should control your costs, and most importantly your growth costs. Go over your business plan, turn it into a technical requirements plan and find your bottlenecks. Then define a solution for each bottleneck: if you are in the online ads market, take care of your impressions module; if you are in the video business, take care of your video processing module. Why? In viral and online businesses growth is exponential. Therefore, cost growth is exponential as well, and you have to take care of it before your budget runs away.
  5. Best Strategies:
    1. Scale out: think shared nothing, and understand how to take each server and split it into an unlimited number of servers. If procuring a larger server is your best solution, you are probably heading in the wrong direction.
    2. Sharding: Data is usually the largest obstacle to scaling out, as conservative designs concentrate the data in a single place. If this is your case, take the path the giants have already taken: shard your database, whether it is MySQL or SQL Server (read my Best Seller Sharding Post).
    3. In Memory Database: In memory is 5x-10x faster than on disk. Therefore, analyze your system and understand what you can do without going to disk. If you can spare the disk a non-negligible number of transactions, use this technique to cut your costs.
  6. Refactor on the Run: As a player in a growing business, you don't have the option to rest. A 100 users system is different from a 100M users system, and as the system grows, smaller modules that were neglected in the first phase will become more important in terms of bottlenecks, cost or business sensitivity. The way to handle it is to refactor the system step by step to meet the business goals.
  7. Define the Exit Strategy: Always remember that your cloud operator is still a vendor, and your best partner in the early days may become an obstacle when you become a giant. Therefore, choose your cloud provider's tools carefully. I would recommend that you think twice before you choose proprietary data stores like SimpleDB, and if you do, create your own interface and have an exit strategy ready for when it is needed.
  8. Everybody is using Open Source. What if your organization is not an expert in this field? As written before, you should start fast. There are plenty of cloud providers that support Windows and .NET, and you can get a head start if you use the technology you are familiar with. When you grow, you may refactor your product and add technologies to remove your bottlenecks, such as Erlang for push or LAMP to handle the most common processes that account for 90% of your costs.
  9. Looking for more strategies?
    1. CDN: move your static and streaming content to a CDN provider. This move will cut your server and network utilization and will improve your end user experience.
    2. Smart Clients: make your end user client tolerant of network failures. If you use Gmail and have seen the "loading..." label instead of a 404 page, you probably understand what I'm talking about (otherwise, Google for jQuery).
    3. Elastic Growth: if your system's usage pattern is not uniform, consider turning some of your instances on and off to keep costs down and still meet the spikes.
    4. Replication: don't forget to keep your data safe and survive failures. Make sure you do it using commodity hardware and software.
    5. Prepare for downtime & upgrades: make sure that you can always go on. Downtime will come, since at large scale everything happens, even what should not. Make sure you never really shut down your whole system, even when you upgrade it.
    6. NoSQL and SQL: choose SQL as a start if you are great at it. But don't neglect NoSQL when you get larger.
  10. Risk Management: You are going to make a bold move ahead, and you should prepare yourself, as I mentioned before:
    1. Choose your vendors carefully and know your exit strategy. Your Cloud Operator is a provider as well.
    2. Hedge costs and take care of your bottlenecks.
    3. Stress test your system all the way to guarantee you can get to the next level.
    4. Think one move ahead and keep aligned to your strategy.
    5. Listen to your users' feedback. Your business depends on your clients; take care of them.
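
To make the sharding and exit strategy advice above a bit more concrete, here is a minimal Python sketch. It is not code from the lecture: the names KeyValueStore, InMemoryStore and ShardedStore are hypothetical, the in-memory class only stands in for a real database server, and routing by hash modulo the number of shards is just one simple choice (changing the shard count remaps most keys, so a real system would need a resharding plan). The point is that the application talks only to its own interface, so the store behind it can be split or replaced later.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class KeyValueStore(ABC):
    """Hypothetical interface owned by the application, not by the vendor."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]:
        ...

    @abstractmethod
    def put(self, key: str, value: str) -> None:
        ...


class InMemoryStore(KeyValueStore):
    """Stand-in for a single database server (assumption for this sketch)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class ShardedStore(KeyValueStore):
    """Routes every key to exactly one shard, so the shards share nothing."""

    def __init__(self, shards: List[KeyValueStore]) -> None:
        if not shards:
            raise ValueError("at least one shard is required")
        self._shards = shards

    def shard_index(self, key: str) -> int:
        # A stable hash is needed here: Python's built-in hash() of a str
        # changes between processes, which would break routing on restart.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._shards)

    def get(self, key: str) -> Optional[str]:
        return self._shards[self.shard_index(key)].get(key)

    def put(self, key: str, value: str) -> None:
        self._shards[self.shard_index(key)].put(key, value)


if __name__ == "__main__":
    store: KeyValueStore = ShardedStore([InMemoryStore() for _ in range(4)])
    store.put("user:42", "some profile data")
    print(store.get("user:42"))
```

Because the calling code depends only on KeyValueStore, moving from one server to several shards, or from one provider's data store to another, means writing a new implementation of the same two methods rather than touching the rest of the system.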
Bottom Line
Working according to these rules can help you reach the 1 billion users system you were looking for.

Keep Performing,
Moshe Kaplan