Jun 15, 2014

Auto Scaling your Workers based on Queue Length

Having an image or video processing engine?
Having a web crawling solution?
Doing OCR on media?
If so, this post is probably for you.

The Producer-Consumer Auto Scaling Problem
Back at school, you probably learnt how to manage the queue of waiting tasks in a producer-consumer problem. You may learnt how to avoid expired tasks and double processing of same tasks.
However, back then we all assumed the number of workers and producers is fixed.

Well, it is not true any more...
If you system includes a backend processor that its load may vary based on clients demand, you may need to auto scale it. Back then, you needed to purchase new servers. These days it just about launching new instances in your favorite cloud.

Adjusting AWS Auto Scaling to Support this Pattern
AWS Auto Scaling solution is known as the best of its kind in the industry. It supports:
  1. Dynamic scaling based on instances load (e.g CPU).
  2. Automatically launching and terminating instances.
  3. Supporting both on demand (expensive) instances and Spot (marginal cost) instances.
However, it does not support complex decisions such as queue length/instances length ratio out of the box.

Fortunately, AWS provides a complete set of API that let us define a solution that its general flow is described here and here:
  1. Create a base AMI to a worker/consumer machine
    1. If your code tends to change you may define a method that will update to the latest code when machine is being launched
    2. A better solution may be using a DevOps platform such as Puppet or Chef.
  2. Define auto scaling initial configuration with an initial size of minimum and maximum number of instances (N = 1).
  3. Define a spot instance price: make it reasonable (not too high to avoid extra cost, and not too low to actually launch instances when needed).
  4. Create a cron that will run every 1 min and will check the queue length. This cron will terminate or launch instances by adjusting the auto scaling configuration minimum and maximum number of instances (N).
The Algorithm
You need to take several decisions in order to decide how many instances you want on air:
  1. What is the ratio of tasks in queue (Q) to running instances (for example R = 2)
  2. How quick do you want to reach this ratio. The quicker you do, the higher it will cost (instances that are billed hourly, may be terminated and relaunched again only few minutes after that).
Two examples for a possible algorithms:
  1. Keeping the ratio of queue length and running instances constant: for example: N = max[1, Q/R]
  2. Soften the ratio by using previous calculated number N': N = (N' + max[1, Q/R])/2
Bottom Line
Auto Scaling is an amazing tool, and by adjusting it we can easily solve complex issues.

Keep Performing,


Intense Debate Comments

Ratings and Recommendations