The emergence of cloud computing and Everything as a Service over the last few years has raised many issues for enterprises that have started to outsource part of their services and infrastructure to the cloud.
Before the migration, IT managers knew the availability of every piece of their infrastructure; now things are a little more complex.
But Are Things Really Different?
If you had a COTS ERP or CRM system that you bought from SAP, Oracle or Microsoft, did you really know what was happening in the system internals? Or did you focus on measuring the end-user experience, along with system metrics like CPU and disk utilization?
What Should We Measure?
In order to decide what we should measure, we should first ask several questions:
- What will cause me trouble if I do not measure it?
- Where is the money coming from?
- What are you paying for?
- What is the service interface?
All these questions have specific answers in a common agreement between the parties: the SLA (Service Level Agreement). If something is important enough, the SLA should define a metric for it, and if there is a metric, you can probably monitor it using your monitoring system. The importance of SLA monitoring in the cloud world is reflected in CA's acquisition of Oblicore in January 2010.
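To make the link between SLA terms and monitoring concrete, here is a minimal sketch that checks measured values against SLA thresholds. The metric names, threshold values and the `SlaTarget`/`check_compliance` helpers are hypothetical, chosen only for illustration; in practice the values would come from your monitoring system and the limits from the actual SLA.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SlaTarget:
    """One measurable SLA term (hypothetical example, not a real SLA)."""
    metric: str             # metric name as stored in the monitoring system
    threshold: float        # limit agreed in the SLA
    higher_is_better: bool  # True for e.g. availability, False for e.g. latency


def check_compliance(targets: List[SlaTarget], measured: Dict[str, float]) -> List[str]:
    """Return the names of SLA metrics that are missing or out of bounds."""
    violations = []
    for t in targets:
        value = measured.get(t.metric)
        if value is None:
            # An SLA term with no measurement means compliance cannot be shown.
            violations.append(t.metric)
            continue
        ok = value >= t.threshold if t.higher_is_better else value <= t.threshold
        if not ok:
            violations.append(t.metric)
    return violations


if __name__ == "__main__":
    sla = [
        SlaTarget("availability_pct", 99.9, higher_is_better=True),
        SlaTarget("sms_submit_p95_ms", 500.0, higher_is_better=False),
    ]
    print(check_compliance(sla, {"availability_pct": 99.95, "sms_submit_p95_ms": 620.0}))
```

Treating a missing measurement as a violation is a deliberate choice here: if the SLA promises something you cannot measure, you cannot hold the provider to it.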
What Should We Not Measure?
I usually try to avoid monitoring specific devices at the cloud provider, since it will not give me any useful information and will break the interface of the service (sounds like object-oriented design).
However, if you pay for virtual things like redundancy or extra capacity, the service provider should provide measurements for them.
Can You Give Us a Small Example?
I usually monitor the user experience. For example, if I use an SMS gateway to send SMS messages from my system, I'll monitor:
- The submission time of an SMS request
- The time until the SMS arrives at a handset, simulated using a cellular modem.
I perform this test periodically. For example, if I send dozens of SMS messages every second, I run the measurement every few seconds.
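A minimal sketch of such a probe is below. It assumes two hypothetical wrappers that you would write yourself: `submit(text)`, which hands one message to the SMS gateway, and `wait_for_delivery(text, timeout)`, which polls the modem-backed test handset until a message with that body arrives. No real gateway or modem API is implied; the probe only times the two stages and repeats at a fixed interval, separately from the live traffic.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ProbeResult:
    submit_seconds: float              # time for the gateway to accept the request
    delivery_seconds: Optional[float]  # end-to-end time to the handset, None on timeout


def run_sms_probe(
    submit: Callable[[str], None],                   # hypothetical gateway wrapper
    wait_for_delivery: Callable[[str, float], bool],  # hypothetical modem/handset wrapper
    timeout: float = 30.0,
) -> ProbeResult:
    """Send one test SMS and time the submission and the delivery."""
    text = f"probe-{time.time_ns()}"  # unique body so the handset can match this probe
    start = time.monotonic()
    submit(text)
    submitted = time.monotonic()
    delivered = wait_for_delivery(text, timeout)
    end = time.monotonic()
    return ProbeResult(
        submit_seconds=submitted - start,
        delivery_seconds=(end - start) if delivered else None,
    )


def probe_loop(probe: Callable[[], ProbeResult], interval: float, rounds: int) -> List[ProbeResult]:
    """Run the probe `rounds` times, sleeping `interval` seconds between runs."""
    results = []
    for i in range(rounds):
        if i:
            time.sleep(interval)
        results.append(probe())
    return results


if __name__ == "__main__":
    # Fake wrappers standing in for a real gateway and test handset.
    def fake_submit(text: str) -> None:
        time.sleep(0.05)

    def fake_wait(text: str, timeout: float) -> bool:
        time.sleep(0.2)
        return True

    for r in probe_loop(lambda: run_sms_probe(fake_submit, fake_wait), interval=1.0, rounds=3):
        print(f"submit {r.submit_seconds:.2f}s, delivery {r.delivery_seconds:.2f}s")
```

Recording a timeout as `None` rather than a large number keeps lost messages visible as a separate failure, instead of letting them skew the latency figures.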
The Bottom Line
Choose your metrics carefully according to the defined SLA and monitor them using your monitoring system. Do not be tempted to over-measure, so that you avoid breaking the service interface.
Keep Performing,