"Our production SQL Server VM on Azure is down, and we cannot provide service to clients."
A short analysis showed that we were in deep trouble:
- The server status was: Stopped (Failed to start).
- Repeated attempts to start the server showed Starting... and then went back to the same failure message.
- Changing the instance configuration, as proposed in various forums and blogs, resulted in the same failure message.
- Microsoft claimed that everything was okay with its data centers.
- Checking the Azure storage container showed that the VM's VHD disk had not been updated since the server failure.
Since falling back to a cold backup would have meant losing too much data, we had to recover the failed server somehow.
The odds were against us. Luckily, we could do it by restoring the database files from the VHD (VM disk) file that was still available in Azure storage.
How to Recover from a Stopped (Failed to start) VM?
- Start a new instance in the same availability set (that way you can keep using the same DNS name, instead of also having to deploy a new version of the app servers).
- Attach a new, large disk to the instance (the failed server's disk was 127GB, so make sure the new disk is larger).
- Start the new machine.
- Format the disk as a new drive.
- Go to your Azure account and download the VHD file from Azure storage. Make sure you download it to the new disk you just attached. In our case, the download took several hours even though the blob storage was in the same data center as the VM.
- Mount the VHD file you downloaded as a new disk.
- Extract the database data and log files from the mounted disk and attach them to the new SQL Server instance.
- Keep your backup files updated and in a safe place.
- Keep your database data and log files off the system disk, so you can easily attach them to other servers.
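
As a rough illustration of the last recovery step and of the system-disk advice, here is a minimal Python sketch. It is not what we ran during the incident. It only builds the text of a T-SQL CREATE DATABASE ... FOR ATTACH statement for the extracted files, and flags files that sit on the system drive. The database name SalesDb, the F: paths, and both helper function names are hypothetical, and the system drive is assumed to be C:. On a live server, the file paths could be read from the physical_name column of sys.master_files.

```python
from pathlib import PureWindowsPath
from typing import List, Sequence


def build_attach_statement(db_name: str, file_paths: Sequence[str]) -> str:
    """Build a T-SQL CREATE DATABASE ... FOR ATTACH statement as text."""
    if not file_paths:
        raise ValueError("at least one database file path is required")
    # Escape ']' in the bracketed identifier and "'" in the string literals.
    safe_name = db_name.replace("]", "]]")
    file_specs = ",\n".join(
        "    (FILENAME = N'{}')".format(path.replace("'", "''"))
        for path in file_paths
    )
    return "CREATE DATABASE [{}]\nON\n{}\nFOR ATTACH;".format(safe_name, file_specs)


def files_on_system_drive(
    file_paths: Sequence[str], system_drive: str = "C:"
) -> List[str]:
    """Return the paths that live on the system drive (assumed to be C:)."""
    wanted = system_drive.upper()
    return [p for p in file_paths if PureWindowsPath(p).drive.upper() == wanted]


if __name__ == "__main__":
    # Hypothetical locations of the files extracted from the mounted VHD.
    recovered = [r"F:\Data\SalesDb.mdf", r"F:\Logs\SalesDb_log.ldf"]
    print(build_attach_statement("SalesDb", recovered))
    # Any path listed here would be lost together with a failed system disk.
    print(files_on_system_drive(recovered))
```

The generated statement still has to be reviewed and run against the new SQL Server instance by hand; the sketch deliberately does not connect to any server.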
When the going gets tough, the tough get going.