911 Azure Serial Console, what’s your emergency?
I recently had a customer who could not connect to his Azure Linux VM anymore. The VM all of a sudden stopped responding to SSH and rebooting the VM also didn’t resolve the problem.
How was this possible? And more importantly, how do we solve it?
I first tried restarting the VM as a quick bandage.
A restart is sufficient in most cases, but this time it wasn’t enough.
Redeploying the VM is another quick way of solving connectivity issues, but that didn’t help either.
Warning! Redeployment means downtime because the VM will be moved to a different host in Azure!
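If you prefer the command line over the portal, both first-aid steps are also available in the Azure CLI as "az vm restart" and "az vm redeploy". Below is a minimal sketch in Python that only builds and prints the commands, so nothing is executed by accident. The resource group "my-rg" and VM name "my-linux-vm" are placeholders, not the names from this case.

```python
import shlex
from typing import List


def az_vm_command(action: str, resource_group: str, vm_name: str) -> List[str]:
    """Build the Azure CLI command for a first-aid action on a VM."""
    # Only the two actions discussed above are allowed here.
    if action not in ("restart", "redeploy"):
        raise ValueError(f"unsupported action: {action}")
    return ["az", "vm", action, "--resource-group", resource_group, "--name", vm_name]


# Placeholder names, replace them with your own resource group and VM.
restart_cmd = az_vm_command("restart", "my-rg", "my-linux-vm")
redeploy_cmd = az_vm_command("redeploy", "my-rg", "my-linux-vm")

# Print the commands so they can be reviewed before running them.
print(shlex.join(restart_cmd))
print(shlex.join(redeploy_cmd))
```

To actually run one of them you could pass the list to subprocess.run(restart_cmd, check=True), assuming the Azure CLI is installed and you are logged in.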
Okay, so the normal emergency procedures were followed. What do we do next?
We try to connect to the VM via the Serial Console, the cabled cableless solution…
To do this we first have to enable Boot diagnostics, because the Serial Console depends on it and it makes the console output available right from the start of the boot sequence.
Next up was to restart the Azure VM so the boot diagnostics would start the serial console module.
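The same sequence (enable Boot diagnostics, restart, connect) can be sketched with the Azure CLI as well. This is a rough outline under a few assumptions: "az serial-console connect" comes from the serial-console CLI extension, which has to be installed separately, and the resource group, VM name and storage account below are placeholders. The storage account is optional in the sketch, but a VM with an unmanaged disk will most likely need one.

```python
import shlex
from typing import List, Optional


def serial_console_steps(
    resource_group: str, vm_name: str, storage: Optional[str] = None
) -> List[List[str]]:
    """Return the CLI commands for the three steps, in order."""
    target = ["--resource-group", resource_group, "--name", vm_name]

    # Step 1: enable Boot diagnostics, optionally with a custom storage account.
    enable = ["az", "vm", "boot-diagnostics", "enable", *target]
    if storage is not None:
        enable += ["--storage", storage]

    return [
        enable,
        # Step 2: restart so the console captures output from boot onwards.
        ["az", "vm", "restart", *target],
        # Step 3: open the serial console (needs the serial-console extension).
        ["az", "serial-console", "connect", *target],
    ]


# Placeholder names for illustration only.
steps = serial_console_steps("my-rg", "my-linux-vm", storage="mydiagstorage")
for step in steps:
    print(shlex.join(step))
```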
With the serial console activated you are able to see what is happening on the backend of your VM.
This is ideal for troubleshooting. In my case I got an error message:
Error in serial console : Handler Too many files under: /var/lib/waagent/events, removing oldest
waagent is not running.
A quick Google scavenger hunt led to these articles:
The WAAgent (the Azure Linux Agent) is necessary for Azure to talk to the VM OS and to process requests such as password resets.
I also noticed that there was a disk I/O error in the console output, so maybe disk space was low, which prevented the WAAgent from starting…
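Once you have a shell on the VM (for example through the Serial Console), a hunch like this is easy to check. The following is a small sketch, not an official diagnostic: it counts the files in the events directory named in the error message and reports how full the root filesystem is. The helper names are my own, and it assumes Python 3 is available on the VM.

```python
import os
import shutil


def count_files(directory: str) -> int:
    """Count the regular files directly inside a directory."""
    with os.scandir(directory) as entries:
        return sum(1 for entry in entries if entry.is_file())


def disk_used_percent(path: str) -> float:
    """Return how full the filesystem containing `path` is, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # Directory taken from the error message in the serial console.
    events_dir = "/var/lib/waagent/events"
    if os.path.isdir(events_dir):
        print(f"{events_dir}: {count_files(events_dir)} files")
    else:
        print(f"{events_dir} not found")
    print(f"root filesystem is {disk_used_percent('/'):.1f}% full")
```

A very large file count or a nearly full disk would support the low disk space theory, although it does not prove that this is what stopped the agent.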
Because the OS disk was not a managed disk, I only had a VHD file in blob storage, so there was not much to do but resize it and hope that was the solution…
And yes, it was! After rebooting the VM, the WAAgent started working again and I was able to connect via SSH.
Password reset from the Azure Portal also started working again, which was necessary because the password for the admin user was unknown…
After overcoming all these issues the customer was able to connect to the VM via SSH and could happily continue with his work.
P.S. BTW, I tried to contact this guy, but he couldn’t fix it…