Tuesday, September 19, 2017

SCOM 2016 - Monitor & Recover a Service

Monitoring services is the bread and butter of SCOM, but not a lot of people know how to actually set up service monitoring, and fewer still know how to set up SCOM to automatically restart services when they fail.

Creating the Monitor:

In the SCOM console, go to Authoring > Management Pack Objects and right-click Monitors. Select Create a Monitor > Unit Monitor.

The Create a Unit Monitor wizard will open. Select Windows Services > Basic Service Monitor, choose a management pack to save the monitor in, and click Next.

In General, give the monitor a name. I like to use the friendly name of the service so I can easily find it again. Set Monitor target to Windows Computer and Parent monitor to Availability. If this is a service that is common to a lot of servers, leave Monitor is enabled checked. Otherwise, uncheck it and enable the monitor later with an override for the servers that need it. Click Next.

If you know the service name (the actual name, not the display name shown in Services), you can enter it here. Otherwise, you can browse for it by clicking the ... button. Click Next.
NOTE: Make a note of the service name as you will need it later
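
If you only know the display name, one way to look up the actual service name outside the wizard is the sc getkeyname command, run on the server itself. Here is a minimal Python sketch of that lookup. It assumes Python is installed on the server and that sc prints its usual "Name = ..." line. The function names are my own and are not part of SCOM.

```python
import subprocess


def parse_key_name(sc_output):
    """Pull the service (key) name out of `sc getkeyname` output."""
    for line in sc_output.splitlines():
        line = line.strip()
        if line.startswith("Name ="):
            # Everything after the first "=" is the service name.
            return line.split("=", 1)[1].strip()
    return None  # no "Name = ..." line, so the lookup probably failed


def get_service_name(display_name):
    """Run `sc getkeyname` for a display name (Windows only)."""
    result = subprocess.run(
        ["sc.exe", "getkeyname", display_name],
        capture_output=True,
        text=True,
    )
    return parse_key_name(result.stdout)
```

Calling get_service_name with the display name you see in Services should give you the name to type into the wizard, or None if sc could not find the service.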

In Configure Health, set your state conditions and click Next.

Finally, in Configure Alerts, check the Generate alerts for this monitor box. Update the alert description as needed and click Create.

Setting Auto Recovery:
Now that the monitor is created, go back into Monitors and find it. Right-click the monitor and select Properties. Go to the Diagnostic and Recovery tab. Under Configure Recovery Tasks, select Add > Recovery for critical health state.

Select Run Command and click Next.

In General, call it Start Service. For Recovery target, select Windows Server. Make sure Run recovery automatically and Recalculate monitor state after recovery finishes are both checked, then click Next.

The Full path to file is %windir%\system32\net.exe and the Parameters are start ServiceName, where ServiceName is the service name you noted earlier. In my case it is start MB3Service. Set the timeout to a few minutes and click Create.
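
Before trusting the recovery, it can help to try the same command by hand on a test server. The sketch below builds the two values that go into the wizard and can optionally run them. It is only an illustration: the function names are made up, it assumes Python on a Windows machine, and quoting names that contain spaces is my own precaution rather than something the wizard requires.

```python
import os
import subprocess

# The same path that goes into "Full path to file".
NET_EXE = r"%windir%\system32\net.exe"


def build_recovery_command(service_name):
    """Return (full path, parameters) as you would type them into the wizard."""
    name = service_name.strip()
    if not name:
        raise ValueError("service name is required")
    # Quote the name only if it contains spaces.
    if " " in name:
        name = f'"{name}"'
    return NET_EXE, f"start {name}"


def run_recovery(service_name):
    """Run net.exe start by hand (Windows only); returns the exit code."""
    exe = os.path.expandvars(NET_EXE)  # expands %windir% on Windows
    return subprocess.run([exe, "start", service_name.strip()]).returncode
```

For my service, build_recovery_command("MB3Service") gives the path above and the parameters start MB3Service, which matches what I entered in the wizard.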

Click Apply.

Give it a few minutes to propagate through your environment. Once it does, stop the service on a test machine to confirm that the alert goes out and the service gets restarted.
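
If you would rather not sit and refresh the Services console during the test, you can poll the service state instead. This is a rough sketch that reads the STATE line from sc query output. The function names, the five minute timeout, and the ten second polling interval are all my own assumptions, so adjust them to match the timeout you set on the recovery.

```python
import subprocess
import time


def parse_service_state(sc_output):
    """Return the state word (e.g. RUNNING, STOPPED) from `sc query` output."""
    for line in sc_output.splitlines():
        if line.strip().startswith("STATE"):
            # The line looks like "STATE              : 4  RUNNING".
            parts = line.split(":", 1)[1].split()
            if len(parts) >= 2:
                return parts[1]
    return None  # no STATE line, e.g. the service does not exist


def wait_until_running(service_name, timeout_seconds=300, poll_seconds=10):
    """Poll `sc query` until the service is RUNNING or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["sc.exe", "query", service_name],
            capture_output=True,
            text=True,
        )
        if parse_service_state(result.stdout) == "RUNNING":
            return True
        time.sleep(poll_seconds)
    return False
```

Stop the service, call wait_until_running with the service name, and a True result tells you something started it again within the timeout. You still need to check the SCOM console to confirm the alert was raised and the recovery task was what did it.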


More to come!
