We have the script in place and it is working as expected: alert comes in > PowerShell script changes the resolution state > notification goes out. During our testing, though, we were only working on one alert at a time, flipping the state back and forth between New and High, and everything worked just fine. We expanded testing to the live environment and let it cook for a while. After about a week we noticed that not all of the resolution states were changing from New to High automatically. After further investigation we noticed that the alerts that were not changing had all come in within about a second of each other. For instance, if a server kicked off fifteen alerts within one second, only some of them would change to the new state; or if there was a network issue and we lost connection to twenty servers, we were only notified on a few.
After some more digging we found the following error in Management Group Health: "The process could not be created because the maximum number of asynchronous responses (5) has been reached, and it will be dropped."
As it turns out, SCOM cannot execute more than 5 command notifications asynchronously by default. It is set up this way to protect the RMS box from being overwhelmed in the event of a flood of alerts. This can be a bit counterproductive, though, if you are relying on the command function to execute as part of notifications. If there were a real disaster you might not actually realize it if you only see a few out of potentially hundreds of notifications.
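If you want to check whether your own management server is dropping command notifications, here is a rough sketch. It assumes the same message text is also written to the "Operations Manager" event log on the RMS (we originally spotted it in Management Group Health), and it matches on the message text rather than on a specific event ID. Select-DroppedCommandEvent is a made-up helper name, not a SCOM cmdlet.

```powershell
# Hypothetical helper: passes through only the events whose message
# contains the "maximum number of asynchronous responses" text.
function Select-DroppedCommandEvent {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $InputObject
    )
    process {
        if ($InputObject.Message -like '*maximum number of asynchronous responses*') {
            $InputObject
        }
    }
}

# Example usage on the RMS (assumes the 'Operations Manager' log exists there):
# Get-WinEvent -LogName 'Operations Manager' -MaxEvents 5000 |
#     Select-DroppedCommandEvent |
#     Select-Object TimeCreated, Id, ProviderName
```

A cluster of hits with nearly identical timestamps would line up with the "many alerts within a second" pattern described above.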
Fortunately this setting can be modified. There is a registry key you can put in place to override it. I urge caution, however. By raising the limit you can overload your management server if the script fires a lot or takes a long time to process, especially on a slower machine. I would increase it slowly over time to make sure you don't get undesired results.
On the RMS box open up RegEdit. Navigate to HKEY_LOCAL_MACHINE\Software\Microsoft\Microsoft Operations Manager\3.0\Modules.
- Create a new subkey called Global (if it does not already exist)
- In Global create another new subkey called Command Executer
- In Command Executer create a new DWORD called AsyncProcessLimit
- Set the decimal value to anything between 1 and 100. Again, I would start small, say 20, and move up from there.
- Restart the Health Service to allow the new settings to take effect.
So to make this a bit easier I wrote a PowerShell script to do it for you. Run it from an elevated PowerShell prompt on the RMS. It creates the key, sets the value to 20, and restarts the Health Service.
# Registry path (relative to HKEY_LOCAL_MACHINE) for the Command Executer settings
$keyName = 'SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Modules\Global\Command Executer'
# CreateSubKey creates any missing keys in the path, or opens the existing key for writing
$subKey = [Microsoft.Win32.Registry]::LocalMachine.CreateSubKey($keyName)
$subKey.SetValue('AsyncProcessLimit', 20, [Microsoft.Win32.RegistryValueKind]::DWord)
$subKey.Close()
# Restart the Health Service so the new limit takes effect
Restart-Service HealthService
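Since the advice is to raise the limit gradually, a reusable version may be handy. The following is only a sketch: Set-AsyncProcessLimit is a hypothetical function name of my own, not something that ships with SCOM. It rejects values outside the 1 to 100 range, supports -WhatIf so you can preview the change, and reads the value back after writing it.

```powershell
# Hypothetical helper (not part of SCOM): sets AsyncProcessLimit with a range check.
function Set-AsyncProcessLimit {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param(
        [Parameter(Mandatory = $true)]
        [ValidateRange(1, 100)]
        [int]$Limit
    )

    $path = 'HKLM:\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Modules\Global\Command Executer'

    if ($PSCmdlet.ShouldProcess($path, "Set AsyncProcessLimit to $Limit")) {
        # Create the key (and any missing parent keys) only if it is not there yet,
        # so an existing key and its values are left alone.
        if (-not (Test-Path $path)) {
            New-Item -Path $path -Force | Out-Null
        }

        # -Force overwrites the value if it already exists.
        New-ItemProperty -Path $path -Name 'AsyncProcessLimit' -Value $Limit -PropertyType DWord -Force | Out-Null

        # The Health Service only picks up the new limit after a restart.
        Restart-Service HealthService

        # Read the value back to confirm what was written.
        (Get-ItemProperty -Path $path -Name 'AsyncProcessLimit').AsyncProcessLimit
    }
}
```

For example, `Set-AsyncProcessLimit -Limit 20 -WhatIf` previews the change without touching the registry or the service, and `Set-AsyncProcessLimit -Limit 30` could be the next step up once you are happy with how the server copes.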
More to come!
Contributing Documentation: Clive Eastwood
