We have the script in place and it is working as expected: alert comes in > PowerShell script changes the resolution state > notification goes out. During our testing we only worked on one alert at a time, flipping the state back and forth between New and High, and everything worked just fine. We then expanded testing to the live environment and let it cook for a while. After about a week we noticed that not all of the resolution states were changing from New to High automatically. Looking closer, we saw that the alerts that were not changing had all come in within about a second of each other. For instance, if a server kicked off fifteen alerts within one second, only some of them would change to the new state; or if a network issue cost us the connection to twenty servers, we were only notified about a few.
Digging further, we found the following error in Management Group Health: "The process could not be created because the maximum number of asynchronous responses (5) has been reached, and it will be dropped".
As it turns out, SCOM cannot execute more than 5 command notifications asynchronously by default. It is set up this way to protect the RMS box from being overwhelmed in the event of a flood of alerts. That can be counterproductive if you are relying on the command function to execute as part of notifications. In a real disaster you might not realize how bad things are if you only see one out of potentially hundreds of notifications.
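To get a feel for how often notifications are being dropped, something like the following rough sketch could be run on the management server. It assumes the warning is also written to the "Operations Manager" event log with the same wording as the alert above; the 5000-event window is an arbitrary choice, not a recommendation.

```powershell
# Assumption: the dropped-command warning appears in the 'Operations Manager'
# event log with the same text as the Management Group Health alert.
$pattern = '*maximum number of asynchronous responses*'

# Look at the most recent events, keep the matching ones, and count them per day.
Get-WinEvent -LogName 'Operations Manager' -MaxEvents 5000 |
    Where-Object { $_.Message -like $pattern } |
    Group-Object { $_.TimeCreated.ToString('yyyy-MM-dd') } |
    Sort-Object Name |
    Select-Object Name, Count
```

If the daily counts keep showing up after the limit has been raised, that would suggest the limit is still too low for the alert volume.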
Fortunately this setting can be modified. There is a registry key you can put in place to override it. I urge caution, however. By raising the limit you can overload your management server if the script fires a lot or takes a long time to process, especially on a slower machine. I would increase it slowly over time to make sure you don't get undesired results.
On the RMS box open up RegEdit. Navigate to HKEY_LOCAL_MACHINE\Software\Microsoft\Microsoft Operations Manager\3.0\Modules.
- Create a new subkey called Global (if it does not already exist)
- In Global create another new subkey called Command Executer
- In Command Executer create a new DWORD called AsyncProcessLimit
- You can set the decimal value between 1 and 100. Again, I would start small, say 20, and move up from there.
- Restart the Health Service to allow the new settings to take effect.
So to make this a bit easier I wrote a PowerShell script to create this key for you. This will create the key and set it to 20.
# Open HKEY_LOCAL_MACHINE on the local machine ('.')
$regKey = [Microsoft.Win32.RegistryKey]::OpenRemoteBaseKey(([Microsoft.Win32.RegistryHive]"LocalMachine"),'.')
[string]$keyName = "SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Modules\Global\Command Executer"
# Create the subkey if it is missing, then open it for writing
[void]$regKey.CreateSubKey($keyName)
$subKey = $regKey.OpenSubKey($keyName,$true)
$subKey.SetValue("AsyncProcessLimit", 20,[Microsoft.Win32.RegistryValueKind]::DWord)
# Restart the Health Service so the new limit takes effect
Stop-Service HealthService
Start-Service HealthService
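Since the advice is to start small and move up, it may help to make the limit a parameter. The following is only a sketch of one way to do that using the registry provider instead of the .NET classes; the function name `Set-AsyncProcessLimit` is made up for this example, and the 1 to 100 range check simply mirrors the range given in the steps above.

```powershell
# Hypothetical helper (not part of SCOM): sets AsyncProcessLimit on the local machine.
function Set-AsyncProcessLimit {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param(
        # The post gives 1 to 100 as the accepted range.
        [Parameter(Mandatory = $true)]
        [ValidateRange(1, 100)]
        [int]$Limit
    )

    $path = 'HKLM:\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Modules\Global\Command Executer'

    if ($PSCmdlet.ShouldProcess($path, "Set AsyncProcessLimit to $Limit")) {
        # Create the key (and any missing parents) only if it is absent,
        # so existing values under it are left alone.
        if (-not (Test-Path -Path $path)) {
            New-Item -Path $path -Force | Out-Null
        }
        New-ItemProperty -Path $path -Name AsyncProcessLimit -Value $Limit -PropertyType DWord -Force | Out-Null

        # The new limit is only picked up after the Health Service restarts.
        Restart-Service -Name HealthService

        # Read the value back so the caller can confirm it.
        (Get-ItemProperty -Path $path -Name AsyncProcessLimit).AsyncProcessLimit
    }
}
```

Running `Set-AsyncProcessLimit -Limit 20` from an elevated prompt would apply the same value as the script above, and `Set-AsyncProcessLimit -Limit 30 -WhatIf` previews a later bump without changing anything.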
More to come!
Contributing Documentation: Clive Eastwood
Is this setting recommended for a larger SCOM environment, like 3 management servers and 2 gateways with 1700 agents, with multiple technologies being monitored like Exchange 2007 and 2013, SharePoint 2010, Lync 2010, SQL 2005, 2008 and 2012, Windows and IIS?
Yes, you would be fine in an environment of that size, as long as you start out small and move up incrementally. You will keep getting the error if you have not increased it enough. And remember, you need to do it on all of your management servers, as the notifications can come from any of them.
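For the point about applying the change on every management server, a sketch along these lines might save some clicking. It assumes PowerShell remoting is enabled on the servers and that the account has admin rights there; the server names `MS01`, `MS02` and `MS03` are placeholders, not real hosts.

```powershell
# Hypothetical server names; replace with your own management servers.
$managementServers = 'MS01', 'MS02', 'MS03'
$limit = 20

# Assumption: PowerShell remoting (WinRM) is enabled on each server.
Invoke-Command -ComputerName $managementServers -ArgumentList $limit -ScriptBlock {
    param($Limit)

    $path = 'HKLM:\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Modules\Global\Command Executer'

    # Create the key only if it does not exist yet.
    if (-not (Test-Path -Path $path)) {
        New-Item -Path $path -Force | Out-Null
    }
    New-ItemProperty -Path $path -Name AsyncProcessLimit -Value $Limit -PropertyType DWord -Force | Out-Null

    # Restart the Health Service so the new limit takes effect.
    Restart-Service -Name HealthService

    # Report which server was updated.
    '{0}: AsyncProcessLimit set to {1}' -f $env:COMPUTERNAME, $Limit
}
```

Note that this restarts the Health Service on all listed servers at roughly the same time, so doing them one at a time may be the gentler option in production.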
Hi Jim,
Thank you for the suggestions. I started with 30 and it's been 4 months and looks stable without any issues...
That's great news. Glad it worked out for you!
What is the default value given by Microsoft?
5 is the default value.