Several questions

HugoKornelis · January 30, 2019 12:57PM

Hi all!

I have a few questions on SQL Monitor. They are unrelated; I hope combining them in a single post is not too much of a burden. Let me know if you would prefer that I take this post down and repost it as four separate topics.

1. One of the instances I monitor is a SQL Server 2005 instance running on Windows Server 2003. I know this is not supported, and that as a result I do not get all the information. But until recently, I thought that the information I do get was reliable.

On the Analysis tab, I see extremely high numbers for disk read time and disk write time; the scale is in seconds instead of milliseconds. I asked the system administrator to check, and they see nothing strange. So I opened PerfMon on that server, had it watch "Logical Disk" / "Avg. Disk sec/Read", ran some queries to force IO, and compared the PerfMon and SQL Monitor displays of disk activity. It turns out that the sysadmin is right: the disk is responding very fast.

Can it be that SQL Monitor is off by a factor of 1000 when reading this counter from Windows Server 2003?

2. Another instance I monitor is the instance where SQL Monitor itself is running. On this instance, the connection to the SQL Server instance works fine, but the connection to the host fails. The message in the log says that the Outcome is "Internal SQL Monitor error". Both the Exception and the ExceptionMessage read "General / Processor : UnknownError.ExceptionType".

3. Because of this error (which I originally had not noticed - I don't remember it being there when I added the instance), a lot of server-level counters are not checked at all. As a result I was totally unaware that one of the drives on this instance was filling up. Wouldn't it make sense to raise alerts for counters (at least SOME counters) if no values are seen at all? No information on used disk space does not mean that we have enough room!!

4. Yet another (related) issue is that clearing alerts should have more options. Currently, when I clear an alert, I will only see it again if the condition first goes away and then reappears. For instance, if I have a Database File Usage alert because file usage is above 85% and I clear it, it will not reappear as long as usage remains above 85%, but it will reappear if usage drops below 85% and then goes up again. This is useful in a lot of cases, but not always. Examples:

* I don't want to see this event ever again. (I can't imagine a good use case for Database File Usage, but it is not uncommon for e.g. an Agent job that is known to fail intermittently.) I know I can disable the alert from the configuration menu, but that is not very easy to access, and it offers no quick way to see which settings are active. (Unless I am missing something?)

* Remove the alert from the dashboard for now, but make it reappear if it is not fixed after XX amount of time. (E.g. I get a Database File Usage alert, so I write some code to purge old data and schedule it to run over the weekend. I don't want to see the red alert on my dashboard for the rest of the week, but I would like to get an alert if database file usage is still above 85% after the weekend.)

Can't you make it so that when I click Clear on an alert, a dropdown appears that asks whether I want to clear it (a) until the event ends and then starts again; (b) for XX period of time; or (c) permanently?
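The three proposed Clear options could be modelled roughly as below. This is a hypothetical sketch only: `ClearMode`, `ClearedAlert` and `should_show` are illustrative names, not part of SQL Monitor, and the sketch assumes the alert's underlying condition (e.g. file usage above 85%) is re-evaluated on every check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class ClearMode(Enum):
    """Hypothetical options for clearing an alert (not a SQL Monitor API)."""
    UNTIL_EVENT_ENDS = "a"  # current behaviour: hide until the condition ends and recurs
    FOR_DURATION = "b"      # hide for a fixed period, then show again if still active
    ALWAYS = "c"            # never show this alert again


@dataclass
class ClearedAlert:
    mode: ClearMode
    cleared_at: datetime
    duration: Optional[timedelta] = None  # only used with FOR_DURATION
    condition_ended: bool = False         # set once the condition has gone away

    def __post_init__(self) -> None:
        if self.mode is ClearMode.FOR_DURATION and self.duration is None:
            raise ValueError("FOR_DURATION requires a duration")


def should_show(alert: ClearedAlert, now: datetime, condition_active: bool) -> bool:
    """Decide whether a cleared alert should reappear on the dashboard.

    Note: for UNTIL_EVENT_ENDS this records on the alert that the condition ended.
    """
    if alert.mode is ClearMode.ALWAYS:
        return False
    if alert.mode is ClearMode.FOR_DURATION:
        # Reappear only if the condition is still active once the period has elapsed.
        return condition_active and now >= alert.cleared_at + alert.duration
    # UNTIL_EVENT_ENDS: remember that the condition went away, then show it when it recurs.
    if not condition_active:
        alert.condition_ended = True
        return False
    return alert.condition_ended


if __name__ == "__main__":
    # Cleared on a Wednesday morning for five days, i.e. until Monday morning.
    cleared = ClearedAlert(ClearMode.FOR_DURATION, datetime(2019, 1, 30, 9, 0), timedelta(days=5))
    print(should_show(cleared, datetime(2019, 2, 1, 9, 0), condition_active=True))  # False: still hidden
    print(should_show(cleared, datetime(2019, 2, 4, 9, 0), condition_active=True))  # True: still above 85%
```

Option (b) is the "remind me after the weekend" case: the alert stays off the dashboard for the chosen period and only comes back if the purge did not bring usage below the threshold.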

Sorry for the rant. I think I'm done now.

Alex B · January 31, 2019 8:04PM

Hi Hugo,

Not too much of a burden, but it would be nice to split them out in the future if possible!

For your questions:

1. I thought that it just plain stopped working for 2003 servers from about 7.1.20 or so (it had kept working up to around that point because we had not needed to make a breaking change since dropping support in v5). As for being off by a factor of 1000: that counter is always reported in seconds (so .004 is 4 ms), so I would not have expected it to be off. Comparing the SQL Monitor UI with PerfMon, the PerfMon y-axis is in ms (the scale in the table row below the graph says 1000), and in SQL Monitor 9.0.2 the y-axis is also in ms.

2. That error looks like a WMI error connecting to the host machine. It may be that the counters are corrupt and need to be rebuilt, or that the MOF files need to be recompiled. I think running the following from an elevated command prompt in the C:\Windows\System32\Wbem directory:
```
mofcomp.exe CimWin32.mof
```
might help fix it, but either way something environmental has gone wrong with WMI.

3. I would have expected a Monitoring error (host machine data collection) alert, or a similar alert indicating a problem with the connection to the entity, to have been raised. Did neither of these appear?

4. As for the last point: this is good feedback, and we have had similar requests before from users who want to be alerted again if an issue persists after a period of time. I'll pass it on to the team again!
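To make the unit point in answer 1 concrete: "Avg. Disk sec/Read" is reported in seconds, so a value must be multiplied by 1000 to get milliseconds. The minimal sketch below shows that conversion and a rough check for the factor-of-1000 discrepancy described in question 1, by comparing a tool's reported latency with the raw PerfMon value. Both function names and the 10% tolerance are illustrative assumptions, not SQL Monitor code.

```python
def disk_latency_ms(avg_disk_sec_per_read: float) -> float:
    """Convert a PerfMon 'Avg. Disk sec/Read' value (seconds) to milliseconds."""
    return avg_disk_sec_per_read * 1000.0


def looks_off_by_1000(reported_ms: float, perfmon_seconds: float, tolerance: float = 0.1) -> bool:
    """Hypothetical check: is a reported latency (ms) roughly 1000x what PerfMon implies?

    `tolerance` is the allowed relative deviation from an exact 1000x ratio.
    """
    expected_ms = disk_latency_ms(perfmon_seconds)
    if expected_ms == 0:
        return False  # no activity recorded, so no meaningful ratio
    ratio = reported_ms / expected_ms
    return abs(ratio - 1000.0) <= 1000.0 * tolerance


if __name__ == "__main__":
    print(disk_latency_ms(0.004))            # PerfMon shows .004 s, i.e. 4 ms
    print(looks_off_by_1000(4000.0, 0.004))  # a tool showing "4 s" for a 4 ms read
    print(looks_off_by_1000(4.0, 0.004))     # a tool correctly showing 4 ms
```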

Kind regards
Alex
