New Data Collection Monitoring Errors
cehottle
I installed release 2.2 this morning. Since installing the update, I've received 67 host machine monitoring or SQL Server monitoring error alerts over a six-hour period for one of the servers. When I look at the Manage Monitored Servers page, everything looks fine. These errors can apparently be quite transient, so it would probably be good to have some sort of threshold before an alert is triggered, though I'm not sure exactly how that would work. It's still nice to have the alerts, and I'm wondering if you have any ideas about how to determine why I'm seeing so many from one server. It's in a remote location, but there are two other servers there as well and I haven't seen any alerts from them. Thanks.
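One way such a threshold could work is to raise an alert only after several consecutive failed checks, so that a single transient error is ignored. The sketch below is a minimal illustration of that idea, not how SQL Monitor behaves; the class name, the `count_alerts` helper, and the default threshold of 3 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConsecutiveFailureGate:
    """Fire an alert only after `threshold` consecutive failed checks.

    Hypothetical helper for illustration; not part of SQL Monitor.
    """
    threshold: int = 3
    failures: int = 0
    alerted: bool = False

    def observe(self, check_ok: bool) -> bool:
        """Record one check result; return True when a new alert should fire."""
        if check_ok:
            # A successful check resets the streak and re-arms the gate.
            self.failures = 0
            self.alerted = False
            return False
        self.failures += 1
        if self.failures >= self.threshold and not self.alerted:
            # Fire once per failure streak, not once per failed check.
            self.alerted = True
            return True
        return False


def count_alerts(results, threshold=3):
    """Count how many alerts a sequence of check results would produce."""
    gate = ConsecutiveFailureGate(threshold=threshold)
    return sum(gate.observe(ok) for ok in results)
```

With a threshold of 3, a sequence of isolated single failures produces no alerts, while a sustained outage produces exactly one.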
Comments
It is possible to see the error that causes monitoring to stop by clicking the Show Log link for the relevant server on the Monitored Servers configuration page. However, this only displays the last five or so minutes' worth of logging, so it wouldn't help here.
In most cases the base deployment log files for the time period in question would be the best place to look. These are located at "C:\ProgramData\Red Gate\Logs\SQL Monitor 2" or "C:\Documents and Settings\All Users\Application Data\Red Gate\Logs\SQL Monitor 2" depending on your operating system. If you send them to chris.spencer@red-gate.com I would gladly look through them and see if anything unusual is getting logged.
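Before sending the logs, a quick scan for suspicious lines may help narrow down the time period. The sketch below is only a rough starting point: the `*.log` file pattern and the keywords are assumptions, since the log file names and format are not described in this thread, and `find_error_lines` is a hypothetical helper.

```python
from pathlib import Path

# Assumed keywords; the actual wording used in the logs is not known here.
KEYWORDS = ("error", "exception")


def find_error_lines(log_dir, keywords=KEYWORDS, pattern="*.log"):
    """Return (file name, line number, text) for each line containing a keyword.

    `pattern` is an assumption; adjust it to match the real log file names.
    """
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                lowered = line.lower()
                if any(word in lowered for word in keywords):
                    hits.append((path.name, number, line.rstrip()))
    return hits
```

It could be called with one of the log directories mentioned above, for example `find_error_lines(r"C:\ProgramData\Red Gate\Logs\SQL Monitor 2")`, and the results printed or saved for review.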
Regards
Chris
Test Engineer
Red Gate
Red Gate wants to blame WMI but there are no WMI issues. We use OpManage to monitor our host servers through WMI and there are no issues at all with it.
Last week I found three jobs that failed on the servers, and there was no indication of failed jobs in SQL Monitor. I get a string of "data collection errors" about twice a day at no specific times. It's random, and I get it across our various sites on multiple servers, but not all of them.
Anyone else having this issue as well?
We only use WMI to collect the following information:
• Cluster configuration and status
• Total amount of physical memory
• OS version and service pack
• Windows process user
• Host name and DNS name of the machine
We mostly use perfmon, and the recent issues appear to be related to this. It is possible to see what the monitoring error is by going to the Monitored Servers page and clicking the Show Log link for the relevant server. However, this will only show the last 5 minutes of logging.
Regards
Chris
Test Engineer
Red Gate
So what would cause me to not get a warning about a failed job?
I'm not 100% sure. The job failed alert is relatively uncomplicated and triggers on seeing a job failure in the job history. We do collect this data and any failed jobs should be present in the SQL Monitor data repository.
This SQL should show any failures:
It would be worth checking whether a row exists at the specific point in time that your job failed. I could probably cobble together some more complicated SQL that displays the job name, etc., if that helps.
Regards
Chris
Test Engineer
Red Gate
Anyway, I'll check on the entry. I do have a lot of entries in the log about failed triggers on all the servers. That's one thing I have been wondering about and asking Chris about.
I ran the query and I have entries in the tables for the failures. Nothing is showing in the website GUI, though. Very strange.
Any ideas?
In the meantime I've created some SQL to check the alerts tables:
The SeverityDate column should be the time that the alert is raised as there is usually only one possible severity for Job Failed alerts. It would be interesting to know if there are any records for the minutes after a job failure was reported on the server.
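Once the job failure times and the alert times have been pulled out of the two queries, the comparison could be automated along these lines. This is a sketch under stated assumptions: `failures_without_alert` is a hypothetical helper, the timestamps are supplied by hand, and the five-minute window is an arbitrary guess at what "the minutes after" means.

```python
from datetime import datetime, timedelta


def failures_without_alert(failure_times, alert_times, window=timedelta(minutes=5)):
    """Return failure times that have no alert in [failure, failure + window].

    `window` is an assumed tolerance, not a documented SQL Monitor setting.
    """
    alerts = sorted(alert_times)
    missing = []
    for failed_at in sorted(failure_times):
        matched = any(failed_at <= raised <= failed_at + window for raised in alerts)
        if not matched:
            missing.append(failed_at)
    return missing
```

Any timestamps it returns would be job failures that were recorded but never produced a corresponding alert record within the window.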
Regards
Chris
Test Engineer
Red Gate