Issues with SQL Monitor v3.2.0234 Clock Skew alert
JoeGT
I have been going back and forth with the hosting provider for two of the SQL environments I support regarding time synchronisation and clock skew events in SQL Monitor.
Potential issues on the Windows side have now been corrected and, based on the output below, there is next to no skew between our base monitor and the SQL instances that it monitors (i.e. well under a second). The output below is representative of the SQL instances (i.e. all of them report the same result).
Syncing of Domain Controllers
C:\>w32tm /monitor
<domain controller>.xxxxxx.com[xxx.xxx.xxx.xxx:123]:
ICMP: 0ms delay
NTP: -0.0109539s offset from <mastertimeserver.com>
RefID: <mastertimeserver.com> [xxx.xxx.xxx.xxx]
Stratum: 4
<domain controller>.xxxxxx.com[xxx.xxx.xxx.xxx:123]:
ICMP: error IP_REQ_TIMED_OUT - no response in 1000ms
NTP: -0.0159051s offset from <mastertimeserver.com>
RefID: <mastertimeserver.com> [xxx.xxx.xxx.xxx]
Stratum: 4
<mastertimeserver.com> *** PDC ***[10.10.6.11:123]:
ICMP: error IP_REQ_TIMED_OUT - no response in 1000ms
NTP: +0.0000000s offset from <mastertimeserver.com>
RefID: <name of external time server> [xxx.xxx.xxx.xxx]
Stratum: 3
<domain controller>.xxxxxx.com[xxx.xxx.xxx.xxx:123]:
ICMP: 0ms delay
NTP: -0.0272185s offset from <mastertimeserver.com>
RefID: <mastertimeserver.com> [xxx.xxx.xxx.xxx]
Stratum: 4
Syncing of Base Monitor vs SQL Instances
C:\ApplicationManagement>psexec \\<base monitor server> -h w32tm /stripchart /dataonly /samples:2 /period:1 /computer:<sql instance>
Tracking <sql instance> [xxx.xxx.xxx.xxx:123].
The current time is 18/04/2013 7:40:20 AM.
07:40:20, +00.0583422s
07:40:21, +00.0583227s
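For anyone wanting to check many such samples rather than reading them by eye, here is a minimal Python sketch that parses the `/dataonly` sample lines shown above and reports the largest absolute offset. The names `parse_offsets`, `max_abs_offset` and `SAMPLE_OUTPUT` are hypothetical, and the regular expression assumes the sample lines look exactly like the two above.

```python
import re

# Assumed line format, taken from the output above: "07:40:20, +00.0583422s"
SAMPLE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}),\s*([+-]\d+\.\d+)s$")

# Sample lines copied from the stripchart output in this post.
SAMPLE_OUTPUT = """\
The current time is 18/04/2013 7:40:20 AM.
07:40:20, +00.0583422s
07:40:21, +00.0583227s
"""


def parse_offsets(output):
    """Return (time, offset_in_seconds) pairs for every sample line found."""
    samples = []
    for line in output.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match:
            samples.append((match.group(1), float(match.group(2))))
    return samples


def max_abs_offset(samples):
    """Return the largest absolute offset, in seconds, among the samples."""
    return max(abs(offset) for _, offset in samples)


if __name__ == "__main__":
    offsets = parse_offsets(SAMPLE_OUTPUT)
    print(offsets)
    print(max_abs_offset(offsets))
```

Lines that do not match the sample format (such as the "Tracking" and "The current time is" lines) are simply skipped.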
The issue, however, is that I am still seeing (for two environments of 5 SQL instances each) more than 20 clock skew events raised a day. Without exception, all of them are raised and cleared within approximately a minute.
A sample of the "Alert History" for one of them is below:
Raised High 12:49 AM
Ended - 12:50 AM
So rather than just disabling the "Clock Skew" alert for all of these instances (which is of course a possibility), I want to understand how this alert actually checks the clock difference between a base monitor and its monitored instances, because it would seem that the fault lies with SQL Monitor and not with Windows and its time service/synchronisation.
Let me know if you need further details here.
Cheers
Joe
Comments
This is being investigated as a support case via our ticketing system, as I need some files from you. The case number for your reference is F0072015.
At this stage are you able to send me the base monitor log files for this server?
Cheers
Joe
One possible cause is that the query that performs the clock skew check runs every 15 seconds. If the monitored server is busy performing other tasks, or a network problem has occurred, the reply to the clock skew query may arrive late.
Can you confirm that there are no messages in the Windows event logs corresponding to any of these times?
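To illustrate why a late reply could look like clock skew, here is a minimal Python sketch. It is not SQL Monitor's actual check, whose internals are not described in this thread. It assumes a hypothetical check that reads the remote clock and compares it with the local clock, and it contrasts ignoring the reply delay with an NTP-style midpoint estimate. The function names and the 20-second delay are made up for illustration.

```python
from datetime import datetime, timedelta


def naive_skew(received_at, remote_time):
    """Skew in seconds if the reply delay is ignored entirely."""
    return (remote_time - received_at).total_seconds()


def estimate_skew(sent_at, received_at, remote_time):
    """Return (skew, uncertainty) in seconds.

    Assumes the remote clock was read midway through the round trip.
    The true skew then lies within +/- half the round-trip time of the estimate.
    """
    round_trip = received_at - sent_at
    midpoint = sent_at + round_trip / 2
    skew = (remote_time - midpoint).total_seconds()
    return skew, round_trip.total_seconds() / 2


if __name__ == "__main__":
    # Hypothetical scenario: both clocks agree, but the reply is held up for 20 s.
    sent = datetime(2013, 4, 18, 0, 49, 0)
    remote = sent + timedelta(milliseconds=50)   # remote clock read 50 ms after sending
    received = sent + timedelta(seconds=20)      # reply arrives 20 s after sending

    print(naive_skew(received, remote))
    print(estimate_skew(sent, received, remote))
```

In this scenario the naive comparison reports a skew of almost 20 seconds even though the clocks agree. The midpoint estimate is still off when the delay is one-sided, but its uncertainty of half the round trip makes it clear that the reading cannot be trusted, so a check of this kind could discard samples with a long round trip instead of raising an alert.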