Issues with SQL Monitor appearing to block itself post upgrade
JoeGT
Posts: 50
I have just performed an upgrade in one of my production environments from v3.4.1.4 to v4.1.0.2226 and I am seeing this strange behaviour.
It would appear that some of the processes that SQL Monitor is launching are blocking other SQL Monitor processes. These do clear themselves after 20 seconds, but they are filling my alert window with what I have now deemed "false" alerts (I don't want to switch off the "Blocked Process" alert as I need it for other reasons).
The output from one of these events is below. Sensitive data has been obfuscated:
Blocked process: SQL Monitor - Monitoring
Process ID: 57
Database: tempdb
Host: xxxxx
User: xxxxx
Start time: 16 Jun 2014 2:32 AM
SQL fragment: USE [tempdb] ;
SET NOCOUNT ON ;
IF OBJECT_ID(N'tempdb..[##redgate_sqlmonitor_querywaitstats_xxxxxxx]') IS NOT NULL
BEGIN
SELECT database_name ,
sql_handle ,
statement_start_offset ,
statement_end_offset ,
wait_type ,
SUM(wait_time_delta) wait_time,
COUNT(wait_time_delta) wait_time_count
FROM [##redgate_sqlmonitor_querywaitstats_xxxxxxxx]
GROUP BY database_name ,
sql_handle ,
statement_start_offset ,
statement_end_offset ,
wait_type
TRUNCATE TABLE [##redgate_sqlmonitor_querywaitstats_xxxxxx]
UPDATE [##redgate_sqlmonitor_querywaitstats_info_.xxxxxxx] SET last_sampled = GETUTCDATE()
END
Blocking process: Unknown
Process ID: 77
Database:
Host:
User: Unknown
Start time: 16 Jun 2014 2:11 AM
SQL fragment: --RedGateIgnore
USE [tempdb] ;
SET NOCOUNT ON ;
IF OBJECT_ID(N'tempdb..[##redgate_sqlmonitor_querywaitstats_xxxx]') IS NOT NULL
BEGIN
DROP TABLE [##redgate_sqlmonitor_querywaitstats_xxxxxx]
END
CREATE TABLE [##redgate_sqlmonitor_querywaitstats_xxxxx]
(
database_name VARCHAR(50) NOT NULL,
sql_handle VARBINARY(64) NOT NULL,
statement_start_offset BIGINT NOT NULL,
statement_end_offset BIGINT NOT NULL,
wait_type NVARCHAR(60) NOT NULL,
wait_time_delta INT NOT NULL
)
CREATE INDEX [##redgate_sqlmonitor_querywaitstats_grouping_xxxxxxxxxxxxx] ON [##redgate_sqlmonitor_querywaitstats_xxxxxxxxxxxxx] (database_name, sql_handle, statement_start_offset, statement_end_offset, wait_type)
DECLARE @timestamp DATETIME
SET @timestamp = GETUTCDATE()
IF OBJECT_ID(N'tempdb..[##redgate_sqlmonitor_querywaitstats_info_xxxxxxxxxxxxx]') IS NOT NULL
BEGIN
DROP TABLE [##redgate_sqlmonitor_querywaitstats_info_xxxxxxxxxxxxx]
END
CREATE TABLE [##redgate_sqlmonitor_querywaitstats_info_xxxxxxxxxxxxx]
(
last_sampled DATETIME
)
INSERT INTO [##redgate_sqlmonitor_querywaitstats_info_xxxxxxxxxxxxx] (last_sampled) VALUES (NULL)
DECLARE @start DATETIME
SET @start = GETUTCDATE()
DECLARE @last_sampled DATETIME
SET @last_sampled = NULL
WHILE ( (@last_sampled IS NULL) AND (DATEDIFF(s, @start, @timestamp) < 300) )
BEGIN
DECLARE @delta INT
SET @delta = DATEDIFF(ms, @timestamp, GETUTCDATE())
INSERT INTO [##redgate_sqlmonitor_querywaitstats_xxxxxxxxxxxxx]
SELECT database_name ,
sql_handle ,
statement_start_offset ,
statement_end_offset ,
wait_type ,
wait_time_delta FROM (
SELECT
DB_NAME(er.database_id) [database_name],
er.sql_handle,
er.statement_start_offset,
er.statement_end_offset,
ISNULL(wt.wait_type, CASE WHEN er.[status] = 'runnable' then er.last_wait_type ELSE NULL END) [wait_type] ,
@delta wait_time_delta
FROM sys.dm_exec_requests er
LEFT JOIN sys.dm_os_waiting_tasks wt
ON er.session_id = wt.session_id
) s
WHERE
s.wait_type IS NOT NULL
AND
s.sql_handle IS NOT NULL
AND s.[wait_type] NOT IN (N'BROKER_EVENTHANDLER',N'BROKER_INIT',N'BROKER_MASTERSTART',N'BROKER_RECEIVE_WAITFOR',
N'BROKER_REGISTERALLENDPOINTS',N'BROKER_SERVICE',N'BROKER_SHUTDOWN',N'BROKER_TASK_STOP',
N'BROKER_TO_FLUSH',N'BROKER_TRANSMITTER',N'CHECKPOINT_QUEUE',N'CHKPT',N'CLR_AUTO_EVENT',
N'CLR_MANUAL_EVENT',N'CLR_SEMAPHORE',N'DIRTY_PAGE_POLL',N'DISPATCHER_QUEUE_SEMAPHORE',
N'FT_IFTS_SCHEDULER_IDLE_WAIT',N'FT_IFTSHC_MUTEX',N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
N'KSOURCE_WAKEUP',N'LAZYWRITER_SLEEP',N'LOGMGR_QUEUE',N'MISCELLANEOUS',N'OGMGR_QUEUE',
N'PARALLEL_BACKUP_QUEUE',N'REQUEST_DISPENSER_PAUSE',N'REQUEST_FOR_DEADLOCK_SEARCH',
N'RESOURCE_QUEUE',N'SLEEP_BPOOL_FLUSH',N'SLEEP_DBSTARTUP',N'SLEEP_DCOMSTARTUP',
N'SLEEP_MSDBSTARTUP',N'SLEEP_SYSTEMTASK',N'SLEEP_TASK',N'SLEEP_TEMPDBSTARTUP',
N'SP_SERVER_DIAGNOSTICS_SLEEP',N'SQLTRACE_BUFFER_FLUSH',N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
N'TRACEWRITE',N'WAITFOR',N'XE_DISPATCHER_JOI',N'XE_DISPATCHER_WAIT',N'XE_TIMER_EVENT')
SET @timestamp = GETUTCDATE()
WAITFOR DELAY '00:00:00.200'
SELECT @last_sampled = last_sampled FROM [##redgate_sqlmonitor_querywaitstats_info_xxxxxxxxxxxxx]
END
SELECT CAST (CASE WHEN @last_sampled IS NULL THEN 0 ELSE 1 END AS BIT)
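For anyone wanting to see the contention between the two SQL Monitor sessions directly, a query along these lines shows which locks each side holds or is waiting for. This is just a rough sketch, not anything Red Gate supplies - substitute the current session IDs for the 57 and 77 taken from the alert above:
-- Rough sketch: locks held or requested by the two SQL Monitor sessions from the alert
-- (57 = blocked, 77 = blocking); substitute whichever SPIDs are current at the time
SELECT tl.request_session_id ,
       tl.resource_type ,
       tl.request_mode ,      -- e.g. Sch-M for the TRUNCATE, IX for the sampling INSERTs
       tl.request_status ,    -- GRANT vs WAIT shows which side is stuck
       wt.wait_duration_ms ,
       wt.blocking_session_id
FROM sys.dm_tran_locks tl
LEFT JOIN sys.dm_os_waiting_tasks wt
    ON tl.lock_owner_address = wt.resource_address
WHERE tl.request_session_id IN (57, 77)
ORDER BY tl.request_session_id, tl.request_status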
This environment is relatively "busy", and in the last 2 hours I have seen over 200 blocking events (all of which look like the above, from the ones I have sampled). The issue is only affecting some of the SQL instances in this environment.
Is this something that any others have experienced, or that RedGate are aware of?
A little more background - all of the SQL instances being monitored are running at SQL 2008 R2 SP2 CU3.
I have also noticed (running sp_whoisactive) that all the instances being affected have long-running SPIDs with the wait type:
DEADLOCK_ENUM_MUTEX
The ones not affected by the issue do NOT have any SPIDs active with this wait type.
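For reference, a direct DMV query along the lines of the sketch below (not the exact check I run, just a hedged example) should show whether any sessions are currently sitting on that wait type:
-- Rough sketch: list any sessions currently waiting on DEADLOCK_ENUM_MUTEX
SELECT er.session_id ,
       er.wait_type ,
       er.wait_time ,      -- ms spent in the current wait
       er.start_time ,
       er.[status] ,
       er.command
FROM sys.dm_exec_requests er
WHERE er.wait_type = N'DEADLOCK_ENUM_MUTEX'
ORDER BY er.wait_time DESC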
This is something we have encountered that is mentioned in this post: http://support.microsoft.com/kb/972936/en-us . We are working on implementing the CU that resolves this issue (going through all the normal hoops/sign-off/validation etc. to push to production), but in the meantime I would like to know whether this is likely to be the root cause of this SQL Monitor behaviour and, if so, what options I have. (I realize that by restarting the SQL instance I can remove the waiting SPIDs; however, there is every chance that other processes we run will regenerate more instances of this type of waiting SPID in the future.)
Look forward to feedback.
Cheers
Joe
Comments
Since that point, however, I can see that SQL Monitor itself has now been the cause of another of these DEADLOCK_ENUM_MUTEX SPIDs.
Capturing the query that is running at the time, it is this:
<?query --
INSERT INTO [##redgate_sqlmonitor_querywaitstats_xxxxxxx]
SELECT database_name ,
sql_handle ,
statement_start_offset ,
statement_end_offset ,
wait_type ,
wait_time_delta FROM (
SELECT
DB_NAME(er.database_id) [database_name],
er.sql_handle,
er.statement_start_offset,
er.statement_end_offset,
ISNULL(wt.wait_type, CASE WHEN er.[status] = 'runnable' then er.last_wait_type ELSE NULL END) [wait_type] ,
@delta wait_time_delta
FROM sys.dm_exec_requests er
LEFT JOIN sys.dm_os_waiting_tasks wt
ON er.session_id = wt.session_id
) s
WHERE
s.wait_type IS NOT NULL
AND
s.sql_handle IS NOT NULL
AND s.[wait_type] NOT IN (N'BROKER_EVENTHANDLER',N'BROKER_INIT',N'BROKER_MASTERSTART',N'BROKER_RECEIVE_WAITFOR',N'BROKER_REGISTERALLENDPOINTS',
N'BROKER_SERVICE',N'BROKER_SHUTDOWN',N'BROKER_TASK_STOP',N'BROKER_TO_FLUSH',N'BROKER_TRANSMITTER',N'CHECKPOINT_QUEUE',N'CHKPT',
N'CLR_AUTO_EVENT',N'CLR_MANUAL_EVENT',N'CLR_SEMAPHORE',N'DIRTY_PAGE_POLL',N'DISPATCHER_QUEUE_SEMAPHORE',N'FT_IFTS_SCHEDULER_IDLE_WAIT',
N'FT_IFTSHC_MUTEX',N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',N'KSOURCE_WAKEUP',N'LAZYWRITER_SLEEP',N'LOGMGR_QUEUE',N'MISCELLANEOUS',
N'OGMGR_QUEUE',N'PARALLEL_BACKUP_QUEUE',N'REQUEST_DISPENSER_PAUSE',N'REQUEST_FOR_DEADLOCK_SEARCH',N'RESOURCE_QUEUE',
N'SLEEP_BPOOL_FLUSH',N'SLEEP_DBSTARTUP',N'SLEEP_DCOMSTARTUP',N'SLEEP_MSDBSTARTUP',N'SLEEP_SYSTEMTASK',N'SLEEP_TASK',
N'SLEEP_TEMPDBSTARTUP',N'SP_SERVER_DIAGNOSTICS_SLEEP',N'SQLTRACE_BUFFER_FLUSH',N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
N'TRACEWRITE',N'WAITFOR',N'XE_DISPATCHER_JOI',N'XE_DISPATCHER_WAIT',N'XE_TIMER_EVENT')
--?>
And the result is that the SPID running this query is now showing with a wait (and wait time) of:
(184635ms)DEADLOCK_ENUM_MUTEX
And is blocking another SQL Monitor process:
<?query --
TRUNCATE TABLE [##redgate_sqlmonitor_querywaitstats_xxxxx]
--?>
With wait (and wait time):
(1609ms)LCK_M_SCH_M
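For anyone wanting to reproduce this without sp_whoisactive, a rough sketch along these lines lists the current blocker/blocked pairs with their wait types and wait times (the column aliases are mine, purely illustrative):
-- Rough sketch: current blocker/blocked pairs with wait types, wait times and the blocked SQL
SELECT blocked.session_id          blocked_spid ,
       blocked.blocking_session_id blocking_spid ,
       blocked.wait_type ,
       blocked.wait_time           wait_time_ms ,
       st.[text]                   blocked_sql
FROM sys.dm_exec_requests blocked
CROSS APPLY sys.dm_exec_sql_text(blocked.sql_handle) st
WHERE blocked.blocking_session_id <> 0
ORDER BY blocked.wait_time DESC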
So... I have kind of answered my own question - I probably need to update to SQL 2008 R2 SP2 CU5 to solve the underlying SQL issue. However, as I am not in a position to do this anytime soon, is there any interim solution that RedGate (or someone else enterprising) might be able to offer? Note: we specifically updated to this version of SQL Monitor to get access to the wait information, so not having this feature of the tool functioning correctly is limiting its continued usefulness.
Cheers
Joe
It is possible that the DEADLOCK_ENUM_MUTEX SPID referred to above is now preventing our SQL instances from identifying real deadlock situations - we have some SPIDs that are in what ought to be a deadlock situation but that are not triggering a deadlock,
i.e.
SPID 170 is blocking SPID 179
SPID 179 is blocking SPID 170
and the pages being waited on are not changing.
This to me is a classic deadlock situation and one of these two processes ought to be getting chosen as a victim.
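The mutual block is easy to confirm from the DMVs - a rough sketch like the one below lists pairs of sessions that are each waiting on the other, which is exactly the 170/179 situation (and, if the deadlock monitor really is stuck behind DEADLOCK_ENUM_MUTEX, neither will ever be chosen as a victim):
-- Rough sketch: pairs of sessions that are blocking each other
-- (an undetected deadlock if the deadlock monitor is stuck)
SELECT a.session_id          spid_a ,
       b.session_id          spid_b ,
       a.wait_type           spid_a_wait ,
       b.wait_type           spid_b_wait
FROM sys.dm_exec_requests a
JOIN sys.dm_exec_requests b
    ON a.blocking_session_id = b.session_id
   AND b.blocking_session_id = a.session_id
WHERE a.session_id < b.session_id   -- avoid listing each pair twice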
I am in a situation where I am likely going to be forced to roll back the SQL Monitor upgrade, as this is the path of least pain:
* Roll back SQL Monitor to v3.xxx
* Roll forward SQL to SQL 2008 R2 SP2 CU5
I would love to hear from RedGate about this one, as I believe there are quite possibly a number of their customers who are keen to get the new Wait Stats functionality available in v4 but who are also not yet on SQL 2008 R2 SP2 CU5 and therefore could be vulnerable to this issue.
This is the risk that we take being bleeding edge, i.e. installing a product update a matter of days after it is released (v4.1 of SQL Monitor was only released on June 13th and we installed it on June 16th).
Look forward to a reply ASAP. In the meantime I am going to be contacting our RedGate account manager to discuss.
In the meantime, we could disable one of the metric collections. This would simply stop the queries from running in the first place. This shouldn't affect the overall Deadlock alert (so it would still alert you for the remaining deadlocks), but it would mean that certain data wouldn't be populated in the alerts and overviews.
If you are interested in trying the workaround, please let me know and I will email the steps for making the change as this cannot be done through the UI.
Look forward to getting the details from you.
Cheers
Joe
To chase this up, I have:
* Emailed (and had a response from) our RedGate account manager
* Posted a reply in this thread
* Private messaged Chris Kelly on this forum
* Updated the support ticket that was logged from this post
I have yet to see the promised workaround for this issue and, sure enough, the issue has caused us further production problems.
If there is anyone with access to this workaround reading this message, please forward it on to me ASAP.
Cheers
Joe
So, happy but unhappy all at the same time. The workaround helped; it was just pretty late in arriving.