Options

Issues with SQL Monitor appearing to self block itself post

JoeGTJoeGT Posts: 50
edited June 26, 2014 12:08AM in SQL Monitor Previous Versions
I have just performed an upgrade in one of my production environments from v3.4.1.4 to v4.1.0.2226 and I am seeing this strange behaviour.

It would appear that some of the processes that SQL Monitor is launching are blocking other SQL Monitor processes. These do clear themselves after 20 seconds but they are filling my my alert window with what I have now deemed "false" alerts (I dont want to switch off the "Blocked Process" alert as I need it for other reasons)

The output from one of these events is below. Sensitive data has been obfuscated :



Blocked process: SQL Monitor - Monitoring
Process ID: 57
Database: tempdb
Host: xxxxx
User: xxxxx
Start time: 16 Jun 2014 2:32 AM
SQL fragment: USE [tempdb] ;
SET NOCOUNT ON ;

IF OBJECT_ID(N'tempdb..[##redgate_sqlmonitor_querywaitstats_xxxxxxx]') IS NOT NULL
BEGIN
SELECT database_name ,
sql_handle ,
statement_start_offset ,
statement_end_offset ,
wait_type ,
SUM(wait_time_delta) wait_time,
COUNT(wait_time_delta) wait_time_count
FROM [##redgate_sqlmonitor_querywaitstats_xxxxxxxx]
GROUP BY database_name ,
sql_handle ,
statement_start_offset ,
statement_end_offset ,
wait_type

TRUNCATE TABLE [##redgate_sqlmonitor_querywaitstats_xxxxxx]

UPDATE [##redgate_sqlmonitor_querywaitstats_info_.xxxxxxx] SET last_sampled = GETUTCDATE()
END


Blocking process: Unknown
Process ID: 77
Database:
Host:
User: Unknown
Start time: 16 Jun 2014 2:11 AM
SQL fragment: --RedGateIgnore
USE [tempdb] ;
SET NOCOUNT ON ;

IF OBJECT_ID(N'tempdb..[##redgate_sqlmonitor_querywaitstats_xxxx]') IS NOT NULL
BEGIN
DROP TABLE [##redgate_sqlmonitor_querywaitstats_xxxxxx]
END

CREATE TABLE [##redgate_sqlmonitor_querywaitstats_xxxxx]
(
database_name VARCHAR(50) NOT NULL,
sql_handle VARBINARY(64) NOT NULL,
statement_start_offset BIGINT NOT NULL,
statement_end_offset BIGINT NOT NULL,
wait_type NVARCHAR(60) NOT NULL,
wait_time_delta INT NOT NULL
)

CREATE INDEX [##redgate_sqlmonitor_querywaitstats_grouping_xxxxxxxxxxxxx] ON [##redgate_sqlmonitor_querywaitstats_xxxxxxxxxxxxx] (database_name, sql_handle, statement_start_offset, statement_end_offset, wait_type)

DECLARE @timestamp DATETIME
SET @timestamp = GETUTCDATE()

IF OBJECT_ID(N'tempdb..[##redgate_sqlmonitor_querywaitstats_info_xxxxxxxxxxxxx]') IS NOT NULL
BEGIN
DROP TABLE [##redgate_sqlmonitor_querywaitstats_info_xxxxxxxxxxxxx]
END

CREATE TABLE [##redgate_sqlmonitor_querywaitstats_info_xxxxxxxxxxxxx]
(
last_sampled DATETIME
)
INSERT INTO [##redgate_sqlmonitor_querywaitstats_info_xxxxxxxxxxxxx] (last_sampled) VALUES (NULL)

DECLARE @start DATETIME
SET @start = GETUTCDATE()

DECLARE @last_sampled DATETIME
SET @last_sampled = NULL

WHILE ( (@last_sampled IS NULL) AND (DATEDIFF(s, @start, @timestamp) < 300) )
BEGIN

DECLARE @delta INT
SET @delta = DATEDIFF(ms, @timestamp, GETUTCDATE())

INSERT INTO [##redgate_sqlmonitor_querywaitstats_xxxxxxxxxxxxx]
SELECT database_name ,
sql_handle ,
statement_start_offset ,
statement_end_offset ,
wait_type ,
wait_time_delta FROM (
SELECT
DB_NAME(er.database_id) [database_name],
er.sql_handle,
er.statement_start_offset,
er.statement_end_offset,
ISNULL(wt.wait_type, CASE WHEN er.[status] = 'runnable' then er.last_wait_type ELSE NULL END) [wait_type] ,
@delta wait_time_delta
FROM sys.dm_exec_requests er
LEFT JOIN sys.dm_os_waiting_tasks wt
ON er.session_id = wt.session_id
) s
WHERE
s.wait_type IS NOT NULL
AND
s.sql_handle IS NOT NULL
AND s.[wait_type] NOT IN (N'BROKER_EVENTHANDLER',N'BROKER_INIT',N'BROKER_MASTERSTART',N'BROKER_RECEIVE_WAITFOR',
N'BROKER_REGISTERALLENDPOINTS',N'BROKER_SERVICE',N'BROKER_SHUTDOWN',N'BROKER_TASK_STOP',
N'BROKER_TO_FLUSH',N'BROKER_TRANSMITTER',N'CHECKPOINT_QUEUE',N'CHKPT',N'CLR_AUTO_EVENT',
N'CLR_MANUAL_EVENT',N'CLR_SEMAPHORE',N'DIRTY_PAGE_POLL',N'DISPATCHER_QUEUE_SEMAPHORE',
N'FT_IFTS_SCHEDULER_IDLE_WAIT',N'FT_IFTSHC_MUTEX',N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
N'KSOURCE_WAKEUP',N'LAZYWRITER_SLEEP',N'LOGMGR_QUEUE',N'MISCELLANEOUS',N'OGMGR_QUEUE',
N'PARALLEL_BACKUP_QUEUE',N'REQUEST_DISPENSER_PAUSE',N'REQUEST_FOR_DEADLOCK_SEARCH',
N'RESOURCE_QUEUE',N'SLEEP_BPOOL_FLUSH',N'SLEEP_DBSTARTUP',N'SLEEP_DCOMSTARTUP',
N'SLEEP_MSDBSTARTUP',N'SLEEP_SYSTEMTASK',N'SLEEP_TASK',N'SLEEP_TEMPDBSTARTUP',
N'SP_SERVER_DIAGNOSTICS_SLEEP',N'SQLTRACE_BUFFER_FLUSH',N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
N'TRACEWRITE',N'WAITFOR',N'XE_DISPATCHER_JOI',N'XE_DISPATCHER_WAIT',N'XE_TIMER_EVENT')

SET @timestamp = GETUTCDATE()

WAITFOR DELAY '00:00:00.200'

SELECT @last_sampled = last_sampled FROM [##redgate_sqlmonitor_querywaitstats_info_xxxxxxxxxxxxx]
END

SELECT CAST (CASE WHEN @last_sampled IS NULL THEN 0 ELSE 1 END AS BIT)


This environment is relatively "busy" and in the last 2 hours I have seen over 200 blocking events (all of which look like the above from the ones I have sampled). The issue is only affecting some of the SQL instance in this environment

Is this something that any others have experienced or that RedGate are aware of ?

A little more background - all of the SQL instances being monitored are running at SQL 2008 R2 SP2 CU3.

I have also noticed (running sp_whoisactive) that all the instances being affected have long running SPIDs with the wait type :

DEADLOCK_ENUM_MUTEX

The ones not affected by the issue do NOT have any SPIDs active with this wait type.

This is something we have encountered that is mentioned in this post http://support.microsoft.com/kb/972936/en-us . We are working on implementing the CU that resolves this issue (going through all the normal hoops/signoff/validation etc etc to push to production) but in the meantime I would like to know if this is going to be the root cause of this SQL Monitor and if so what options I have (I realize that by restarting the SQL instance I can remove the waiting SPIDs however there is every chance that other processes we run will regenerate more instances of this type of waiting SPID in the future.)

Look forward to feedback.

Cheers

Joe

Comments

  • Options
    As another follow up to this I can confirm that having restarted all of the affected SQL instances (thus removing the DEADLOCK_ENUM_MUTEX SPID), I no longer has SQL Monitor causing blocking events.

    Since that point however I can see that SQL Monitor itself as now been the cause of another of these DEADLOCK_ENUM_MUTEX SPIDS


    Capturing the query that is running at the time it is this :

    <?query --
    INSERT INTO [##redgate_sqlmonitor_querywaitstats_xxxxxxx]
    SELECT database_name ,
    sql_handle ,
    statement_start_offset ,
    statement_end_offset ,
    wait_type ,
    wait_time_delta FROM (
    SELECT
    DB_NAME(er.database_id) [database_name],
    er.sql_handle,
    er.statement_start_offset,
    er.statement_end_offset,
    ISNULL(wt.wait_type, CASE WHEN er.[status] = 'runnable' then er.last_wait_type ELSE NULL END) [wait_type] ,
    @delta wait_time_delta
    FROM sys.dm_exec_requests er
    LEFT JOIN sys.dm_os_waiting_tasks wt
    ON er.session_id = wt.session_id
    ) s
    WHERE
    s.wait_type IS NOT NULL
    AND
    s.sql_handle IS NOT NULL
    AND s.[wait_type] NOT IN (N'BROKER_EVENTHANDLER',N'BROKER_INIT',N'BROKER_MASTERSTART',N'BROKER_RECEIVE_WAITFOR',N'BROKER_REGISTERALLENDPOINTS',
    N'BROKER_SERVICE',N'BROKER_SHUTDOWN',N'BROKER_TASK_STOP',N'BROKER_TO_FLUSH',N'BROKER_TRANSMITTER',N'CHECKPOINT_QUEUE',N'CHKPT',
    N'CLR_AUTO_EVENT',N'CLR_MANUAL_EVENT',N'CLR_SEMAPHORE',N'DIRTY_PAGE_POLL',N'DISPATCHER_QUEUE_SEMAPHORE',N'FT_IFTS_SCHEDULER_IDLE_WAIT',
    N'FT_IFTSHC_MUTEX',N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',N'KSOURCE_WAKEUP',N'LAZYWRITER_SLEEP',N'LOGMGR_QUEUE',N'MISCELLANEOUS',
    N'OGMGR_QUEUE',N'PARALLEL_BACKUP_QUEUE',N'REQUEST_DISPENSER_PAUSE',N'REQUEST_FOR_DEADLOCK_SEARCH',N'RESOURCE_QUEUE',
    N'SLEEP_BPOOL_FLUSH',N'SLEEP_DBSTARTUP',N'SLEEP_DCOMSTARTUP',N'SLEEP_MSDBSTARTUP',N'SLEEP_SYSTEMTASK',N'SLEEP_TASK',
    N'SLEEP_TEMPDBSTARTUP',N'SP_SERVER_DIAGNOSTICS_SLEEP',N'SQLTRACE_BUFFER_FLUSH',N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
    N'TRACEWRITE',N'WAITFOR',N'XE_DISPATCHER_JOI',N'XE_DISPATCHER_WAIT',N'XE_TIMER_EVENT')


    --?>


    And the result is the the SPID running this query is now showing with a wait (and wait time) of :

    (184635ms)DEADLOCK_ENUM_MUTEX

    And is blocking another SQL Monitor Process :

    <?query --
    TRUNCATE TABLE [##redgate_sqlmonitor_querywaitstats_xxxxx]


    --?>

    With wait (and wait time) :

    (1609ms)LCK_M_SCH_M



    So.. I have kind of answered my own question - I probably need to update to SQL 2008 R2 SP5 to solve the underlying SQL issue. However, as I am not in a position to be able to do this anytime soon, is there any interim solution that RedGate(or someone else enterprising) might be able to offer ? Note: We specifically updated to this version of SQL Monitor to allow us access to the Wait information so not having this feature of the tool functioning correctly is limiting its continued usefulness.

    Cheers

    Joe
  • Options
    And now further updates that I believe might be relevant to other users of SQL Monitor.

    It is possible that the DEADLOCK_ENUM_MUTEX spid referred to above is now causing our SQL instances not to be able to identify real deadlock situations - we have some SPIDs that are in what ought to be a deadlock situation that are not causing a deadlock

    ie.

    SPID 170 is blocking SPID 179
    SPID 179 is blocking SPID 170

    and the pages being waited on are not changing.

    This to me is a classic deadlock situation and one of these two processes ought to be getting chosen as a victim.

    I am in a situation where I am likely going to be forced to perform a rollback of the SQL Monitor upgrade as this is the path of least pain :

    * Rollback SQL MOnitor to v 3.xxx
    * Roll forward SQL to SQL 2008 R2 SP2 CU5


    I would love to hear from RedGate about this one as I believe that there are quite possibly a number of their customers who are keen to get the new Wait Stats functionality available in v4 but who are also not at SQL 2008 R2 SP5 and therefore could be vulnerable to this issue.

    This is the risk that we take being bleeding edge ie. installing a product update a matter of days after is is released (v4.1 of SQL Monitor was only released on June 13th and we installed on June 16th)

    Look forward to a reply ASAP. In the meantime I am going to be contacting our RedGate account manager to discuss.
  • Options
    Upon investigating this, I believe that the best thing to do is to install the CU5 update for SQL 2008R2 SP2. Unfortunately, we haven't been able to test this, but it does look like the best approach.

    In the meantime, we could disable one of the metric collections. This would plain stop the queries from running in the first place. This shouldn't affect the overall Deadlock alert (so would still alert you for the remaining deadlocks), but would mean that certain data wouldn't be populated in the alerts and overviews.

    If you are interested in trying the workaround, please let me know and I will email the steps for making the change as this cannot be done through the UI.
  • Options
    Thanks Chris. This is what I was hoping you would be able to offer and yes I am very interested in being able to disable this specific query (happy to lose the functionality in the meantime). We are going down the path of the CU for SQL but it will take some time and in the interim this is a good compromise.

    Look forward to getting the details from you.

    Cheers

    Joe
  • Options
    A little disappointed here I have to say. Having :

    * Emailed (and had a response from) our RedGate account manager
    * Posted a reply in this thread
    * Private messaged Chris Kelly on this forum
    * Updated the support ticket that was logged from this post

    I am yet to have seen the promised workaround to this issue and, sure enough, the issue has caused us further production problems.

    If there is anyone with access to this workaround reading this message, please forward it on to me ASAP.

    Cheers

    Joe
  • Options
    I finally got what I was looking for here off the back off the ticket that was raised in the RedGate system when no-one had replied to this thread.

    So happy but unhappy all at the same time. The workaround assisted, it was just pretty late in arriving
Sign In or Register to comment.