Long Running Query - Variable Run Durations

Hello,
We have a situation in our organisation where several SQL Agent Jobs are setup to run every night. These queries check for data in our databases and then will process the data and send emails.  Because the operation of these jobs depends on data being generated on a day to day basis sometimes the jobs only run for a few seconds and other times they can run for 3-4minutes.

When Redgate Monitor checks for long running queries it seems to be finding a baseline average of the job duration by querying the last 10 runs from sysjobs. Because our jobs fluctuate between 0 and 3-4mins this skews the average time and throws more alerts than we like.

Is there anyway to exclude runs that were less than set time such as 1 second?

I have tried to create a custom alert to replace the standard one but am unsure how to deploy it if the result is only a single value to define the Low/Medium/High Alert:

The following code is an example which returns the Job Name, Average Duration, LastRunDuration and Flag Value

WITH JobRuns AS (
    SELECT 
        j.job_id,
        j.name AS JobName,
        h.run_duration,
        ROW_NUMBER() OVER (PARTITION BY j.job_id ORDER BY h.run_date DESC, h.run_time DESC) AS RunRank
    FROM msdb.dbo.sysjobs j
    JOIN msdb.dbo.sysjobhistory h
        ON j.job_id = h.job_id
    WHERE h.step_id = 0 -- Ensures we only get the job outcome, not individual steps
AND h.run_duration > 5
),
Last10Runs AS (
    SELECT 
        job_id,
        JobName,
        run_duration
    FROM JobRuns
    WHERE RunRank <= 10
),
AverageDurations AS (
    SELECT 
        job_id,
        JobName,
        AVG(run_duration) AS AvgDuration
    FROM Last10Runs
    GROUP BY job_id, JobName
),
LatestRun AS (
    SELECT 
        job_id,
        JobName,
        run_duration AS LatestDuration
    FROM JobRuns
    WHERE RunRank = 1
)
SELECT 
    l.JobName,
    a.AvgDuration,
    l.LatestDuration,
    CASE 
        WHEN l.LatestDuration > a.AvgDuration * 2 
             OR l.LatestDuration < a.AvgDuration * 0.5 THEN 2
        WHEN l.LatestDuration > a.AvgDuration * 1.5 
             OR l.LatestDuration < a.AvgDuration * 0.75 THEN 1
        ELSE 0
    END AS IsDurationUnusual
FROM LatestRun l
JOIN AverageDurations a
    ON l.job_id = a.job_id
ORDER BY l.JobName;

Can this be turned into a custom alert to tell us which Job Name is running as an unusual duration?

Answers

  • This is an example of data from sysjobs currently:

    Job Name RunDate RunTime RunDuration
    Test Report 01 20241127 23000 0
    Test Report 01 20241126 23000 0
    Test Report 01 20241125 23000 316
    Test Report 01 20241124 23000 313
    Test Report 01 20241123 23000 0
    Test Report 01 20241122 23000 301
    Test Report 01 20241121 23000 0
    Test Report 01 20241120 23000 0
    Test Report 01 20241119 23000 0
    Test Report 01 20241118 23000 309
  • this is the problem with custom metrics -- you can only return an integer value and that is then used to raise the alert -- what we have done is to post a query (sometime EXEC to a stored proc) that will allow to see the details.  We the powershell module to write to write the alert details as JSON in a folder, then import and run the SP and email the better details -- prototype of this is working, but it is a good deal to configure.

    It appears that custom metrics run inside of a transaction or in read-only mode, can't remember what we found when watching via SQL Events.  At any rate, table we created to store details never got the data, but when you ran via SSMS (even using the same account as Monitor) the inserts work fine.  This is actually probably a good secure feature to avoid bad SQL or worse SQL Injection or something, but still frustrating that custom metrics cannot return a string value.
Sign In or Register to comment.