Job duration unusual

JohanN Posts: 13
Hi

It seems to me that this alert is sometimes raised when it shouldn't be.

I've set this alert to be ignored if the job runs for less than 60 seconds.

My settings are as follows:

Alert Threshold:
Raise this alert when a job duration deviates from its baseline duration by more than:

High 100%
Medium 80%
Low 0% (Not checked)

Ignore if the job run time is less than 60 seconds

I have one example of an alert that is raised as High:

Duration: 00:00:29

Baseline duration (median of last 10 runs): 00:01:25
Deviation from baseline: 66%

The job ran for less than 60 seconds and the deviation is just 66%, yet it's raised as a High-level alert.

Another example:
Job started at: 29 Nov 2010 4:00 AM
Job ended at: 29 Nov 2010 4:01 AM
Job outcome: Succeeded

Duration: 00:01:58

Baseline duration (median of last 10 runs): 00:02:01
Deviation from baseline: 3%

This example only deviated by 3%!

Have I misunderstood something?

/Johan

Comments

  • Hi Johan

    We're aware of customer issues with this alert and have seen a couple of forum posts prior to this that seem to describe a similar issue.

    http://www.red-gate.com/MessageBoard/vi ... hp?t=12244

    http://www.red-gate.com/MessageBoard/vi ... hp?t=12152

    We will be improving this alert for the next version of SQL Monitor. Hopefully it will be less confusing.

    Thanks
    Chris
    Chris Spencer
    Test Engineer
    Red Gate
  • oderks Posts: 67 Bronze 2
    We're running 2.2.0.260 and this seems to still occur.

    My alert settings for 'Job duration unusual' are:

    high 70%, medium 60%, low 50%
    all enabled
    Ignore if the job run time is less than 60 seconds

    Now I received a high alert with this data:

    Job name: SQL Backup TL 2130 to 1830
    User: <domain user here>
    Job started at: 9 Mar 2011 7:30 AM
    Job ended at: 9 Mar 2011 7:30 AM
    Job outcome: Succeeded
    Duration: 00:00:08
    Baseline duration (median of last 10 runs): 00:01:10
    Deviation from baseline: -89%
    Job next scheduled to run at: 9 Mar 2011 10:30 AM

    As you can see, the duration is (much) less than 60 seconds, so I guess no alert should have been raised in the first place. -89% is interpreted as an 89% deviation, which is OK (I have already posted a feature request to be able to exclude negative deviations).
  • Hi,

    Thanks for your post. I will investigate this and get back to you.

    Regards,
    Priya
    Priya Sinha
    Project Manager
    Red Gate Software
  • I also have this problem. We have many jobs that run every few minutes but do not always need to do anything, so SQL Monitor flags an Alert for them because they are X% below average. The "ignore if under so many seconds" does not seem to work as alerts are still raised - so I have simply disabled the alert for now!

    Would it be sensible to split this alert into two?
    1. Jobs running a lot longer than expected
    2. Jobs running a lot less than expected
    Thanks, John
  • Thanks John and Oderks for your post.

    The Job Duration Unusual alert operates in two stages:

    Firstly we calculate the baseline (using the median of the last 10 runs). If this is less than the short running time threshold, the alert is disabled until the baseline increases again.

    Once the job is known to take a while, we compare each run against the baseline and alert if it deviates too far away from it.
    The default configuration thus makes the assumption that a quick running job can be ignored, and once a job is known to take a significant time, it shouldn’t start running quickly or slowly.

    Ignoring short running jobs is a powerful way to set defaults for a whole server, but it isn’t that useful when set on an individual job. You know how long the job normally takes, and so can enable or disable alerting on it to taste.

    The ‘ignore if runtime is less than…’ doesn’t work for the use case where you have a job that (say) copies any available new log files, which will run in less than a second if there is nothing to do but takes several minutes every now and then. We didn’t want to try and automagically figure out what is happening in this case and risk getting it wrong.

    Hope this explains the behaviour you are both noticing.

    Thanks,
    Priya
    Priya Sinha
    Project Manager
    Red Gate Software
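
For anyone trying to follow the two-stage behaviour described above, here is a rough Python sketch of it. This is only an illustration of the description in this thread, not SQL Monitor's actual code: the function name `classify_job_run`, its parameters, and the use of the absolute deviation (based on the remark that -89% is treated as 89%) are all assumptions. It also assumes at least one previous run is available.

```python
from statistics import median
from typing import Optional, Sequence


def classify_job_run(
    duration_s: float,
    previous_durations_s: Sequence[float],
    high_pct: Optional[float] = 100.0,
    medium_pct: Optional[float] = 80.0,
    low_pct: Optional[float] = None,   # None means the level is not checked
    ignore_below_s: float = 60.0,
) -> Optional[str]:
    """Hypothetical model of the alert: returns 'High', 'Medium', 'Low' or None."""
    # Baseline is the median of the last 10 runs (as described in the thread).
    baseline = median(previous_durations_s[-10:])

    # Stage 1: the "ignore if less than" setting is compared with the
    # BASELINE, not with the duration of the run being evaluated.
    if baseline <= 0 or baseline < ignore_below_s:
        return None

    # Stage 2: compare this run with the baseline. The sign is dropped,
    # so a much faster run counts the same as a much slower one (assumption).
    deviation_pct = abs(duration_s - baseline) / baseline * 100

    for level, threshold in (("High", high_pct),
                             ("Medium", medium_pct),
                             ("Low", low_pct)):
        if threshold is not None and deviation_pct > threshold:
            return level
    return None


# Figures from oderks's post: 8 s run, 70 s baseline, thresholds 70/60/50.
print(classify_job_run(8, [70] * 10, high_pct=70, medium_pct=60, low_pct=50))
```

With the figures from oderks's post this sketch returns "High", because the 70-second baseline is above the 60-second cut-off even though the run itself was not. It does not reproduce the two alerts in the opening post (deviations of 66% and 3% against thresholds of 100% and 80%), so those may have a different cause.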