Job duration unusual
JohanN
Posts: 13
Hi
It seems to me that this alert is sometimes raised when it shouldn't be.
I've set this alert to be ignored if the job runs for less than 60 seconds.
My settings are as follows:
Alert Threshold:
Raise this alert when a job duration deviates from its baseline duration by more than:
High 100%
Medium 80%
Low 0% (Not checked)
Ignore if the job run time is less than 60 seconds
I have one example of an alert that is raised as High:
Duration: 00:00:29
Baseline duration (median of last 10 runs): 00:01:25
Deviation from baseline: 66%
The job ran for less than 60 seconds and the deviation is just 66%, yet it's raised as a High-level alert.
Another example:
Job started at: 29 Nov 2010 4:00 AM
Job ended at: 29 Nov 2010 4:01 AM
Job outcome: Succeeded
Duration: 00:01:58
Baseline duration (median of last 10 runs): 00:02:01
Deviation from baseline: 3%
This example only deviated by 3%!
Have I misunderstood something?
/Johan
Comments
We're aware of customer issues with this alert and have seen a couple of forum posts prior to this that seem to describe a similar issue.
http://www.red-gate.com/MessageBoard/vi ... hp?t=12244
http://www.red-gate.com/MessageBoard/vi ... hp?t=12152
We will be improving this alert for the next version of SQL Monitor. Hopefully it will be less confusing.
Thanks
Chris
Test Engineer
Red Gate
My alert settings for 'Job duration unusual' are:
high 70%, medium 60%, low 50%
all enabled
Ignore if the job run time is less than 60 seconds
Now I received a high alert with this data:
Job name: SQL Backup TL 2130 to 1830
User: <domain user here>
Job started at: 9 Mar 2011 7:30 AM
Job ended at: 9 Mar 2011 7:30 AM
Job outcome: Succeeded
Duration: 00:00:08
Baseline duration (median of last 10 runs): 00:01:10
Deviation from baseline: -89%
Job next scheduled to run at: 9 Mar 2011 10:30 AM
As you can see, the duration is (much) less than 60 seconds, so I guess no alert should have been raised in the first place. -89% is interpreted as 89% deviation, which is OK (I have already posted a feature request to be able to exclude negative deviations).
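For reference, the percentages quoted in this thread are consistent with a simple signed deviation from the baseline. This is only a sketch of the arithmetic: the formula and the function name deviation_percent are assumptions inferred from the reported figures, not something SQL Monitor documents here.

```python
def deviation_percent(duration_s, baseline_s):
    """Signed deviation of a run from its baseline, as a percentage.

    Assumed formula, inferred from the figures quoted in the thread.
    """
    return (duration_s - baseline_s) / baseline_s * 100


# Duration 00:00:08 against a baseline of 00:01:10
signed = deviation_percent(8, 70)
print(round(signed))       # -89
print(round(abs(signed)))  # 89
```

If the thresholds are compared against the absolute value, a job that finishes much faster than usual trips the alert just as a slow one does.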
Thanks for your post. I will investigate this and get back to you.
Regards,
Priya
Project Manager
Red Gate Software
Would it be sensible to split this alert into two?
1. Jobs running a lot longer than expected
2. Jobs running for a lot less time than expected
Thanks, John
The Job Duration Unusual alert operates in two stages:
Firstly we calculate the baseline (using the median of the last 10 runs). If this is less than the short running time threshold, the alert is disabled until the baseline increases again.
Once the job is known to take a while, we compare each run against the baseline and alert if it deviates too far away from it.
The default configuration thus makes the assumption that a quick running job can be ignored, and once a job is known to take a significant time, it shouldn’t start running quickly or slowly.
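The two stages described above can be sketched roughly as follows. This is an illustration of the description only, not SQL Monitor's actual code: classify_run, its parameters, and the use of the absolute deviation are all assumptions, as is taking the baseline from the runs before the current one.

```python
from statistics import median


def classify_run(duration_s, previous_runs_s, thresholds,
                 ignore_below_s=60, window=10):
    """Return the alert level for a run, or None if no alert is raised.

    Hypothetical helper; thresholds maps a level name to a percentage.
    """
    # Stage 1: the baseline is the median of the last `window` runs.
    baseline = median(previous_runs_s[-window:])
    # The short-run limit is tested against the baseline, not this run.
    if baseline < ignore_below_s:
        return None
    # Stage 2: compare this run against the baseline (sign ignored).
    deviation = abs(duration_s - baseline) / baseline * 100
    # Check the strictest threshold first.
    for level, limit in sorted(thresholds.items(),
                               key=lambda kv: kv[1], reverse=True):
        if deviation > limit:
            return level
    return None


settings = {"high": 70, "medium": 60, "low": 50}
# An 8-second run of a job whose recent runs took about 70 seconds
print(classify_run(8, [70] * 10, settings))  # high
```

Under these assumptions the 8-second backup job is alerted on even though the run itself was under 60 seconds, because only the baseline is checked against the 60-second limit.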
Ignoring short running jobs is a powerful way to set defaults for a whole server, but it isn’t that useful when set on an individual job. You know how long the job normally takes, and so can enable or disable alerting on it to taste.
The ‘ignore if runtime is less than…’ doesn’t work for the use case where you have a job that (say) copies any available new log files, which will run in less than a second if there is nothing to do but takes several minutes every now and then. We didn’t want to try to automagically figure out what is happening in this case and risk getting it wrong.
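To make the log-copy case concrete, here is a small sketch with a made-up run history (the numbers are hypothetical, and the 60-second limit is the one used earlier in the thread). Because the median barely moves when one run in ten is long, the baseline stays under the limit and the alert stays disabled.

```python
from statistics import median

# Hypothetical durations in seconds: mostly no-op runs, one real copy.
runs = [0.5, 0.4, 0.6, 0.5, 240.0, 0.5, 0.4, 0.5, 0.6, 0.5]

baseline = median(runs[-10:])     # median of the last 10 runs
alerting_enabled = baseline >= 60  # assumed 60-second short-run limit

print(baseline, alerting_enabled)  # 0.5 False
```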
I hope this explains the behaviour you are both noticing.
Thanks,
Priya
Project Manager
Red Gate Software