Why do false positive high alerts get raised?

After posting a separate question and checking with my BU, I discovered we do have upgraded licenses available for 11.x.

After installing the latest 11.x and messing with the monitored servers, two of them fired off high level alerts saying there was insufficient disk space on production boxes.

When I check the message, the details indicate otherwise.  When I check the physical machines, it shows plenty of space.

This happened all the time with 10.x when patching boxes.  A box would get restarted and the on-duty person would get a page in PagerDuty.

I don't think a user voice entry is needed.  This looks like a bug.

Answers

  • Jessica R Posts: 1,319 Rose Gold 4
    Hi @Custom24,

    Sorry to hear you're getting unexpected alerts!

    Can you please share a screenshot of the Alert details for one of the alerts that has fired incorrectly and, if possible, a screenshot of what is shown for the disk when you check?  Can you also please let me know the OS version of the machine where the disk is located, and whether it is a physical disk, a mapped drive, or any other non-standard disk?

    Thank you!

    Jessica Ramos | Product Support Engineer | Redgate Software

    Have you visited our Help Center?


  • Custom24 Posts: 7 New member
    I can also send these when we do our next patching session.  The disk space errors flow freely as servers are rebooting for patches.

    On the topic of patching -- the maintenance feature in SQL Monitor does not appear to support relative dates.  Or I am missing something.

    For example, Microsoft has Patch Tuesday.  This is traditionally the second Tuesday of each month.

    Our team applies patches to SQL Servers not on the following day, but about a week later.

    I don't see a way to set up a relative date.  I've used PowerShell in the past for tasks that check for an upcoming date, and it works like a charm.

    Is there any documentation you can point me to?

    I'd like to be able to calculate a relative date in SQL Monitor for a maintenance window w/o having to set it manually each time.
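
    To make the calculation concrete, here is a minimal sketch in Python. It is not a SQL Monitor feature, and the function names are made up for illustration. It finds Patch Tuesday (the second Tuesday of a month) and the date a fixed number of days later:

```python
from datetime import date, timedelta

TUESDAY = 1  # date.weekday() counts Monday as 0, so Tuesday is 1


def patch_tuesday(year: int, month: int) -> date:
    """Return the second Tuesday of the given month."""
    first = date(year, month, 1)
    # Days from the 1st to the first Tuesday (0 if the 1st is a Tuesday).
    offset = (TUESDAY - first.weekday()) % 7
    return first + timedelta(days=offset + 7)


def maintenance_date(year: int, month: int, delay_days: int = 7) -> date:
    """Return the date delay_days after Patch Tuesday (hypothetical helper)."""
    return patch_tuesday(year, month) + timedelta(days=delay_days)


if __name__ == "__main__":
    print(patch_tuesday(2024, 10))     # 2024-10-08
    print(maintenance_date(2024, 10))  # 2024-10-15
```

    With the default delay of 7 days the result is always the third Tuesday of the same month.  How the date would then be applied to a maintenance window is left out, since that depends on the SQL Monitor setup.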

    TIA!
  • Custom24 Posts: 7 New member
    This happened again.

    Providing screen caps.


    Corresponding PagerDuty alert.  FWIW, we also have email and Teams as a notification and get messages there.

    No server was being rebooted.  By the time it was checked by the on-duty SRE (a minute later), it was evident that this was another hiccup.


  • Alex B Posts: 1,158 Diamond 4
    Hi @Custom24,

    For the Alert Suppression window, you should be able to set it to recur e.g. every 3rd Tuesday of the month (a week after the 2nd Tuesday) like this - unless I'm misunderstanding something:


    For the Disk space alert - it looks like you have customized the alert at some point, since it's a high severity alert (the default is medium severity).  Can you check at what levels you have the alert configured?  This can be done down to the individual disk level, and the more specific levels take precedence over the more general ones.  Often the alert has been modified at a lower level, and then changes made at, e.g., the All Servers level don't override the settings at the more specific levels.

    You can tell this by a number in parentheses after a specific level; when that level is selected, the "Inherited from" column says <This level>.  Here you can see I have 8 alerts customized at the All servers level, and I have customized the Disk space alert at the bm1group level, at the ps-alexb2 level, and at the specific C: disk level:
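
    The precedence rule described above can be sketched roughly as follows.  This is only an illustration of "most specific level wins", not SQL Monitor's actual implementation; the level names and the `effective_setting` function are hypothetical:

```python
# Hypothetical hierarchy, ordered from most general to most specific.
LEVELS = ["all_servers", "group", "server", "disk"]


def effective_setting(overrides: dict) -> tuple:
    """Return (level, value) for the most specific level that has a setting.

    `overrides` maps a level name to the value customized at that level;
    levels without a customization are simply absent and inherit.
    """
    for level in reversed(LEVELS):
        if level in overrides:
            return level, overrides[level]
    raise KeyError("no setting configured at any level")


if __name__ == "__main__":
    # A change at all_servers does not affect a disk customized lower down.
    print(effective_setting({"all_servers": "medium", "disk": "high"}))
    # ('disk', 'high')
```

    In this model, editing the all_servers value changes nothing for a disk that still carries its own customization, which is why checking each level matters.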


    The alert itself isn't an "Insufficient space" alert so much as a "Disk space is more than X% or less than Y GB" alert, depending on what you have it set to.

    You can also find this information in a csv in the log files: go to Configuration > Retrieve all log files, then click into the base monitor sub folder of the SqlMonitorLogs.zip file (there is one for each base monitor), e.g. SqlMonitorLogs.zip\localhost\RepositoryInformation, and open the AlertConfiguration.csv file.  Here you can see, as I described above, 8 alerts at the root (All servers) level and then the Disk space alert at the group, server and disk levels:


    This is an easy way to tell if it's customized at a lower level, but the Configuration XML is a bit harder to read so navigating to that level in the UI is easier to see how it is actually configured.

    If that doesn't seem to be the issue, then we can reach out from a support ticket to get the log files and potentially some further information as well.

    Kind regards,
    Alex
    Product Support Engineer | Redgate Software

  • Custom24 Posts: 7 New member
    Hi @Alex B,

    I took a look at our config for disk space alert.

    Screen cap for the default "all servers" is shown.  This is inherited by the few servers we have under SQL Monitor -- except for one which is at 10 GB instead of 3 GB.

    In any event, neither of the rules is sufficient to trigger the alerts.  In other words, these should not be firing since as the screen cap I posted previously shows, there is plenty of disk space.

    Moreover, the alerts fire as I mentioned during reboots.  This looks like a bug to me; it's like the WMI is requesting info from a server during a time when it is rebooting and treating the non-response as a zero or something.  Wild guess on my part.
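
    To spell out the guess (and it is only a guess, not SQL Monitor's actual code), here is a small Python sketch of how a missing reading could turn into a false "low disk space" result, next to a variant that reports "unknown" instead.  Both function names are hypothetical:

```python
from typing import Optional


def naive_low_space(free_gb: Optional[float], min_free_gb: float) -> bool:
    """Suspected failure mode: a missing reading is coerced to 0 GB."""
    value = free_gb if free_gb is not None else 0.0
    return value < min_free_gb


def safer_low_space(free_gb: Optional[float], min_free_gb: float) -> Optional[bool]:
    """Return None (unknown) when the server did not answer."""
    if free_gb is None:
        return None
    return free_gb < min_free_gb


if __name__ == "__main__":
    # Server is rebooting, so no reading came back (None).
    print(naive_low_space(None, 3.0))   # True
    print(safer_low_space(None, 3.0))   # None
    print(safer_low_space(120.0, 3.0))  # False
```

    If something like the first function were in play, a 3 GB threshold would trip on every reboot no matter how much space the disk really has.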



    I'll take a look at the maintenance window.  Thanks for pointing this out!
  • Alex B Posts: 1,158 Diamond 4
    Hi @Custom24

    I'm not positive, but if the disks where you're seeing it are inheriting that setting, then I'd agree it shouldn't be raising the alert.

    I am also going to reach out via a support ticket to get the log files as well as ask a few other questions.

    Kind regards,
    Alex
    Product Support Engineer | Redgate Software
