2.2 Monitoring Stopped - Hosted Environment - Need Fix

PDinCA
edited May 4, 2011 7:50AM in SQL Monitor Previous Versions
We operate in a Rackspace hosted environment with 2 boxes: the SQL Cluster we're monitoring and a VM that hosts SQL Monitor (as it can't live on the cluster). Rackspace is a highly shared infrastructure environment with multiple domain controllers that occasionally need maintenance.

During just such a maintenance period, SQL Monitor completely stopped monitoring:
Monitoring stopped (host machine credentials) 308186-ntclus.iad.intensive.int

Machine authentication failed at: 17 Apr 2011 12:24:16 AM
Machine authentication successful at: 17 Apr 2011 6:56:19 PM

Explanation from SQL Monitor:
Raised when monitoring stops because the user name or password you entered for SQL Monitor to connect to your host machine fails authentication. Check whether:

•Your user name or password have changed.
•Your permissions have changed and are no longer sufficient.
WE HAVE NOT CHANGED THE PASSWORD AT ALL.

I LOGGED ON AND APPLIED THE SAME PASSWORD AS ALWAYS AND MONITORING WAS RESTORED.

Rackspace said:
In this particular ticket, the issue happened during our monthly maintenance window when the domain controllers were rebooted. However, even though this was the case, there were still other domain controllers that could have taken on this traffic.
It is clear we cannot fix Rackspace! However, it seems that the problem shouldn't be so severe that monitoring is completely shut down even after the outage is over...

The fix I would like to propose is:
    Don't "give up" monitoring when machine authentication is evidently restored; the alert's own "Machine authentication successful at: 17 Apr 2011 6:56:19 PM" line shows that SQL Monitor already knows this. If the problem is ongoing, emit a "continuing alert" at user-definable intervals until it clears. When authentication is good again, restore ALL monitoring functions without requiring user intervention.

In the case above, Rackspace has a history of authentication issues that they are unable to improve upon, except by charging us more $$$ for our own Active Directory for our 2-machine setup... Having SQL Monitor recognise a problem, recognise that it is no longer a problem, and then sit there doing nothing until I re-apply the same password that is already set on an account with a never-expire password is not user friendly, and seems like something that doesn't need to happen... I may be wrong... If I'm not wrong, could you fix it, please?
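A minimal sketch of how the proposed behaviour might look, written in Python purely for illustration. It is not SQL Monitor's code; `AuthRetryPolicy`, `simulate`, and every default value (60 s first retry, 1 hour cap, 4 hour re-alert period, 5 minute normal poll) are hypothetical assumptions. Retries back off exponentially so a genuinely bad password produces only a handful of failed logins rather than a constant stream, a "continuing alert" is raised at a configurable period, and monitoring resumes by itself once authentication succeeds.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AuthRetryPolicy:
    """Hypothetical policy: back off after auth failures, re-alert
    periodically, and resume automatically when authentication succeeds."""
    base_delay: float = 60.0              # assumed: seconds before the first retry
    max_delay: float = 3600.0             # assumed cap: retry at least hourly
    realert_interval: float = 4 * 3600.0  # assumed user-definable "continuing alert" period
    failures: int = 0
    last_alert_at: float = float("-inf")
    events: List[str] = field(default_factory=list)

    def next_delay(self) -> float:
        # Doubling the delay keeps failed logins (and any IDS emails they trigger) rare.
        return min(self.base_delay * 2 ** max(self.failures - 1, 0), self.max_delay)

    def on_attempt(self, now: float, authenticated: bool) -> float:
        """Record one attempt at time `now`; return seconds until the next retry,
        or 0.0 meaning "back to the normal polling schedule"."""
        if authenticated:
            if self.failures:
                self.events.append(f"{now:.0f}s: authentication restored, monitoring resumed")
            self.failures = 0
            self.last_alert_at = float("-inf")  # the next outage alerts immediately
            return 0.0
        self.failures += 1
        if now - self.last_alert_at >= self.realert_interval:
            kind = "alert" if self.failures == 1 else "continuing alert"
            self.events.append(f"{now:.0f}s: {kind}, {self.failures} failed attempt(s)")
            self.last_alert_at = now
        return self.next_delay()


def simulate(is_up: Callable[[float], bool], horizon: float,
             policy: AuthRetryPolicy, poll_interval: float = 300.0) -> List[str]:
    """Drive the policy against a fake outage; poll_interval is an assumed normal cadence."""
    now = 0.0
    while now <= horizon:
        wait = policy.on_attempt(now, is_up(now))
        now += wait if wait > 0 else poll_interval
    return policy.events


if __name__ == "__main__":
    # Fake domain-controller outage for the first 20,000 seconds.
    for line in simulate(lambda t: not (0 <= t < 20000), 22000, AuthRetryPolicy()):
        print(line)
```

With these assumed settings, a roughly 5.5 hour outage costs only 11 failed logins and two alerts, and monitoring comes back on its own at the first successful attempt. The backoff cap is the knob that trades recovery latency against how noisy a genuinely wrong password would be.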

Comments

  • Hi PDinCA,

    We're currently looking at this issue and trying to find the very narrow balance between being a reliable fault tolerant monitoring system and a good network and security citizen.

    The behaviour you're describing is what SQL Response 1.x would do when it encountered a security issue: it would just retry constantly. One of our customers didn't notice that the password on the domain account SQL Response was using had expired, so SQL Response merrily tried to connect over and over again. This customer also had an intrusion detection system set to email them whenever someone appeared to be trying to break into their systems. Each failed login generated multiple emails, which eventually acted as a denial-of-service attack against their internal email server. Their email server fell over, and the customer was less than pleased.

    Because of this, we opted to take the opposite view for SQL Response 2.x. We made the (naive) assumption that customers would have redundant domain controllers, and that when Windows and SQL Server tell us credentials are invalid, they really are invalid. SQL Response currently halts monitoring when it receives an authentication error, because we assumed this couldn't correct itself without human intervention. We also don't like the idea of taking out people's email servers.

    We've had a few reports of this recently, and we are working on some sort of configuration option so people can disable this behaviour. We will probably revisit this in a future version of SQL Monitor.

    Thanks for your feedback; we will let you know when we have a workaround for this.
    --
    Daniel KJ