Need to EXTEND the connection retry config
PDinCA
How do I make the number of retries/retry-duration MUCH longer....?
Using 6.0.12.5986, server reboots OFTEN leave us with SQL Monitor declaring it cannot connect to the server. Patently untrue - it just gave up FAR too early!
Have you Red Gate chaps never seen a Windows Server 2012 R2 reboot take ten or fifteen minutes, or even longer, because of the huge volume of patches applied to the box? We have no control over how long it will take to come back after Microsoft's shenanigans, so we need to massively extend the number of retries.
We don't care tuppence about the volume of network traffic these retries may add. It's an Azure box, dedicated to the task, with Rackspace targets, all disconnected from each other, so "just carry on connecting until WE tell you to give it a break" is what's actually needed.
Please provide some options, preferably via the actual Configuration page... but we're guessing it will be some XML file editing in the interim...
Thanks for the help.
Jesus Christ: Lunatic, liar or Lord?
Decide wisely...
Comments
In cases of unreachable servers, SQL Monitor should try to continue sampling on the same schedule and just update the connection state once it starts getting responses again. The only time it's set to back-off or give up entirely is when there's an authentication error.
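To make that behaviour concrete, here is a minimal sketch of the policy as described: keep sampling on the same schedule while the server is unreachable, flip the state back once responses return, and stop only on an authentication error. This is not SQL Monitor's actual code; `State`, `AuthError`, `next_state`, `monitor` and the `probe` callable are all hypothetical names invented for illustration, and a real loop would wait for the sampling interval between ticks.

```python
from enum import Enum


class State(Enum):
    CONNECTED = "connected"
    UNREACHABLE = "unreachable"
    AUTH_FAILED = "authentication failed"


class AuthError(Exception):
    """Hypothetical error: the target rejected the credentials."""


def next_state(probe):
    """Run one sampling attempt and return the resulting connection state."""
    try:
        probe()
    except AuthError:
        return State.AUTH_FAILED
    except OSError:
        # Covers ConnectionError, TimeoutError, etc. (server down or rebooting).
        return State.UNREACHABLE
    return State.CONNECTED


def monitor(probe, ticks):
    """Sample once per tick; give up early only on an authentication error."""
    history = []
    for _ in range(ticks):
        state = next_state(probe)
        history.append(state)
        if state is State.AUTH_FAILED:
            break  # the only case that backs off or gives up
    return history


if __name__ == "__main__":
    attempts = {"count": 0}

    def rebooting_server():
        # Assumed scenario: the first three samples hit a server mid-reboot.
        attempts["count"] += 1
        if attempts["count"] <= 3:
            raise ConnectionError("server is rebooting")

    for state in monitor(rebooting_server, ticks=5):
        print(state.value)
```

Under this policy a long reboot costs nothing but a run of "unreachable" samples, so no retry count should need extending. If the state never recovers on its own, something other than the schedule is at fault.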
Do you see the server and all its instances report the connection failure, or is it just the instances, or just the server? A screenshot of the monitored entities page while this issue is happening would be helpful.
Next time it happens, I will look up this ticket.
This was just 2 of 19 that had problems, and it's a 1::1 server::instance configuration. One was a password issue; the other just gave up without messages, error log entries, etc., and both the Windows connection and the SQL connection were down. The usual culprit is the Remote Registry service on the target, which on other boxes has taken a lunch break without cause, and restarting it fixes those cases. This time, though, it was still running and hadn't stopped at all. Although the instance was showing as stopped, clicking the retry connection sparked it back to life, which was odd, and an entirely unexpected action to have to take, hence this ticket.