We are abandoning SQL Monitor for now
qcjims
Posts: 38
So, we are going to abandon our SQL Monitor "experiment".
The reasons:
1. Too many different errors cause monitoring to stop and not automatically restart.
2. No clear information on how to fix the specific problems that cause the monitoring errors. Even though you say the errors are detailed and specific, the reality is that they are not - they are vague and non-descriptive.
3. I do not have 3+ hours to dedicate to troubleshooting every single error we get. I am too busy to work through the dozens of different problems we are encountering.
4. The product does not scale well. We have 200+ physical servers to monitor and this product just cannot do it.
5. We do not have any problems with other monitoring tools - these include MOM 2005, SCOM 2007 R2, Quest Spotlight, and Quest Foglight PA. We only have major problems with SQL Monitor. It is just too much work to get this product working. It is definitely not production ready for a large installation like ours.
-jim
Comments
Apologies that you've been having problems with SQL Monitor. Are there any specific issues that our support people could look into? Would it help if one of our support team gave you a call to go through the setup?
Regards
Ben Rees
I agree with you on the connectivity issues and the vague descriptions.
For now we will stick to using SCOM 2007 R2 (ughh), Quest Foglight and Spotlight, and our own custom scripts to handle our monitoring tasks.
Why can't someone get basic monitoring for Windows right? I just don't get it.
-jim
If you're using SQL Monitor on a large number of servers (for example, the 250+ that you mention), we certainly recommend you split your deployment into a number of different Base Monitors, i.e. that you don't try to monitor all servers with a single Base Monitor service. Generally, we recommend around 40-50 monitored servers per deployment. Apologies that this wasn't made clearer to you earlier.
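For anyone planning that kind of split, here is a rough sketch of how a server list could be divided into evenly sized groups that stay under the 40-50 guideline. It is plain Python and not part of SQL Monitor; the function name `split_into_deployments`, the `sql001`-style host names, and the cap of 50 are assumptions made up for illustration.

```python
import math


def split_into_deployments(servers, max_per_deployment=50):
    """Split a list of server names into near-equal groups.

    Hypothetical helper: each group is meant to be handled by its own
    Base Monitor, and no group exceeds max_per_deployment servers.
    """
    if max_per_deployment < 1:
        raise ValueError("max_per_deployment must be at least 1")
    if not servers:
        return []

    # Smallest number of groups that keeps every group under the cap.
    count = math.ceil(len(servers) / max_per_deployment)
    # Spread the servers evenly; the first `extra` groups get one more.
    base, extra = divmod(len(servers), count)

    groups = []
    start = 0
    for i in range(count):
        size = base + (1 if i < extra else 0)
        groups.append(servers[start:start + size])
        start += size
    return groups


# Made-up host names standing in for a 200-server estate.
servers = [f"sql{n:03d}" for n in range(1, 201)]
groups = split_into_deployments(servers, max_per_deployment=50)
for number, group in enumerate(groups, start=1):
    print(f"Base Monitor {number}: {len(group)} servers ({group[0]} to {group[-1]})")
```

Spreading the servers evenly, rather than filling each group to the cap, leaves some headroom in every deployment when new servers are added later.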
In a future version we have plans to allow this sort of number to be covered by a single deployment via a number of remote proxies, but this is not on the roadmap for the next few months (monitoring 250+ servers presents a number of issues, not least how to display so many servers on one screen!).
We will however have a version 2.2 out very soon (weeks, not months) that will provide monitoring alerts (i.e. tell you when the system isn't monitoring), which I hope will help fix some of your problems. There are a lot of times, for example, where the Remote Registry service will fail on a monitored machine (for reasons outside our control, obviously, but enough to prevent us from collecting certain data), and we will now send an alert to the user to let them know that they need to go and investigate this. Note also that if a problem fixes itself (e.g. the Remote Registry service comes back up) the alert will be marked as Ended so that you know it's no longer a problem.
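To make the raised/Ended behaviour concrete, here is a minimal sketch of that kind of alert lifecycle. It is only an illustration of the idea, not how SQL Monitor is implemented; the `Alert` and `AlertTracker` classes, the "Active" status label, and the server name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    server: str
    check: str
    raised_at: float
    ended_at: Optional[float] = None

    @property
    def status(self):
        # "Ended" follows the wording in the post; "Active" is an assumed label.
        return "Ended" if self.ended_at is not None else "Active"


class AlertTracker:
    """Hypothetical tracker: one open alert per (server, check) pair."""

    def __init__(self):
        self.active = {}   # (server, check) -> open Alert
        self.history = []  # every alert ever raised, open or ended

    def record(self, server, check, ok, now):
        """Record one check result; return the alert that changed, if any."""
        key = (server, check)
        alert = self.active.get(key)
        if not ok and alert is None:
            # First failure: raise a new alert.
            alert = Alert(server, check, raised_at=now)
            self.active[key] = alert
            self.history.append(alert)
            return alert
        if ok and alert is not None:
            # The problem fixed itself: mark the open alert as Ended.
            alert.ended_at = now
            del self.active[key]
            return alert
        # Still failing or still healthy: nothing changes.
        return None


tracker = AlertTracker()
raised = tracker.record("sql001", "Remote Registry", ok=False, now=100.0)
repeat = tracker.record("sql001", "Remote Registry", ok=False, now=115.0)
ended = tracker.record("sql001", "Remote Registry", ok=True, now=130.0)
print(ended.status, ended.raised_at, ended.ended_at)
```

A repeated failure does not raise a second alert here, so a service that stays down produces one alert rather than one per polling cycle.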
On a separate note, we've also tried to be clear in the software when there are hiccups in monitoring. We collect a great deal of information from servers in a very low impact way, but one of the consequences of this is that we will occasionally miss a packet. For example, we might collect CPU usage data every 15 seconds but miss one of these samples (for whatever reason), so that we've only collected this data 3 times in a minute instead of 4. In this case we will briefly show "Bad Data" to the user, and record something in the log, even though the missed bit of data is pretty inconsequential. Nevertheless we thought it useful to leave an audit trail so that users can look into any of these issues if they wish.
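The arithmetic in that example can be sketched as follows: a 15-second interval should give 4 samples per minute, and a missed sample shows up as a gap of roughly twice the interval. This is an illustration only, not SQL Monitor's actual logic; `expected_samples`, `find_gaps`, and the 50% tolerance are assumptions.

```python
def expected_samples(window_seconds=60, interval_seconds=15):
    """Number of samples a full window should contain."""
    return window_seconds // interval_seconds


def find_gaps(timestamps, interval_seconds=15, tolerance=0.5):
    """Return (previous, current) timestamp pairs that are too far apart.

    A gap is flagged when it exceeds the interval by more than the given
    tolerance (assumed here to be 50%), which suggests a missed sample.
    """
    ordered = sorted(timestamps)
    limit = interval_seconds * (1 + tolerance)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


# One minute of CPU samples, in seconds, with the sample at t=30 missing.
samples = [0, 15, 45]
missing = expected_samples() - len(samples)
gaps = find_gaps(samples)
print(f"collected {len(samples)} of {expected_samples()} samples, gaps: {gaps}")
```

Logging the gap boundaries rather than just a count is what makes the audit trail usable: it shows when the data went missing, not only that it did.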
Again, hope that helps - and feel free to contact our support guys if you want any help with setup.
Ben