
Cluster_SqlServer_SqlProcess_UnStableSamples is 57% of DB

franklink Posts: 11
Is this normal? I am running 2.2.

There are only 6 servers being monitored right now as this is basically a small scale test. I did not expect this table to have 11.5 million rows with only 6 servers and a one week purge policy.

I kicked off a manual purge yesterday using the hidden URL. I was hoping to see some space freed up today...

If this is by design, then I strongly suggest that the running processes data get its own configurable retention policy. I should not have to waste terabytes of space in order to keep a long history of SQL counters and performance data.

I generally do not look closely at running processes unless something goes wrong. When something goes wrong, I am generally investigating within a few minutes. Performance data on the other hand is very useful long term for things like capacity planning.
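
For anyone wanting to check their own Data Repository, a generic query along these lines (standard DMVs only, nothing specific to SQL Monitor) shows per-table row counts and reserved space:

-- Per-table row counts and reserved space, largest first.
SELECT s.name + '.' + t.name AS TableName,
       SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.row_count ELSE 0 END) AS [Rows],
       SUM(ps.reserved_page_count) * 8 / 1024 AS ReservedMB
FROM sys.dm_db_partition_stats ps
JOIN sys.tables t ON t.object_id = ps.object_id
JOIN sys.schemas s ON s.schema_id = t.schema_id
GROUP BY s.name, t.name
ORDER BY ReservedMB DESC;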

Comments

  • Options
    Hi

    Are you sure that the table you're having issues with is Cluster_SqlServer_Services_StableSamples and not Cluster_SqlServer_SqlProcess_UnstableSamples?

    The Cluster_SqlServer_SqlProcess_UnstableSamples table can get very large and it is one of our top priorities to add a distinct purge policy for tables that hold SQL Process data in version 2.3.

    I'm still surprised that your table contains 11.5 million rows after being left overnight. Is this row count reducing gradually or still increasing?

    Regards
    Chris
    Chris Spencer
    Test Engineer
    Red Gate
  • Options
    Apologies, it is [data].[Cluster_SqlServer_Process_UnstableSamples].

    SELECT [utils].[TicksToDateTime](MIN([CollectionDate]))
    FROM [RedGateMonitor].[data].[Cluster_SqlServer_Process_UnstableSamples]

    returns '2011-02-09 20:26:11.063'.

    What should be the oldest data I see in this table? I will implement my own purge policy for the time being.

    Additionally, the [Cluster_Machine_Process_UnstableSamples] table has a similar issue, though not to the degree of its SqlServer counterpart.
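
    The interim purge I have in mind is just a batched delete along these lines. This is only a rough sketch: it assumes [CollectionDate] holds .NET ticks in UTC (hence the [utils].[TicksToDateTime] helper) and that a table scan is acceptable for an off-hours job.

    DECLARE @rows INT;
    SET @rows = 1;

    WHILE @rows > 0
    BEGIN
        -- Delete in small batches so locks and log growth stay manageable.
        DELETE TOP (10000)
        FROM [RedGateMonitor].[data].[Cluster_SqlServer_Process_UnstableSamples]
        WHERE [utils].[TicksToDateTime]([CollectionDate]) < DATEADD(DAY, -7, GETUTCDATE());
        -- Assumes the stored ticks are UTC; use GETDATE() instead if they are local time.
        -- The function call on the column forces a scan; converting the cutoff to ticks
        -- would be faster, but I'm not sure a helper for that direction exists.

        SET @rows = @@ROWCOUNT;
    END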
  • Options
    '2011-02-09 20:26:11.063' looks old for a 7-day purge period, but it would depend on the timezone of your base monitor machine. I would expect the oldest row to be no older than base monitor time - 7 days - 1 hour.

    It would be interesting to know if this particular row is purged if you ran that query again. We purge every hour or when the base monitor service is restarted. The purge is done gradually in chunks rather than attempting to remove all the data at once. So it's possible that the purge is still ongoing.
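
    Something along these lines would show both values side by side (just a quick check, using GETDATE() on the Data Repository server as a rough stand-in for base monitor time):

    SELECT [utils].[TicksToDateTime](MIN([CollectionDate])) AS OldestSample,
           DATEADD(HOUR, -1, DATEADD(DAY, -7, GETDATE())) AS ExpectedOldestBound
    FROM [RedGateMonitor].[data].[Cluster_SqlServer_Process_UnstableSamples]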

    Having said all that, 11.5 million rows for a 7-day purge period is probably higher than we would expect. Are your monitored servers very busy, or do they contain an unusually large number of objects (databases, tables, jobs, spids, etc.)?

    Regards
    Chris
    Chris Spencer
    Test Engineer
    Red Gate
  • Options
    Sigh... apologies again. As in my post (and now my corrected title), the offending table is Cluster_SqlServer_SqlProcess_UnStableSamples.
  • Options
    SELECT PK.[ParentId],
           COUNT(*)
    FROM [RedGateMonitor].[data].[Cluster_SqlServer_SqlProcess_UnstableSamples] US
    JOIN [RedGateMonitor].[data].[Cluster_SqlServer_SqlProcess_Keys] PK ON PK.ID = US.ID
    GROUP BY PK.[ParentId]

    shows that one server is responsible for 65% of the data. They have all been monitored for the same period (over a month now), so it isn't related to length of history...
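
    (For reference, a variant of that query with a percentage column makes the share explicit; the windowed SUM needs SQL Server 2005 or later.)

    SELECT PK.[ParentId],
           COUNT(*) AS SampleRows,
           CAST(100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS DECIMAL(5, 1)) AS PercentOfTable
    FROM [RedGateMonitor].[data].[Cluster_SqlServer_SqlProcess_UnstableSamples] US
    JOIN [RedGateMonitor].[data].[Cluster_SqlServer_SqlProcess_Keys] PK ON PK.ID = US.ID
    GROUP BY PK.[ParentId]
    ORDER BY SampleRows DESC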

    This server generally has over 300 active connections during business hours. I am assuming that explains the volume of data logged?

    If that is the case, I have to say we are storing too much data for too long for this to be considered an enterprise solution.

    I will have over 300 servers to monitor...
  • Options
    No updates?
  • Options
    Hi

    Sorry for the delay in replying. I think that 300 active connections could very well account for the large amount of data. This kind of high activity is something we've tried to simulate on our servers for testing purposes, but there will always be surprises in store with live systems.

    Is it possible that you have trace switched on for this server? That would certainly increase the amount of data - more so with busy servers.

    We are aware that the way we store SQL Processes needs to improve, and our developers already have some good ideas. In the meantime, v2.3 will hopefully contain the extra purge option specifically for SQL Processes.

    Regards
    Chris
    Chris Spencer
    Test Engineer
    Red Gate
  • Options
    I have not enabled tracing on any of my servers.

    I guess I will be looking forward to 2.3!
  • Options
    I've discovered this problem whilst testing SQL Monitor.

    I'm only monitoring 3 servers with a one-day retention period.

    This table ended up with 30 million rows (about 18 GB) after a few weeks.

    The problem is there's no option to purge it from the front end, so I just had to set up a SQL Agent job to truncate it each night.
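
    For anyone doing the same, the job step is just the truncate below. Bear in mind it wipes the whole table rather than trimming to a retention window, which was fine for my test setup.

    -- Nightly SQL Agent job step: clears all SQL Process samples, not just old ones.
    TRUNCATE TABLE [RedGateMonitor].[data].[Cluster_SqlServer_SqlProcess_UnstableSamples]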
  • Options
    toad2898 wrote:
    I've discovered this problem whilst testing SQL Monitor.

    I'm only monitoring 3 servers with a one-day retention period.

    This table ended up with 30 million rows (about 18 GB) after a few weeks.

    The problem is there's no option to purge it from the front end, so I just had to set up a SQL Agent job to truncate it each night.

    Hi,

    Which version of SQL Monitor are you using? We had an issue with purging that is fixed in the 2.2 release. Do you also run trace on your SQL Servers?

    Thanks,
    Priya
    Priya Sinha
    Project Manager
    Red Gate Software
  • Options
    2.1, so that'll explain it. I'll upgrade and keep an eye on it.

    Thanks