
Cluster_SqlServer_SqlProcess_UnStableSamples is 57% of DB

franklink Posts: 11
Is this normal? I am running 2.2.

There are only 6 servers being monitored right now as this is basically a small scale test. I did not expect this table to have 11.5 million rows with only 6 servers and a one week purge policy.

I kicked off a manual purge yesterday using the hidden URL. I was hoping to see some space freed up today...

If this is by design, then I strongly suggest that the running processes data get its own configurable retention policy. I should not have to waste terabytes of space in order to keep a long history of SQL counters and performance data.

I generally do not look closely at running processes unless something goes wrong. When something goes wrong, I am generally investigating within a few minutes. Performance data on the other hand is very useful long term for things like capacity planning.
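
For anyone wanting to check their own Data Repository, a generic query along these lines (standard DMVs only, nothing specific to SQL Monitor) shows per-table row counts and reserved space:

-- Per-table row counts and reserved space, largest first.
SELECT s.name + '.' + t.name AS TableName,
       SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.row_count ELSE 0 END) AS [Rows],
       SUM(ps.reserved_page_count) * 8 / 1024 AS ReservedMB
FROM sys.dm_db_partition_stats ps
JOIN sys.tables t ON t.object_id = ps.object_id
JOIN sys.schemas s ON s.schema_id = t.schema_id
GROUP BY s.name, t.name
ORDER BY ReservedMB DESC;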

Comments

  • Options
    Hi

    Are you sure that the table you're having issues with is Cluster_SqlServer_Services_StableSamples and not Cluster_SqlServer_SqlProcess_UnstableSamples?

    The Cluster_SqlServer_SqlProcess_UnstableSamples table can get very large and it is one of our top priorities to add a distinct purge policy for tables that hold SQL Process data in version 2.3.

    I'm still surprised that your table contains 11.5 million rows after being left overnight. Is this row count reducing gradually or still increasing?

    Regards
    Chris
    Chris Spencer
    Test Engineer
    Red Gate
  • Options
    Apologies, it is [data].[Cluster_SqlServer_Process_UnstableSamples].

    SELECT [utils].[TicksToDateTime](MIN([CollectionDate]))
    FROM [RedGateMonitor].[data].[Cluster_SqlServer_Process_UnstableSamples]

    returns '2011-02-09 20:26:11.063'.

    What should be the oldest data I see in this table? I will implement my own purge policy for the time being.

    Additionally, the [Cluster_Machine_Process_UnstableSamples] table has a similar issue, though not to the degree of its SqlServer counterpart.
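
    The interim purge I have in mind is just a batched delete along these lines. This is only a rough sketch: it assumes [CollectionDate] holds .NET ticks in UTC (hence the [utils].[TicksToDateTime] helper) and that a table scan is acceptable for an off-hours job.

    DECLARE @rows INT;
    SET @rows = 1;

    WHILE @rows > 0
    BEGIN
        -- Delete in small batches so locks and log growth stay manageable.
        DELETE TOP (10000)
        FROM [RedGateMonitor].[data].[Cluster_SqlServer_Process_UnstableSamples]
        WHERE [utils].[TicksToDateTime]([CollectionDate]) < DATEADD(DAY, -7, GETUTCDATE());
        -- Assumes the stored ticks are UTC; use GETDATE() instead if they are local time.
        -- The function call on the column forces a scan; converting the cutoff to ticks
        -- would be faster, but I'm not sure a helper for that direction exists.

        SET @rows = @@ROWCOUNT;
    END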
  • Options
    '2011-02-09 20:26:11.063' looks old for a 7-day purge period, but it would depend on the timezone of your base monitor machine. I would expect the oldest row to be no older than base monitor time - 7 days - 1 hour.

    It would be interesting to know if this particular row is purged if you ran that query again. We purge every hour or when the base monitor service is restarted. The purge is done gradually in chunks rather than attempting to remove all the data at once. So it's possible that the purge is still ongoing.
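
    Something along these lines would show both values side by side (just a quick check, using GETDATE() on the Data Repository server as a rough stand-in for base monitor time):

    SELECT [utils].[TicksToDateTime](MIN([CollectionDate])) AS OldestSample,
           DATEADD(HOUR, -1, DATEADD(DAY, -7, GETDATE())) AS ExpectedOldestBound
    FROM [RedGateMonitor].[data].[Cluster_SqlServer_Process_UnstableSamples]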

    Having said all that, 11.5 million rows for a 7-day purge period is probably higher than we would expect. Are your monitored servers very busy, or do they contain an unusually large number of objects (databases, tables, jobs, spids, etc.)?

    Regards
    Chris
    Chris Spencer
    Test Engineer
    Red Gate
  • Options
    Sigh... apologies again. As in my post (and now my corrected title), the offending table is Cluster_SqlServer_SqlProcess_UnStableSamples.
  • Options
    SELECT PK.[ParentId],
           COUNT(*)
    FROM [RedGateMonitor].[data].[Cluster_SqlServer_SqlProcess_UnstableSamples] US
    JOIN [RedGateMonitor].[data].[Cluster_SqlServer_SqlProcess_Keys] PK ON PK.ID = US.ID
    GROUP BY PK.[ParentId]

    shows that one server is responsible for 65% of the data. They have all been monitored for the same period (over a month now), so it isn't related to length of history...
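
    (For reference, a variant of that query with a percentage column makes the share explicit; the windowed SUM needs SQL Server 2005 or later.)

    SELECT PK.[ParentId],
           COUNT(*) AS SampleRows,
           CAST(100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS DECIMAL(5, 1)) AS PercentOfTable
    FROM [RedGateMonitor].[data].[Cluster_SqlServer_SqlProcess_UnstableSamples] US
    JOIN [RedGateMonitor].[data].[Cluster_SqlServer_SqlProcess_Keys] PK ON PK.ID = US.ID
    GROUP BY PK.[ParentId]
    ORDER BY SampleRows DESC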

    This server generally has over 300 active connections during business hours. I am assuming that explains the volume of data logged?

    If that is the case, I have to say we are storing too much data for too long for this to be considered an enterprise solution.

    I will have over 300 servers to monitor...
  • Options
    No updates?
  • Options
    Hi

    Sorry for the delay in replying. I think that 300 active connections could very well account for the large amount of data. This kind of high activity is something we've tried to simulate on our servers for testing purposes, but there will always be surprises in store with live systems.

    Is it possible that you have trace switched on for this server? That would certainly increase the amount of data - more so with busy servers.

    We are aware that the way we store SQL Processes needs to improve, and our developers already have some good ideas. In the meantime, v2.3 will hopefully contain the extra purge option specifically for SQL Processes.

    Regards
    Chris
    Chris Spencer
    Test Engineer
    Red Gate
  • Options
    I have not enabled tracing on any of my servers.

    I guess I will be looking forward to 2.3!
  • Options
    I've discovered this problem whilst testing SQL Monitor.

    I'm only monitoring 3 servers with a one-day retention period.

    This table ended up with 30 million rows (about 18 GB) after a few weeks.

    The problem is there's no option to purge it from the front end, so I just had to set up a SQL Agent job to truncate it each night.
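
    For anyone doing the same, the job step is just the truncate below. Bear in mind it wipes the whole table rather than trimming to a retention window, which was fine for my test setup.

    -- Nightly SQL Agent job step: clears all SQL Process samples, not just old ones.
    TRUNCATE TABLE [RedGateMonitor].[data].[Cluster_SqlServer_SqlProcess_UnstableSamples]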
  • Options
    toad2898 wrote:
    I've discovered this problem whilst testing SQL Monitor.

    I'm only monitoring 3 servers with a one-day retention period.

    This table ended up with 30 million rows (about 18 GB) after a few weeks.

    The problem is there's no option to purge it from the front end, so I just had to set up a SQL Agent job to truncate it each night.

    Hi,

    Which version of SQL Monitor are you using? We had an issue with purging that is fixed in the 2.2 release. Do you also run trace on your SQL Servers?

    Thanks,
    Priya
    Priya Sinha
    Project Manager
    Red Gate Software
  • Options
    2.1, so that'll explain it. I'll upgrade and keep an eye on it.

    Thanks