Error in system event log on x64 cluster system

qcjims
edited September 9, 2010 7:23AM in SQL Backup Previous Versions
We are getting the following error several times per second in the system event log of several of our x64 cluster systems after installing Red Gate SQL Backup 6.

Event Type: Error
Event Source: ClusSvc
Event Category: Database Mgr
Event ID: 1080
Date: 9/1/2009
Time: 10:12:36 PM
User: N/A
Computer: xxxxxxxxxxxxx
Description:
Cluster service could not write to a file (C:\DOCUME~1\MCS64\LOCALS~1\Temp\CLS623.tmp). The disk may be low on disk space, or some other serious condition exists.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 05 00 00 00 ....


I have upgraded to 6.2 from 6.0 and still have the same error. I have also done a clean install and still have the same problem. The disk is not low on space, and the account the cluster service runs under (MCS64) does have write access to that directory. Any ideas?


-js
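
The `Data:` field of the event may offer a hint about the cause. The following is a minimal sketch, assuming the four bytes are a little-endian Win32 error code (the event text itself does not say so). The helper name `decode_event_data` and the small lookup table are illustrative only and are not part of any Red Gate or Microsoft tool.

```python
import struct

# Hand-picked subset of Win32 system error codes (not exhaustive).
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND",
    5: "ERROR_ACCESS_DENIED",
    112: "ERROR_DISK_FULL",
}


def decode_event_data(hex_bytes: str) -> int:
    """Interpret the space-separated hex bytes from the event's Data field
    as one little-endian unsigned 32-bit integer (assumed layout)."""
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    (code,) = struct.unpack("<I", raw[:4])
    return code


code = decode_event_data("05 00 00 00")
print(code, WIN32_ERRORS.get(code, "unknown"))
```

If that reading is right, code 5 would point at an access problem rather than a full disk, which is consistent with the disk not being low on space.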

Comments

  • Hi js,

    Could you send us as much information about your cluster as possible? (OS + SP + any other updates, installed instances + version + bitness, any other software installed on the system, how you installed SQL Backup, any recent fail overs etc). We have had one other similar report but have not managed to reproduce it yet.

    Feel free to email support@red-gate.com with the information if you do not want to publish it here. Please include the URL of this page when you do so.

    Sorry for the problems you are facing with SQL Backup.

    Regards,

    James
    James Moore
    Head of DBA Tools
    Red Gate Software Ltd
  • Hi js,

    This issue may be related to a registry key replication issue.

    As a temporary workaround you can disable this feature for SQL Backup.

    For each SQL Server group in Cluster Administrator, right-click ‘SQLBackupAgent_[INSTANCE]’ and select ‘Properties’. Under the Registry Replication tab, remove the entries, ignoring any errors. Once registry replication is removed, the event log errors will no longer be produced. The obvious drawback to this technique is that any special registry keys you want to set (such as BrowsingUserList, templates etc.) will have to be copied over manually.

    On my cluster, re-entering keys for replication did not cause the problem to reoccur - though I am still testing the full behaviour.

    Please let me know how things go if you try this,

    Thanks
    Robin Anderson
    Development
    Red-Gate Software
  • I have removed the registry keys on a cluster we have and we're still seeing the error.
    K. Brian Kelley
  • By removing 'keys for replication' I specifically mean in Cluster Administrator, not in the registry.

    [Screenshot: testtx.th.png]

    Can you confirm that there are no entries here? This bug can also occur when using Microsoft Exchange on your cluster.
    Robin Anderson
    Development
    Red-Gate Software
  • I can confirm that I removed it from Cluster Administration, yes. And I confirm that the cluster is dedicated to SQL Server only.
    K. Brian Kelley
  • Might be a long shot, but have you failed over since? If not, it may still be trying to replicate the keys from before the change was made.
    Robin Anderson
    Development
    Red-Gate Software
  • Can't at this time. Heavily used production system. I'll let you know when I can, but that may be a while.
    K. Brian Kelley
  • Registry replication appears to be the culprit for us. I removed the entries in Cluster Administrator for the individual SQLBACKUP_AGENT cluster resources and the errors in the System event log have disappeared.

    If I look in the cluster.log for errors, I see the following entries (or similar). The .tmp file mentioned matches up with the error in the event logs.

    Here are the cluster.log entries:
    000010a8.00001a38::2009/09/17-23:23:21.391 WARN [CP] CppRegNotifyThread CppNotifyCheckpoint due to timer failed, reset the timer.
    000010a8.00001a38::2009/09/17-23:23:21.391 INFO [CP] CppRegNotifyThread checkpointing key Software\Wow6432Node\Red Gate\SQL Backup\BackupSettings\B to id 2 due to timer
    000010a8.00001a38::2009/09/17-23:23:21.391 INFO [Qfs] QfsGetTempFileName C:\DOCUME~1\MCS64\LOCALS~1\Temp\, CLS, 54807 => C:\DOCUME~1\MCS64\LOCALS~1\Temp\CLSD617.tmp, status 0
    000010a8.00001a38::2009/09/17-23:23:21.391 INFO [Qfs] QfsDeleteFile C:\DOCUME~1\MCS64\LOCALS~1\Temp\CLSD617.tmp, status 0
    000010a8.00001a38::2009/09/17-23:23:21.391 INFO [Qfs] QfsRegSaveKey C:\DOCUME~1\MCS64\LOCALS~1\Temp\CLSD617.tmp, status 5
    000010a8.00001a38::2009/09/17-23:23:21.391 INFO [Qfs] QfsDeleteFile C:\DOCUME~1\MCS64\LOCALS~1\Temp\CLSD617.tmp, status 0
    000010a8.00001a38::2009/09/17-23:23:21.391 WARN [CP] CppCheckpoint failed to get registry database Software\Wow6432Node\Red Gate\SQL Backup\BackupSettings\B to file C:\DOCUME~1\MCS64\LOCALS~1\Temp\CLSD617.tmp error 5
    000010a8.00001a38::2009/09/17-23:23:21.391 INFO [Qfs] QfsDeleteFile C:\DOCUME~1\MCS64\LOCALS~1\Temp\CLSD617.tmp, status 2
    000010a8.00001a38::2009/09/17-23:23:21.391 WARN [CP] CppRegNotifyThread CppNotifyCheckpoint due to timer failed, reset the timer.
    

    Here is the corresponding error in the system event log:
    Event Type:	Error
    Event Source:	ClusSvc
    Event Category:	Database Mgr 
    Event ID:	1080
    Date:		9/17/2009
    Time:		4:23:21 PM
    User:		N/A
    Computer:	XXXXXXXXXXXX
    Description:
    Cluster service could not write to a file (C:\DOCUME~1\MCS64\LOCALS~1\Temp\CLSD617.tmp). The disk may be low on disk space, or some other serious condition exists.
    
    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
    Data:
    0000: 05 00 00 00               ....
    


    When I add the replicated registry entries back using CLUADMIN, the errors do not return. However, it does not appear that registry replication is actually doing anything at this point (which might explain the lack of errors). When I change these values using the SQL Backup GUI (changing the SMTP settings, for example), the values are recorded on the node that the cluster group is running on, but they are not replicated to the other nodes.

    I have also noticed that there are inconsistencies between the instances listed under the BackupSettings, BackupSettingsGlobal and InstalledInstances registry keys across the cluster nodes. Shouldn't all the instances listed under each key be the same across the entire cluster?

    -js
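
The cluster.log excerpt above can be scanned mechanically for the failing checkpoints. This is a rough sketch that assumes every failure is logged in the same `[CP] CppCheckpoint failed ...` format as the excerpt; the function name `find_checkpoint_failures` is made up for illustration.

```python
import re

# Matches the "[CP] CppCheckpoint failed ..." warning format seen in the excerpt.
CHECKPOINT_FAILURE = re.compile(
    r"WARN \[CP\] CppCheckpoint failed to get registry database "
    r"(?P<key>.+?) to file (?P<file>\S+) error (?P<code>\d+)"
)


def find_checkpoint_failures(lines):
    """Return (registry key, temp file, error code) for each failed checkpoint."""
    failures = []
    for line in lines:
        match = CHECKPOINT_FAILURE.search(line)
        if match:
            failures.append(
                (match.group("key"), match.group("file"), int(match.group("code")))
            )
    return failures


# Two lines copied from the excerpt; only the second is a checkpoint failure.
sample = [
    r"000010a8.00001a38::2009/09/17-23:23:21.391 INFO [Qfs] QfsRegSaveKey C:\DOCUME~1\MCS64\LOCALS~1\Temp\CLSD617.tmp, status 5",
    r"000010a8.00001a38::2009/09/17-23:23:21.391 WARN [CP] CppCheckpoint failed to get registry database Software\Wow6432Node\Red Gate\SQL Backup\BackupSettings\B to file C:\DOCUME~1\MCS64\LOCALS~1\Temp\CLSD617.tmp error 5",
]

for key, tmp_file, code in find_checkpoint_failures(sample):
    print(f"{key} -> {tmp_file} (error {code})")
```

Grouping the results by registry key would show which instance's replication entry is producing the errors.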
  • Hello,

    It's been explained to me that the keys aren't actually replicated unless you fail the generic service resource. I can't seem to find any Microsoft documentation about this, though.
  • Yes, MSCS replicates keys on failover (which seems slightly odd, since the previously active node might well be offline) so you won't see registry changes propagate until later.

    Any inconsistencies are probably a result of the fact that key replication breaks when this bug occurs.
    Robin Anderson
    Development
    Red-Gate Software
  • RBA wrote:
    Yes, MSCS replicates keys on failover (which seems slightly odd, since the previously active node might well be offline) so you won't see registry changes propagate until later.

    So taking the SQLBACKUP_AGENT offline and then bringing it back online is not enough? This KB article seems to indicate that registry changes are replicated when the resource goes offline. So, if I just fail the resource, then it should replicate across all nodes, right?

    http://support.microsoft.com/kb/174070

    RBA wrote:
    Any inconsistencies are probably a result of the fact that key replication breaks when this bug occurs.

    What bug are you referring to?


    -js
  • You're right, bringing a single resource offline will also initiate key replication.

    qcjims wrote:
    What bug are you referring to?

    The active/active registry key replication issue. So far, it has only been observed on 64-bit Windows 2003 multi-instance clusters. On such systems I would recommend turning off key replication by the aforementioned method whilst we investigate (logged as SB-4349).
    Robin Anderson
    Development
    Red-Gate Software
  • Is there any update to this?
  • I think I may know the cause of this: the Red Gate registry keys are not specific to an instance of SQL Server, so running multiple clustered instances and/or active/active clusters gets the cluster service very confused.

    After removing the registry keys from replication in one clustered instance, attempting to remove them from the next instance either produced a message telling me that they weren't there or, when the dialog box was opened, showed no registry keys.

    That's what I think it is, based on my own experience so far. It would probably also explain why attempts to reproduce the issue may have failed: the testing was done on one clustered instance of SQL Server rather than multiple instances.
  • Hi PhilJax,

    The registry keys are meant to be instance specific. If they're not in your deployment then something has gone wrong with the install.

    For each SQL Backup entry in the cluster, the two keys for replication should be:

    SOFTWARE\Wow6432Node\Red Gate\SQL Backup\BackupSettings\(local)
    SOFTWARE\Wow6432Node\Red Gate\SQL Backup\BackupSettingsGlobal\(local)

    where '(local)' would be replaced with the instance name if not default.

    What were the keys for replication on your system? Were they all set to one instance?
    Robin Anderson
    Development
    Red-Gate Software
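
The key pattern described above can be written out per instance and compared with what Cluster Administrator shows. This is only a sketch of that pattern: the function names and the instance name `INSTANCE1` are hypothetical, and the comparison ignores case on the assumption that registry paths are case-insensitive.

```python
BASE = r"SOFTWARE\Wow6432Node\Red Gate\SQL Backup"


def expected_replication_keys(instance_name=None):
    """Build the two replication keys for one instance.

    None stands for the default instance, which uses '(local)'.
    """
    name = instance_name or "(local)"
    return [
        "\\".join([BASE, section, name])
        for section in ("BackupSettings", "BackupSettingsGlobal")
    ]


def missing_keys(configured, instance_name=None):
    """List expected keys that are absent from the configured entries."""
    present = {key.lower() for key in configured}
    return [
        key for key in expected_replication_keys(instance_name)
        if key.lower() not in present
    ]


# INSTANCE1 is a made-up instance name used only for this example.
for key in expected_replication_keys("INSTANCE1"):
    print(key)
```

Running `missing_keys` against the entries listed for each SQLBackupAgent resource would flag a resource whose keys point at the wrong instance.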
  • Hi RBA,
    Sorry, yes, you are correct. I made an assumption based on the error in the cluster log. One thing I have noticed is that for some clusters, the data path key points to the clustered drive and in others, it's a local path to the C drive.
  • P.S. Instance names all appear to be correct for the registry keys.
  • PhilJax wrote:
    Hi RBA,
    Sorry, yes, you are correct. I made an assumption based on the error in the cluster log. One thing I have noticed is that for some clusters, the data path key points to the clustered drive and in others, it's a local path to the C drive.

    I'm sorry to hear this has affected you. I discovered this myself very recently on 2008R2. The last step of the installer dialog offers the user the chance to choose where the data is kept - but the default entry sometimes points to C:\ rather than the shared drive. This should be corrected, either at install time or manually via the registry.
    Robin Anderson
    Development
    Red-Gate Software
  • Does the 6.3 release resolve this issue?
  • Hi qcjims,

    The 6.3 release does not include any changes to the key replication behaviour. I can reproduce the issue, but have not yet been able to determine the cause. If you are affected by this bug, please disable key replication for SQL Backup whilst we investigate. I'm sorry for any inconvenience this may cause.

    Regards,
    Robin Anderson
    Development
    Red-Gate Software
  • We are definitely affected by this bug. We have 30+ clustered environments with RGB installed, and manually changing this on all the systems is a serious pita. Since we run 10+ instances per cluster, our event logs are completely dominated by these errors and basically useless.

    Is Red Gate working with Microsoft to try and determine the cause of this issue?



    -js
  • We are seeing the same errors on a 4 node cluster with 3 sql instances running on it. Is there a proper fix for this available or in the works? I am disabling key replication for now, but I'd like to know when/if a permanent fix will be available.
  • I found this post via Google search


    I just noticed this error on our Windows 2003 Enterprise R2 x64, SQL Backup 6.3.048 (so far, 2 clusters have the error and 3 don't)

    Interesting... they're all SQL 2005 Standard

    I'm gonna remove the registry replication from those failed ones
    Jerry Hung
    DBA, MCITP
  • Disabling key replication is not a good fix for us, because the settings for email, how many days to keep log files, etc. are all in the registry. If we set these on one node and they don't propagate to the other nodes, we run into problems if an instance of SQL Server is ever moved over to another cluster node.

    Please fix this ASAP.
  • Any idea when this will be fixed?
  • A fix for this problem has been discovered and will be included in the next release, due this quarter.

    We are currently undergoing release testing, however, if you would like to receive the patch early please get in touch via private message or email (robin.anderson@red-gate.com).

    Regards,
    Robin Anderson
    Development
    Red-Gate Software
  • RBA wrote:
    A fix for this problem has been discovered and will be included in the next release, due this quarter.

    We are currently undergoing release testing, however, if you would like to receive the patch early please get in touch via private message or email (robin.anderson@red-gate.com).

    Regards,

    Great news, eager to wait for the new release!!
    Jerry Hung
    DBA, MCITP
  • This has been corrected in SQL Backup 6.4
    I just upgraded all my production from 6.3 to 6.4 and Registry Replication has re-surfaced in the SQLBackupAgent

    HOWEVER............ the error continues non-stop UNTIL a restart of SQLBackupAgent (take offline, bring online), keep that in mind!!!


    I had to do this for 2 * 4 clusters we have :(
    Cluster service failed to save the registry key Software\Red Gate\SQL Backup\BackupSettingsGlobal\DBXXXX01 when a resource was brought offline. The error code was 1018. Some changes may be lost.
    Jerry Hung
    DBA, MCITP
  • Hi,

    I just upgraded a 64-bit cluster from 6.3 to 6.4. If I look in Cluster Administrator, right-click 'SQLBackupAgent' and select Properties, I can see 4 entries under the Registry Replication tab. Two are for Software\WOW6432Node\Red Gate and the other two are for Software\Red Gate.

    Which ones should I remove?

    Thanks

    Chris

    We are seeing Event 1080 in some of our 64-bit clusters, but not all of them.
    English DBA living in CANADA
  • Please remove the Wow6432Node paths from the keys for replication (ignoring any errors that may be generated when you remove them).

    If the Microsoft clustering service was attempting key replication whilst you were upgrading, then you will still see errors appearing in the log until it gives up or completes. However, so long as there are no wow6432node paths set and the actual registry keys do exist then the issue will not reoccur.

    Regards,
    Robin Anderson
    Development
    Red-Gate Software
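
Picking out which entries to remove can be sketched as a simple filter on the path components. The four full paths below are assumptions modelled on the key pattern quoted earlier in the thread (the post above only gives their prefixes), and `INSTANCE1` and the function name are hypothetical.

```python
def split_replication_keys(keys):
    """Separate entries that contain a Wow6432Node path component from the rest."""
    to_remove, to_keep = [], []
    for key in keys:
        parts = [part.lower() for part in key.split("\\")]
        if "wow6432node" in parts:
            to_remove.append(key)
        else:
            to_keep.append(key)
    return to_remove, to_keep


# Assumed shape of the four entries on the Registry Replication tab after upgrading.
keys = [
    r"Software\WOW6432Node\Red Gate\SQL Backup\BackupSettings\INSTANCE1",
    r"Software\WOW6432Node\Red Gate\SQL Backup\BackupSettingsGlobal\INSTANCE1",
    r"Software\Red Gate\SQL Backup\BackupSettings\INSTANCE1",
    r"Software\Red Gate\SQL Backup\BackupSettingsGlobal\INSTANCE1",
]

to_remove, to_keep = split_replication_keys(keys)
print("remove:", to_remove)
print("keep:", to_keep)
```

The match is done on whole path components and ignores case, so `WOW6432Node` and `wow6432node` are treated the same.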