Transaction log backup with COPYTO issues

chtay1 Posts: 6
edited September 5, 2011 11:15PM in SQL Backup Previous Versions
I have recently set up transaction log backups for 12 databases to be copied to a network share using the COPYTO parameter.

I have a few issues with the process that are causing me headaches:
1. At least once a day so far, files have copied but have not been renamed from .partial to .sqb (those files weren't corrupt, although I have also had failures where the copied file was corrupt).
2. On a few occasions files have got stuck in Pending on the copy server; updating to 6.5.1.9 and changing some copy parameters may have helped here, but I'm still waiting to see.
3. I have had some file copy errors (less frequent, but they occur under higher load, i.e. larger files such as those produced by maintenance index rebuilds) where the copies fail with "The semaphore timeout period has expired".

Has anyone else experienced these types of issues, and does anyone have ideas on how to make the process more robust? Currently I'm spending at least 30-60 minutes a day (including weekends) making sure it all stays on track. I'm pretty sure I could write some tools to monitor and correct the issues, but if something else can be done, that would be my preference.

COPYTO params:
- COPYTO:CopyIntervalInMinutes = 1
- COPYTO:ExpiryIntervalInMinutes = 2880
- COPYTO:MaxCopyIntervalCount = 10
- COPYTO:ThreadCount = 10
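
For reference, each log backup is run with a command along these lines (the database name, paths and compression level are placeholders rather than my exact command):

-- illustrative only: COPYTO queues the finished .sqb file for the background copy to the share
EXECUTE master..sqlbackup '-SQL "BACKUP LOG [MyDatabase]
  TO DISK = ''D:\Backups\<AUTO>''
  WITH COPYTO = ''\\DRServer\LogShipping'', COMPRESSION = 3"';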

Chris

Comments

  • petey Posts: 2,358 New member
    If your t-log backups all end at about the same time, I would suggest setting the THREADCOUNT value to 1 first. If the error(s) still occur, that would mean that the issue lies with SQL Backup.
    Peter Yeoh
    SQL Backup Consultant Developer
    Associate, Yohz Software
    Beyond compression - SQL Backup goodies under the hood, updated for version 8
  • The lower the thread count, the worse it seemed to perform. At low thread counts a lot of the logs seemed to get stuck in Pending and never started. I bumped up the thread count to ensure that they all at least tried to start before the next log backup; otherwise the earlier log backups never seemed to get a chance. Our log files can vary a lot in size, so sometimes the copying doesn't catch up until a couple of 15-minute backup cycles later.

    To get over this for the time being, I've set the copy to go to a local folder, consolidating the logs from the backups into one location, which I then move out to the remote server through a robocopy move operation (a rough sketch of the move step is below).
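
    The move step is roughly equivalent to something like the following (paths are placeholders, and I've wrapped it in xp_cmdshell purely for illustration; a scheduled task or an Agent CmdExec step does the same job):

    -- illustrative only: move completed .sqb files from the local staging folder to the DR share
    -- /MOV deletes the source after a successful copy, /R and /W control retries, /NP suppresses progress output
    EXEC master..xp_cmdshell
        'robocopy "D:\LogShipping\Outbound" "\\DRServer\LogShipping" *.sqb /MOV /R:10 /W:30 /NP';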
  • Another issue, which might mean I actually leave it this way, is that when long network copy operations are in progress it's very difficult to stop the service if you need to. I actually needed to stop the service; it eventually failed to stop even with a long timeout period, and that caused the database cluster to fail over at an inconvenient time. I've since configured this service so that it doesn't cause the cluster to fail over and doesn't restart, but I had to learn that the hard way (and it isn't ideal either). With the robocopy method the copying doesn't affect the backup service, so this feels a lot safer to me, as I'm not sure what would happen if a fault caused the cluster to fail over while the SQL Backup service had to wait minutes to shut down.
  • petey Posts: 2,358 New member
    Briefly, this is how SQL Backup handles the copy queue:

    - get list of files to copy. Files are sorted by no. of attempts (in descending order), followed by entry creation date. This means that SQL Backup will give priority to older files. The no. of files it keeps in the list is the no. of threads x thread multiplier. By default, this means 10 files (5 x 2).

    - it will then work down the list, assigning each file to the first available copy thread. When all copy threads have been used, or when all the files in the list have been assigned, it enters a wait state.

    - the wait state is the copy interval (default is 1 minute), but SQL Backup checks every 5 seconds if any of the copy threads has completed its task. If there are still files in the list, SQL Backup will assign a new file to the thread.

    - at the end of the copy interval, SQL Backup returns to the first step above.
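
    Purely as an illustration of that ordering (this is not how the service is implemented internally), the selection is equivalent to a query like the one below, where CopyQueue stands in for the internal list:

    -- hypothetical sketch: pick threads x multiplier entries, most-retried first, then oldest first
    DECLARE @ThreadCount int = 5, @ThreadMultiplier int = 2;
    SELECT TOP (@ThreadCount * @ThreadMultiplier)
           FileName, CopyAttempts, CreatedDate
    FROM   CopyQueue                    -- hypothetical table representing the copy queue
    ORDER BY CopyAttempts DESC,         -- files with more failed attempts go first
             CreatedDate ASC;           -- then older entries before newer ones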

    In your case, is it possible that the network interface cannot handle the load, especially if you have 10 threads copying files to remote shares? Or perhaps it's the remote share that cannot handle the load, since it sounds like you're copying all the files to the same remote share. In your robocopy job, how many threads are you using to copy the files?
    Peter Yeoh
    SQL Backup Consultant Developer
    Associate, Yohz Software
    Beyond compression - SQL Backup goodies under the hood, updated for version 8
  • Update on my approach.

    The log copy queue is still getting stuck even when copying to a local disk. Multiple files get stuck in "Copying", I have a couple that are stuck on "Pending", and newer files are going through.

    I'm going to try switching to USESIMPLECOPY on the backup command, which I gather is recommended for local copies anyway (rough sketch of what I mean below).
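
    If I've read the syntax right, that just means adding the keyword alongside COPYTO in the existing command, something like this (database name and paths are placeholders; exact keyword placement as per the SQL Backup docs):

    EXECUTE master..sqlbackup '-SQL "BACKUP LOG [MyDatabase]
      TO DISK = ''D:\Backups\<AUTO>''
      WITH COPYTO = ''D:\LogShipping\Outbound'', USESIMPLECOPY, COMPRESSION = 3"';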
  • petey Posts: 2,358 New member
    Could you please download and install a debug version of SQL Backup that logs more details about the copy process? You can download it from here:

    ftp://support.red-gate.com/Patches/sql_ ... _5_2_7.zip

    That archive contains the SQL Backup Agent executable file (SQBCoreService.exe). To replace the existing service executable file, do the following:

    - ensure that no SQL Backup processes are running
    - stop the SQL Backup Agent service, or disable the cluster resource if on a cluster
    - rename the existing executable file (SQBCoreService.exe) in the SQL Backup installation folder
    - extract and place the patched executable file into the same folder
    - open the registry, and look for the SQL Backup Agent service definition. For the default instance, this would be

    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\SQLBackupAgent

    For a named instance, this would be

    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\SQLBackupAgent_<instance name>

    - look for the ImagePath value, and edit its data, adding -sqbdebug to the end as a startup parameter, e.g.

    C:\Program Files\Red Gate\SQL Backup\(LOCAL)\SQBCoreService.exe -sqbdebug

    - restart the SQL Backup Agent service/cluster resource
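
    If it helps to confirm that the switch was picked up, one quick check (assuming the default instance service name, and that xp_cmdshell is available) is to query the service configuration and look at BINARY_PATH_NAME:

    -- the reported startup command line should now end with -sqbdebug
    EXEC master..xp_cmdshell 'sc qc SQLBackupAgent';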

    In the folder where the SQL Backup Agent service file is located, there should now be 2 log files generated: SQBCoreServiceCopyManager_log.txt and SQBCoreService_log.txt. The first file logs all the details of the copy processes, which should shed some light on what's happening. Please send me both log files after letting the SQL Backup Agent service run for a while, until it starts having problems copying the files.

    Thank you.
    Peter Yeoh
    SQL Backup Consultant Developer
    Associate, Yohz Software
    Beyond compression - SQL Backup goodies under the hood, updated for version 8
  • Peter, thanks for the advice; however, I can't afford for it to break again. This is a critical DR log shipping process from production systems. If I have further problems after switching to USESIMPLECOPY I'll give this a go, but if USESIMPLECOPY works okay I'll leave it alone to settle down for a while.

    Having said that, does this version have any changes to the log copying process, or does it just contain extra debugging code?
  • petey Posts: 2,358 New member
    You should be aware that USESIMPLECOPY is blocking. If the time taken to back up the transaction log and copy the file to the destination folder is longer than your transaction log backup interval, you will not be meeting your disaster recovery goals.

    This version only adds extra debugging code, as I don't know the cause of your problems yet.
    Peter Yeoh
    SQL Backup Consultant Developer
    Associate, Yohz Software
    Beyond compression - SQL Backup goodies under the hood, updated for version 8