The Log Copy Queue
meastland
Posts: 47
To whom it may concern:
We are currently running SQL Backup 7.6.0.29. We use the log copy queue to support log shipping for a system that experiences high month-end transaction volume. As a result, the log copy queue has trouble keeping up with copying the larger log backup files.
What I'm seeing when querying the backupfiles_copylist via the sqbdata utility is that many of the queue entries are getting flagged with a status of E. Based on a forum search, it sounds like a status of E indicates the queue entry has expired, presumably because it was not copied within the 24 default period. Given that, I have the following questions:
1) It appears that the files are not being copied in the order they were created. By that, I mean there are files with a later [created] date which are getting copied before files with an earlier date. Is there anything I could be doing wrong in my config to cause this? It seems counter-intuitive to me.
2) Are there best practices on config settings for the registry entries that control copy behavior, specifically: COPYTO:ExpiryIntervalInMinutes and
COPYTO:ThreadCount ? In addition, is there a max value allowed for COPYTO:ExpiryIntervalInMinutes?
3) What are the valid values for the [status] field in backupfile_copylist? Here are the ones I think I know: A = Active, E = Expired, P = Pending, S = Success.
4) Finally, how is the data in these admin tables maintained? Do we need to manually schedule a job to execute cleanup commands against them? If so, can you provide a full list of the tables that should be included?
Thanks in advance for any information or suggestions you can provide.
Regards,
-Mike Eastland
We are currently running SQL Backup 7.6.0.29. We use the log copy queue to support log shipping for a system that experiences high month-end transaction volume. As a result, the log copy queue has trouble keeping up with copying the larger log backup files.
What I'm seeing when querying the backupfiles_copylist via the sqbdata utility is that many of the queue entries are getting flagged with a status of E. Based on a forum search, it sounds like a status of E indicates the queue entry has expired, presumably because it was not copied within the 24 default period. Given that, I have the following questions:
1) It appears that the files are not being copied in the order they were created. By that, I mean there are files with a later [created] date which are getting copied before files with an earlier date. Is there anything I could be doing wrong in my config to cause this? It seems counter-intuitive to me.
2) Are there best practices on config settings for the registry entries that control copy behavior, specifically: COPYTO:ExpiryIntervalInMinutes and
COPYTO:ThreadCount ? In addition, is there a max value allowed for COPYTO:ExpiryIntervalInMinutes?
3) What are the valid values for the [status] field in backupfile_copylist? Here are the ones I think I know: A = Active, E = Expired, P = Pending, S = Success.
4) Finally, how is the data in these admin tables maintained? Do we need to manually schedule a job to execute cleanup commands against them? If so, can you provide a full list of the tables that should be included?
Thanks in advance for any information or suggestions you can provide.
Regards,
-Mike Eastland
Comments
When any of the 5 threads are available, it assigns a file to the thread to begin the copying process.
Over a duration of the 'COPYTO:SleepIntervalInSeconds' default value of 60 seconds, it checks every 'COPYTO:WaitIntervalInSeconds' value of 5 seconds if any of the 5 threads are available again, and assigns the next file to the free thread to be copied.
Once the 60 seconds has elapsed, it repeats the above cycle i.e. picks up the oldest 25 files that need to be copied, finds a free thread, assign the file to the thread etc.
So basically, the copying process is set up to copy a maximum of 25 files over a 60 seconds period. Assuming a transaction log backup interval of 5 minutes, and assuming your server can handle copying 25 files in a minute, SQL Backup should be able to copy a maximum of 1500 files over 5 minutes.
A potential bottleneck is if all 5 threads are occupied copying large files, and your transaction log backup interval is shorter than the time it takes to copy the file e.g. it takes 6 minutes to copy each file, your backup interval is only 5 minutes, and you have 5 or more databases generating that large of a transaction log backup file. The 5 worker threads copying the files will be preoccupied with the larger files, servicing the smaller files only later, than be preoccupied again with the larger files, and eventually everything gets back-logged.
Increasing the number of worker threads won't help, as network bandwidth is a fixed resource and the existing worker threads would already be using the maximum possible bandwidth. Increasing the 'COPYTO:ExpiryIntervalInMinutes' may help, as it would allow files to stay in the copy queue longer.
Another potential bottleneck is if you are generating more than 25 backup files a minute. As SQL Backup will only copy a maximum of 25 files a minute, you will eventually end up with a back log. Increasing the 'COPYTO:ThreadCount' and/or 'COPYTO:ThreadMultiplier' values will be required to handle this load.
To answer your questions:
>> 1) It appears that the files are not being copied in the order they were created.
Two possibilities - the older files have expired because they were not copied within the first 24 hours, and the later files were copied, or the larger files took longer to complete and smaller files (although newer) assigned to a different worker thread were copied over first.
In the first case, you'll need to increase the 'COPYTO:ExpiryIntervalInMinutes' value. Assuming the bottleneck occurs only during month end, the copy queue should clear up during the beginning of the next month.
In the second case, SQL Backup's transaction restore process will eventually restore the files once they are available. You might encounter a few errors when the newer files have been copied over before the older files, but once the older files have been copied over, the files will be restored in the correct order.
>> 2) Are there best practices on config settings for the registry entries that control copy behavior, specifically: COPYTO:ExpiryIntervalInMinutes and
COPYTO:ThreadCount ?
This really depends on your backup patterns. As described above, 'COPYTO:ExpiryIntervalInMinutes' helps in keeping the files longer in the copy queue, if your current network bandwidth has already maxed out. 'COPYTO:ThreadCount' would help if you have network bandwidth to spare.
>> In addition, is there a max value allowed for COPYTO:ExpiryIntervalInMinutes?
No, there is no maximum value.
>> 3) What are the valid values for the [status] field in backupfile_copylist? Here are the ones I think I know: A = Active, E = Expired, P = Pending, S = Success.
And C = Cancelled, used when a hosted storage upload copy item is cancelled.
4) Finally, how is the data in these admin tables maintained?
The retention policy that you set up for the local history applies to the 'backupfiles_copylist' and 'backupfiles_copylist_log' tables too. From the GUI, select the server instance, right click to bring up the context menu, and select the 'Server Options' item. The retention policy is the one under 'SQL Server backup and restore history'. If you need a different retention policy, you'll need to perform the deletion manually. It's only those 2 tables that are involved.
Lastly, there are some other timing settings that you can tweak, if SQL Backup requires more than a single attempt to copy your files. You can see if this is the case by checking the 'count' column in the 'backupfiles_copylist' table. If this is happening in your case, let mw know and I'll explain the settings.
SQL Backup Consultant Developer
Associate, Yohz Software
Beyond compression - SQL Backup goodies under the hood, updated for version 8