SQLBackup copyto out of order
BruceB
Posts: 14
Hi, I'm running 6.5.1.9, log shipping about 20 databases and have noticed when things get busy the copyto queue is doing something strange.
This morning, I've found a couple of cases where a backup from around 00:15 has not been copied but 300+ files have been copied since (now 10am).
There are still a lot of things in the queue as it is seriously backed up because of big reindex jobs and slow network links. But, up to 5am everything else is either marked successful or is still active.
Looking in backupfiles_copylist, the uncopied files still have status of P, and show 0 for count and retry_count.
Any thoughts as to why these copies are so out of sync will be greatly appreciated.
Thanks. BB.
This morning, I've found a couple of cases where a backup from around 00:15 has not been copied but 300+ files have been copied since (now 10am).
There are still a lot of things in the queue as it is seriously backed up because of big reindex jobs and slow network links. But, up to 5am everything else is either marked successful or is still active.
Looking in backupfiles_copylist, the uncopied files still have status of P, and show 0 for count and retry_count.
Any thoughts as to why these copies are so out of sync will be greatly appreciated.
Thanks. BB.
Comments
SQL Backup uses a similar query to retrieve the list of files to copy on each run. Is there anything odd about the count, created and lastattempt values?
SQL Backup Consultant Developer
Associate, Yohz Software
Beyond compression - SQL Backup goodies under the hood, updated for version 8
That was the query I was using to examine the log.
Running it again now the particular row now looks like ....
id name copyto overwrite count diskretryinterval diskretrycount mailto mailto_onerror created lastattempt
40266 H:\MSSQL10_50.MSSQLSERVER\MSSQL\Backup\fred\fred_20131215000201.sqb \\fred-sql2\Log Shipping Share\fred\LOG_fred_20131215000201.sqb 0 1 30 10 2013-12-15 00:02:05.377 2013-12-15 17:03:56.000
If that formats to anything useful for you, it should show you that it was only attempted once. The created time of 2 minutes after midnight makes sense and the last (and only) attempt was 5:03pm. When I was looking at it prior to 5pm it showed count of 0 and nothing in the last attempt column.
I've spared you the other 1000 rows from that period and changed the server name and database name. There were 4 files that behaved like this yesterday. All came right by themselves in the end.
Thanks, BB.
There are 2 things you could try. The default settings for the copying process is to pick up the oldest 10 files every 60 seconds to copy, handled by 5 process threads.
If there is a backlog in the files, you could reduce this interval to 30 seconds, to double the number of files processed each minute. You can do this by creating a registry value named COPYTO:SleepIntervalInSeconds, a DWORD type, and give it a data value of 30. This only makes sense if SQL Backup currently takes less than 60 seconds to copy the batch of 10 files.
You could also increase the number of threads allocated to copy files. By default, 5 threads are used. You can increase this number up to 10, by creating a registry value named COPYTO:ThreadCount, a DWORD type, and give it a value between 1 and 10. Again, this only makes sense if SQL Backup currently takes less than 60 seconds to copy the batch of 10 files, and increasing the number of process threads will not overload the CPU and network resources.
Both registry values are created in the SQL Backup registry node for the SQL Server instance you want to apply the settings to i.e. HKLM\Software\Red Gate\SQL Backup\BackupSettingsGlobal\<INSTANCE NAME>.
Once you have created the values, run the following from Management Studio on the affected SQL Server instance:
and SQL Backup will use the new values, without the need to restart the SQL Backup Agent service.
SQL Backup Consultant Developer
Associate, Yohz Software
Beyond compression - SQL Backup goodies under the hood, updated for version 8
Some of the log files here are pretty big and take 10s of minutes to copy, others only take a few seconds. With the log backups every 15 minutes, it appears that the small files are getting copied down during this period and just the big ones are queuing. (Though I can picture a scenario where all 5 threads are doing big file copies which would certainly hold other things up).
The network link is very definitely a bottleneck so adding more threads is probably not going to help much.
As this doesn't look likely to solve the issue with the out of order copying I don't think I'll make these changes.
Thanks for your time and thought on the issue.
BB