log backup failing randomly (dreaded vdi 1010)

physixphysix Posts: 8
edited September 18, 2007 9:22AM in SQL Backup Previous Versions
We have an issue with a particular SQL server and I hope that you can shed some light on it. During the day (busy times) the log backup process fails with a server connection error. (the dreaded vdi1010) We have tried searching the forums but have come to no solution. Petey has suggested that it is a default timeout problem with SQL, but a native log backup can still run. This is a mission critical server and we cannot run throughout the day without a backup. (not to mention our .ldf would grow very large) Is there a workaround to increase the default timeout or increase the priority or something?

This fails with both TSQL and command line.

Example:

SQLBackupC -SQL "BACKUP LOG Microtraffic TO DISK = '\\mitsql2\e$\BACKUP\MITSQL1\SQL\MicroTraffic\<AUTO>' WITH NAME = '<AUTO>', DESCRIPTION = '<AUTO>', ERASEFILES = 15, MAILTO_ONERROR = '[email protected]', COMPRESSION = 1"

outputs:

SQL Backup log file
5/31/2007 10:11:32 AM: Red Gate SQL Backup 5.1.0.2781, (c) Red Gate Software Ltd 2004 - 2007
5/31/2007 10:11:32 AM:
5/31/2007 10:11:32 AM: Backing up Microtraffic (transaction log) to:
5/31/2007 10:11:32 AM: \\mitsql2\e$\BACKUP\MITSQL1\SQL\MicroTraffic\LOG_(local)_Microtraffic_20070531_101132.sqb

5/31/2007 10:11:32 AM: BACKUP LOG [Microtraffic] TO DISK = '\\mitsql2\e$\BACKUP\MITSQL1\SQL\MicroTraffic\<AUTO>' WITH NAME = '<AUTO>', DESCRIPTION = '<AUTO>', ERASEFILES = 15, MAILTO_ONERROR = '[email protected]', COMPRESSION = 1

5/31/2007 10:12:03 AM: VDI error 1010: Failed to get configuration from server. Check that the SQL Server instance is running, and that you have the SQL Server Systems Administrator server role. Error code: (-2139684861: The api was waiting and the timeout interval had elapsed.)
5/31/2007 10:14:33 AM: SQL error 3013: BACKUP LOG is terminating abnormally.
5/31/2007 10:14:33 AM: SQL error 3201: Cannot open backup device 'SQLBACKUP_AB53A984-5F8E-4260-9BD8-8F30AE55A204'. Operating system error 0x80070002(The system cannot find the file specified.).
5/31/2007 10:14:33 AM:
5/31/2007 10:14:33 AM: Memory profile
5/31/2007 10:14:33 AM: Type Maximum Minimum Average Blk count Total
5/31/2007 10:14:33 AM:





5/31/2007 10:14:33 AM: Commit 14576254976 4096 5115912 2956 15122636800
5/31/2007 10:14:33 AM: Reserve 2583625728 4096 6824255 397 2709229568
5/31/2007 10:14:33 AM: Free 6863322546176 4096 21728369035 404 8778261090304
5/31/2007 10:14:33 AM: Private 14576254976 4096 6781575 2608 17686347776
5/31/2007 10:14:33 AM: Mapped 1585152 4096 141485 59 8347648
5/31/2007 10:14:33 AM: Image 32063488 4096 199957 686 137170944
5/31/2007 10:14:33 AM:
5/31/2007 10:14:33 AM:
5/31/2007 10:14:33 AM: Warning 300: Backup failed. Retry attempt: 1
5/31/2007 10:14:35 AM: BACKUP LOG [Microtraffic] TO DISK = '\\mitsql2\e$\BACKUP\MITSQL1\SQL\MicroTraffic\LOG_(local)_Microtraffic_20070531_101132.sqb' WITH NAME = 'Database (Microtraffic), 5/31/2007 10:11:32 AM', DESCRIPTION = 'Backup on 5/31/2007 10:11:32 AM Server: MITSQL1 Database: Microtraffic', INIT, ERASEFILES = 15, MAILTO_ONERROR = '[email protected]', COMPRESSION = 1

5/31/2007 10:14:35 AM:
5/31/2007 10:14:35 AM: Warning 300: Backup failed. Retry attempt: 2
5/31/2007 10:14:37 AM: BACKUP LOG [Microtraffic] TO DISK = '\\mitsql2\e$\BACKUP\MITSQL1\SQL\MicroTraffic\LOG_(local)_Microtraffic_20070531_101132.sqb' WITH NAME = 'Database (Microtraffic), 5/31/2007 10:11:32 AM', DESCRIPTION = 'Backup on 5/31/2007 10:11:32 AM Server: MITSQL1 Database: Microtraffic', INIT, ERASEFILES = 15, MAILTO_ONERROR = '[email protected]', COMPRESSION = 1

5/31/2007 10:14:37 AM:
5/31/2007 10:14:37 AM: Warning 300: Backup failed. Retry attempt: 3
5/31/2007 10:14:39 AM: BACKUP LOG [Microtraffic] TO DISK = '\\mitsql2\e$\BACKUP\MITSQL1\SQL\MicroTraffic\LOG_(local)_Microtraffic_20070531_101132.sqb' WITH NAME = 'Database (Microtraffic), 5/31/2007 10:11:32 AM', DESCRIPTION = 'Backup on 5/31/2007 10:11:32 AM Server: MITSQL1 Database: Microtraffic', INIT, ERASEFILES = 15, MAILTO_ONERROR = '[email protected]', COMPRESSION = 1

5/31/2007 10:14:39 AM:
5/31/2007 10:14:39 AM: Warning 300: Backup failed. Retry attempt: 4
5/31/2007 10:14:41 AM: BACKUP LOG [Microtraffic] TO DISK = '\\mitsql2\e$\BACKUP\MITSQL1\SQL\MicroTraffic\LOG_(local)_Microtraffic_20070531_101132.sqb' WITH NAME = 'Database (Microtraffic), 5/31/2007 10:11:32 AM', DESCRIPTION = 'Backup on 5/31/2007 10:11:32 AM Server: MITSQL1 Database: Microtraffic', INIT, ERASEFILES = 15, MAILTO_ONERROR = '[email protected]', COMPRESSION = 1

Comments

  • Brian DonahueBrian Donahue Posts: 6,590 New member
    Hi,

    Thanks a lot for your post. Your log looks frighteningly similar to this error report:

    http://www.red-gate.com/messageboard/vi ... php?t=4289

    Petey says that the first error message indicates that SQL Server's Virtual Device Interface didn't answer in time, in SQB 5 it looks like it was increased from 15 to 30 seconds, according to your log. Now, native backups wouldn't need to load VDI in order to do the work, so native backup would more than likely be successful.

    Eventually, it appears that the VDI hook does work, but the backup is failing for other reasons. Have you tried specifying a MAXDATABLOCK and or a MAXTRANSFERSIZE setting for the backup? These are available in the backup wizard now, to make it a bit easier for you.

    Hope this helps!
  • peteypetey Posts: 2,358 New member
    The duration between the VDI failure (5/31/2007 10:12:03) and the SQL Server error (5/31/2007 10:14:33 AM) is 2.5 minutes. Would it have taken that long to log on the server at that time? Can you confirm that there were no other backups being performed on the Microtraffic database during that time? Thanks.
    Peter Yeoh
    SQL Backup Consultant Developer
    Associate, Yohz Software
    Beyond compression - SQL Backup goodies under the hood, updated for version 8
  • Bump...

    Sorry for the delay in my response. This is still happening and is becoming more of an issue each day. Our server is experiencing heavy slowdowns and blocking at random times and coincidentally the log backup job is the only agent process running. I say coincidentally because I find it hard to believe that a blocked process can cause any problems (because the spid is sleeping) but my dba team is throwing "redgate problems" to management. This last slowdown showed the SQBbackup spid being blocked by the DB checkpoint spid for over 15 minutes. The server tanked throughout this period.

    Like charleyevans I am running sql 2005 EE x64 on an x64 quad xeon monster server. At present we have 16gb ram and looking at buffer cache hit ratio and disk I/O we have very little evidence of needing memory.

    petey: as for your question, No I cannot definitively say that there were no other backups during that posted log. (I was testing) But on a normal day I receive 10-13 failed log backup emails where there is nothing running concurrently. Rather than post those here can I email you the output? FYI 2.5 minutes is a short timeout before failing. (on our server) I’m surprised you're surprised. most failures take longer than that.

    brian: correct me if I'm wrong, but MAXDATABLOCK and or MAXTRANSFERSIZE will not apply if the database cannot even be reached. In any case, yes I have tried these settings to no avail.

    currently:

    exec master..sqlbackup N'-SQL "BACKUP LOG [MicroTraffic] TO DISK = ''\\mitsql2\e$\BACKUP\MITSQL1\SQL\MicroTraffic\<AUTO>'' WITH NAME = ''<AUTO>'', DESCRIPTION = ''<AUTO>'', ERASEFILES = 15, MAILTO_ONERROR = ''[email protected]'', MAXDATABLOCK = 1048576, COMPRESSION = 1"'

    fails 50% of the time during the day. The backup-log job backs up 4 other databases without problems. However "microtraffic" is the pig.

    Thanks again!
    -m.bee
  • peteypetey Posts: 2,358 New member
    VDI error 1010 is raised when SQL Backup did not receive an acknowledgement from SQL Server that its backup request had been received and is now executing. Possible reasons include:

    - another backup of the same database is running, hence blocking the current backup
    - the backup request never reached SQL Server within 30 seconds, in which case SQL Backup times out. Possible reasons include failure to log in within that time, network issues etc.

    In your example, it appears that the backup request is blocked by a database checkpoint process. 15 minutes is surely too long, unless your 'recovery interval' option is indeed that long, or longer (unlikely).

    Are there any errors recorded in the SQL Server logs, during those times when the backups failed? Would it be possible to revert to native SQL Server backups for the time being, to see if the same errors occur?

    Yes, please send the logs to my email address as stated in my profile.

    Thanks.
    Peter Yeoh
    SQL Backup Consultant Developer
    Associate, Yohz Software
    Beyond compression - SQL Backup goodies under the hood, updated for version 8
  • email sent...

    the backup log statement is indeed being blocked by the checkpoint spid. when this occurs the backup fails. When it executes and beats the checkpoint spid to runnable then the backup succeeds.

    In our case the checkpoint spid is 10 and it cycles through the databases that it is checkpointing. It goes between "SUSPENDED" and "RUNNABLE" every few seconds. (70% on suspended) It appears that when the status is "SUSPENDED" and the DBName column is "mydatabase" then the Log backup process will fail for "mydatabase".

    This is all being checked via: sp_who2 "active"

    I have changed the recovery interval to '2' (2 minute checkpoints) but it still cycles between these status modes every few seconds. (the backup still randomly fails) My guess is the checkpoint is waiting for I/O to complete and reports SUSPENDED... then when I/O completes it changes to RUNNABLE, but may not necessarily checkpoint because of the checkpoint interval. If i backup during this RUNNABLE state then it succeeds.

    Full backups always succeed.

    This occurs when the DB is in both FULL and BULK models.
  • an interesting thing to note is that when the backup process is succeeding then it blocks the sys checkpoint spid (10).

    Does the backup process issue its own checkpoint that is butting heads wtih the sys checkpoint?
  • Brian DonahueBrian Donahue Posts: 6,590 New member
    Maybe two different backup solutions are trying to back up at roughly the same time? Also, and I say this out of pure ignorance, could the log file be growing at this time? I'm not sure if this would have an effect on the checkpoint; would it?
  • There are definately no other backups running. I can copy the TSQL code from my jobstep and execute manually, again and again. (70% of which will fail during the middle of the day.) At night this process will always succeed.

    As for growth: The log file is not growing, however log size does affect checkpointing. I believe that if a log file is too big (ex:20Gb) with only a small about of used space (300mb) then a checkpoint will take longer to complete as it has to scan a larger physical container to write the small amout of dirty pages. (little bits of log spread over a large about of harddrive) our log is well trimmed and sized accordingly to our use. (5Gb container with use between 01-75%) Perhaps this is too much for a busy server to checkpoint efficiently?

    Still with a recovery interval of 2, the actual checkpoint process shouldn't be interfering... It sounds like the idle checkpoint spid is blocking the backup spid regardless of if the DB is checkpointing or not.

    The spid is blocked and redgate is forced to wait, then when it finally does come back the virtual device that redgate created is no longer there. We then throw an operating system error:

    6/27/2007 2:35:33 PM: SQL error 3201: Cannot open backup device 'SQLBACKUP_0D9DFC95-15CC-42A4-8178-C5D7BE5850FE'. Operating system error 0x80070002(The system cannot find the file specified.).
  • Brian DonahueBrian Donahue Posts: 6,590 New member
    I have replied to you privately about this.
  • Im also now getting this error

    I can backup the log normally using the native SQL command

    but not using SQLBackup.

    has anyone got a solution for this?
  • some more info

    19/07/2007 18:19:09: VDI error 1010: Failed to get configuration from server. Check that the SQL Server instance is running, and that you have the SQL Server Systems Administrator server role. Error code: (-2139684860: An abort request is preventing anything except termination actions.)
    19/07/2007 18:19:09: SQL error 3013: BACKUP LOG is terminating abnormally.
    19/07/2007 18:19:09:
    19/07/2007 18:19:09: Memory profile
    19/07/2007 18:19:09: Type Maximum Minimum Average Blk count Total
    19/07/2007 18:19:09:





    19/07/2007 18:19:09: Commit 208003072 4096 232178 12256 2845577216
    19/07/2007 18:19:09: Reserve 14622720 4096 23706 10558 250298368
    19/07/2007 18:19:09: Free 4063232 4096 251070 499 125284352
    19/07/2007 18:19:09: Private 208003072 4096 136949 22205 3040956416
    19/07/2007 18:19:09: Mapped 1060864 4096 105505 62 6541312
    19/07/2007 18:19:09: Image 6156288 4096 88442 547 48377856




    SQL Backup job failed with exitcode: 1010 SQL error code: 3013
  • Could you please let me know the outcome of this issue? We are encountering the same problem.

    Thanks, Phil.
  • This is fixed in the patch release of the server components.

    see http://www.red-gate.com/messageboard/vi ... php?t=5484
    Helen Joyce
    SQL Backup Project Manager
    Red Gate Software
  • This new version does indeed correct the retry logic that never seemed to work before. I have been running it for several weeks. It does however bring up a new issue:

    Previously if backup attempt 1 failed all 3 retries would immediately fail. No file would be written and we would cross our fingers that the next scheduled log backup would succeed.

    Now, If the first backup attempt fails (forcing attempt 2-4 to be made) IF 2-4 succeed then no LSN information will be written to the created file. This breaks any log shipping or wildcard restores that need to occur.

    This is a consistent problem and can be replicated across multiple servers. Anytime 2-4 succeed there will be no LSN information. IMO this is almost worse than the original issue. Please advise.

    -michael
  • Brian DonahueBrian Donahue Posts: 6,590 New member
    Is the version of this SQL Server 2000 and the patch level SP3? I ran across this KB, which mentions a User-Mode Scheduler issue that's fixed in SP4. The root of this issue is (of all things) the I/O being too fast. From what I understand this causes a situation where checkpointing is suspending threads. And I also did look into the whether or not VDI backups like SQL Backup run a checkpoint -- apparently they do this automatically.

    Can you have a look into this and see if it describes the problem you're having?
    http://support.microsoft.com/kb/815056/ ... 2&sid=1073

    I'm thinking that the delay Peter points out could be down to a problem with the scheduler.
  • SQL 2005 Enterprise x64 sp2 build 9.00.3152.00

    running on Windows Server 2003 x64 Enterprise

    I set my recovery interval (checkpoint) to "2" minutes to decrease the checkpoint duration but it didn't help.

    Were you suggesting this as a fix for the lack of LSN? are you not able to replicate this in testing?
  • I'm running into the same problem on only one of my servers, which is running SQL 2000 SP4 on Windows 2000 SP4. There aren't any other backup jobs occurring during this time. All SQL Backup type jobs fail, both Transaction and Full. Restarting the SQL Server is the only resolution. This first occurred two weeks after I installed Red Gate on the server for the first time. It then happened two months later. Now it's happening again two weeks later. The difference between the previous level and the latest is the file produced, which doesn't have any LSN information stored in the header to read from, thereby causing restore issues. Native SQL Backup for one server out of many is not an option, nor is restarting the SQL Server on a weekly basis. Any help would be welcomed.

    Thanks,

    Laurence

    The error message before the latest patch level was:

    9/17/2007 12:00:00 PM: Backing up model (transaction log) to:
    9/17/2007 12:00:00 PM: F:\backup\model\LOG_TXSQL008_model_20070917_120000.sqb

    9/17/2007 12:00:00 PM: BACKUP LOG [model] TO DISK = 'F:\backup\<database>\<AUTO>.sqb' WITH NAME = '<AUTO>', DESCRIPTION = '<AUTO>', INIT, VERIFY, ERASEFILES = 23h, MAILTO_ONERROR = '[email protected];[email protected]ttiinc.com;[email protected]', COMPRESSION = 1, THREADCOUNT = 3

    9/17/2007 12:00:00 PM: VDI error 1010: Failed to get configuration from server. Check that the SQL Server instance is running, and that you have the SQL Server Systems Administrator server role. Error code: (-2139684860: An abort request is preventing anything except termination actions.)
    9/17/2007 12:00:01 PM: SQL error 3013: BACKUP LOG is terminating abnormally.
    9/17/2007 12:00:01 PM:
    9/17/2007 12:00:01 PM: Memory profile
    9/17/2007 12:00:01 PM: Type Maximum Minimum Average Blk count Total
    9/17/2007 12:00:01 PM:





    9/17/2007 12:00:01 PM: Commit 769785856 4096 1578030 1140 1798955008
    9/17/2007 12:00:01 PM: Reserve 4186112 4096 483656 162 78352384
    9/17/2007 12:00:01 PM: Free 16515072 4096 1159273 233 270110720
    9/17/2007 12:00:01 PM: Private 769785856 4096 2358098 780 1839316992
    9/17/2007 12:00:01 PM: Mapped 1019904 4096 117715 46 5414912
    9/17/2007 12:00:01 PM: Image 7581696 4096 68435 476 32575488
    9/17/2007 12:00:01 PM:
    9/17/2007 12:00:01 PM: Warning 300: Backup failed. Retry attempt: 1
    9/17/2007 12:00:03 PM: BACKUP LOG [model] TO DISK = 'F:\backup\model\LOG_TXSQL008_model_20070917_120000.sqb' WITH NAME = 'Database (model), 9/17/2007 12:00:00 PM', DESCRIPTION = 'Backup on 9/17/2007 12:00:00 PM Server: TXSQL008 Database: model', INIT, VERIFY, ERASEFILES = 23h, MAILTO_ONERROR = '[email protected];[email protected];[email protected]', COMPRESSION = 1, THREADCOUNT = 3

    9/17/2007 12:00:03 PM:
    9/17/2007 12:00:03 PM: Warning 300: Backup failed. Retry attempt: 2
    9/17/2007 12:00:05 PM: BACKUP LOG [model] TO DISK = 'F:\backup\model\LOG_TXSQL008_model_20070917_120000.sqb' WITH NAME = 'Database (model), 9/17/2007 12:00:00 PM', DESCRIPTION = 'Backup on 9/17/2007 12:00:00 PM Server: TXSQL008 Database: model', INIT, VERIFY, ERASEFILES = 23h, MAILTO_ONERROR = '[email protected];[email protected];[email protected]', COMPRESSION = 1, THREADCOUNT = 3

    9/17/2007 12:00:05 PM:
    9/17/2007 12:00:05 PM: Warning 300: Backup failed. Retry attempt: 3
    9/17/2007 12:00:07 PM: BACKUP LOG [model] TO DISK = 'F:\backup\model\LOG_TXSQL008_model_20070917_120000.sqb' WITH NAME = 'Database (model), 9/17/2007 12:00:00 PM', DESCRIPTION = 'Backup on 9/17/2007 12:00:00 PM Server: TXSQL008 Database: model', INIT, VERIFY, ERASEFILES = 23h, MAILTO_ONERROR = '[email protected];[email protected];[email protected]', COMPRESSION = 1, THREADCOUNT = 3

    9/17/2007 12:00:07 PM:
    9/17/2007 12:00:07 PM: Warning 300: Backup failed. Retry attempt: 4
    9/17/2007 12:00:09 PM: BACKUP LOG [model] TO DISK = 'F:\backup\model\LOG_TXSQL008_model_20070917_120000.sqb' WITH NAME = 'Database (model), 9/17/2007 12:00:00 PM', DESCRIPTION = 'Backup on 9/17/2007 12:00:00 PM Server: TXSQL008 Database: model', INIT, VERIFY, ERASEFILES = 23h, MAILTO_ONERROR = '[email protected];[email protected];[email protected]', COMPRESSION = 1, THREADCOUNT = 3

    =======================================
    The error message after the latest patch:

    9/17/2007 1:16:13 PM: Backing up model (transaction log) to:
    9/17/2007 1:16:13 PM: F:\backup\model\LOG_TXSQL008_model_20070917_131613.sqb

    9/17/2007 1:16:13 PM: BACKUP LOG [model] TO DISK = 'F:\backup\<database>\<AUTO>.sqb' WITH NAME = '<AUTO>', DESCRIPTION = '<AUTO>', INIT, VERIFY, ERASEFILES = 23h, MAILTO_ONERROR = '[email protected];[email protected];[email protected]', COMPRESSION = 1, THREADCOUNT = 3

    9/17/2007 1:16:13 PM: VDI error 1010: Failed to get configuration from server. Check that the SQL Server instance is running, and that you have the SQL Server Systems Administrator server role. Error code: (-2139684860: An abort request is preventing anything except termination actions.)
    9/17/2007 1:16:13 PM: SQL error 3013: BACKUP LOG is terminating abnormally.
    9/17/2007 1:16:13 PM:
    9/17/2007 1:16:14 PM: Memory profile
    9/17/2007 1:16:14 PM: Type Maximum Minimum Average Blk count Total
    9/17/2007 1:16:14 PM:





    9/17/2007 1:16:14 PM: Commit 769785856 4096 1568325 1147 1798868992
    9/17/2007 1:16:14 PM: Reserve 4186112 4096 516439 167 86245376
    9/17/2007 1:16:14 PM: Free 16515072 4096 1130619 232 262303744
    9/17/2007 1:16:14 PM: Private 769785856 4096 2332222 792 1847119872
    9/17/2007 1:16:14 PM: Mapped 1019904 4096 117715 46 5414912
    9/17/2007 1:16:14 PM: Image 7581696 4096 68444 476 32579584
    9/17/2007 1:16:14 PM:
    9/17/2007 1:16:14 PM: Warning 300: Backup failed. Retry attempt: 1
    9/17/2007 1:16:16 PM: BACKUP LOG [model] TO DISK = 'F:\backup\model\LOG_TXSQL008_model_20070917_131613.sqb' WITH NAME = 'Database (model), 9/17/2007 1:16:13 PM', DESCRIPTION = 'Backup on 9/17/2007 1:16:13 PM Server: TXSQL008 Database: model', INIT, VERIFY, ERASEFILES = 23h, MAILTO_ONERROR = '[email protected];[email protected];[email protected]', COMPRESSION = 1, THREADCOUNT = 3

    9/17/2007 1:16:17 PM: Backup data size : 1.500 MB
    9/17/2007 1:16:17 PM: Compressed data size: 7.500 KB
    9/17/2007 1:16:17 PM: Compression rate : 99.51%

    BACKUP LOG successfully processed 0 pages in 0.020 seconds (0.000 MB/sec).
    9/17/2007 1:16:17 PM:
    9/17/2007 1:16:17 PM: Validating files:
    9/17/2007 1:16:17 PM: F:\backup\model\LOG_TXSQL008_model_20070917_131613.sqb
    9/17/2007 1:16:17 PM:
    9/17/2007 1:16:17 PM: RESTORE VERIFYONLY FROM DISK = 'F:\backup\model\LOG_TXSQL008_model_20070917_131613.sqb'

    The backup set is valid.
  • Brian DonahueBrian Donahue Posts: 6,590 New member
    Physix,

    It's the 2.5 minute delay between setting up a Virtual device and a failure that I find intriguing. SQL Backup is able to log in, create a VDI virtual device, but it waits for data to come from SQL Server and it never comes. My thought was that the thread that SQL Backup was running was getting preempted by SQL Server's User Mode Scheduler and whatever is preempting it is not returning the thread.
Sign In or Register to comment.