Restore doesnt finish.. it just keeps going on and on and...

abroadwayabroadway Posts: 12
edited March 14, 2008 10:10PM in SQL Backup Previous Versions
Hi,
We have been backing up with SQL Backup for months now.
Successfully restored to a spare box as part of our test Disaster Recovery during our trial.

However, we now have a real need to restore a production DB backup and SQL Backup is not working.
(The Restore job just keeps going on and on - just like a whingeing customer, it never ends!)

Here's what we did:
I copied the production backup .sqb file to the standby server, which also has a licensed copy of SQL Backup Lite.

Ran the SQL Backup Restore wizard and then copied the script to a Job in SQL EM.

Heres the script:
master..sqlbackup N'-SQL "RESTORE DATABASE [CATSites01FailOver] FROM DISK = ''D:\Backups\FULL_(local)_Production_20080309_180001.sqb'' WITH NORECOVERY, MOVE ''CMS_Data'' TO ''F:\Databases\ProductionFailOver_Data.MDF'', MOVE ''CMS_Log'' TO ''C:\DataLogs\ProductionFailOver_Log.LDF'', MOVE ''CATSites01_1_Log'' TO ''D:\DataLogs\ProductionFailOver_Log.LDF'', REPLACE"'

The DB size is 11GB compressed 64GB physical disk space for MDF, with two logs on different mirrored drives (about 11GB of LDF).

When the SQL Job runs, I can see all the file devices being created and in SQL EM, the Database (Loading) then the job just keeps running and running and running.
I had to kill the job two nights ago, BY stopping the Redgate Service. Then noticed a new version of SQL Backup and upgraded to 5.3.

I ran the job again with SQL Backup 5.3 (after deleting the DB from EM and creating a new Restore job using SQL Backup Restore Wizard).

The standby server works fine when I restore a MS Native Backup.
Its definitely a SQB related problem during the restore process.
BTW: backups using SQL Backup happen everyday, they're worthless if I cant restore!

This is a bit of a nightmare, because we're now reliant on the extra space savings SQL Backup 5 gave us, for other things.

PLEASE help me resolve this. Has anyone else had this problem?
Adam

Comments

  • peteypetey Posts: 2,358 New member
    Could you please download a diagnostic application from here? Select the 'Check file integrity' option, and run a check on the backup file ('D:\Backups\FULL_(local)_Production_20080309_180001). What is the reported status of the file?

    Thanks.
    Peter Yeoh
    SQL Backup Consultant Developer
    Associate, Yohz Software
    Beyond compression - SQL Backup goodies under the hood, updated for version 8
  • Hi Peter (and thanks for the quick reply),

    I downloaded and ran the SBaTU.exe against the FULL.sqb backup file.
    However!
    Not long after running it, the log window displayed a constant stream of:
    Found terminating marker for device 0, position 2,578,608,168
    Found terminating marker for device 0, position 2,578,608,177
    Found terminating marker for device 0, position 2,578,608,186
    (this just kept going and going).

    Also, CPU on the server was being consumed on both CPU's at around 80%.

    I had to kill the SBaTU job. There seemed no end in sight for the above incremented errors.

    Does this give you a hint about the problem?

    Do I need to let the job run? Although if its the same issue causeing the SQL Restore to fail, then it could go on for 24 hours or more. (I cant afford to have this server out of action).

    Hope this helps. I'm sweating on this tonight and it will be a nightmare to have to use Native MS Backup Restore (no space!)

    Thanks for your expertise in solving this with me.
    Adam
  • peteypetey Posts: 2,358 New member
    No, it should not keep reporting the 'Found terminating marker...' indefinitely. It looks like from position 2,578,608,168 onwards, the file just contains a stream of 0s'.

    Could you pls run the same check on the production backup file?

    Thanks.
    Peter Yeoh
    SQL Backup Consultant Developer
    Associate, Yohz Software
    Beyond compression - SQL Backup goodies under the hood, updated for version 8
  • Peter,
    That was the Production Full Backup file.
    It was copied to a secondary server for Restoring.

    I did run SBaTU.exe on another DB backup file, which was only 6GB compressed and that worked without error.

    Is the size a problem for SQL Backup?
    Our original (successful) SQL Backup / Restore test was when the production DB was around 40GB (7GB compressed).
    And now it is 64GB (11GB Compressed).

    Is SQL Backup Lite unable to cope with these larger files?
    There are never any errors during the backup phase and the hardware RAID array is working perfectly.
    ?
    Thanks again for your continued help! :)
  • peteypetey Posts: 2,358 New member
    Did you run the check on the original backup file, or the copy of it on the secondary server?

    SQL Backup Lite has no restrictions on the file size used for restoring.

    Thanks.
    Peter Yeoh
    SQL Backup Consultant Developer
    Associate, Yohz Software
    Beyond compression - SQL Backup goodies under the hood, updated for version 8
  • Ahh! I understand now.

    I will have to wait before running the check on the production box.
    As noted SBaTU.exe seemed very CPU intensive.

    During the restore, I would have thought SQB would detect errors and exit 'gracefully', rather than spin its wheels?

    I have verified the original file in production: ALL GOOD:
    14/03/2008 10:51:29 AM Not encrypted
    14/03/2008 10:51:29 AM Multiple device file: 2
    14/03/2008 10:51:29 AM Compression level: 1
    14/03/2008 10:51:29 AM File size: 11,976,097,280
    .. etc
    .. etc
    14/03/2008 10:54:58 AM File read ended at position: 11,976,097,280

    I will xcopy the original file again and see how it goes.
    Point to note, doing an integrity check on other sqb files worked fine on that server. Will know more after tonights job.

    Thanks Peter!
    Adam
  • peteypetey Posts: 2,358 New member
    During the restore, I would have thought SQB would detect errors and exit 'gracefully', rather than spin its wheels?
    Agreed. Could you please provide the backup statement that was used to create the backup, so that we know the type of backup that was created?

    If you want to compare if the original and copied files are identical, you could generate and compare the MD5 checksums for both files, instead of using the diagnostic tool. You can get a small app to generate the checksums from here.
    Peter Yeoh
    SQL Backup Consultant Developer
    Associate, Yohz Software
    Beyond compression - SQL Backup goodies under the hood, updated for version 8
  • Thanks for your continued assistance Peter:

    The backup script on Production is:
    EXECUTE master..sqlbackup N'-SQL "BACKUP DATABASE [ProductionDB] TO DISK = ''D:\Backups\Download\<AUTO>.sqb'' WITH COMPRESSION = 1, COPYTO = ''\\HOTSPARE01\BACKUPS'', INIT, THREADCOUNT = 2"', @exitcode OUT, @sqlerrorcode OUT

    The restore script on Hotspare is:
    master..sqlbackup N'-SQL "RESTORE DATABASE [ProductionFailOver] FROM DISK = ''D:\Backups\FULL_(local)_Production_20080309_180001.sqb'' WITH NORECOVERY, MOVE ''CMS_Data'' TO ''F:\Databases\ProductionFailOver_Data.MDF'', MOVE ''CMS_Log'' TO ''C:\DataLogs\ProductionFailOver_Log.LDF'', MOVE ''CATSites01_1_Log'' TO ''D:\DataLogs\ProductionFailOver_Log.LDF'', REPLACE"'

    I will run the MD5 checksum after the production backup file is copied to Hotspare again tonight.
    If checksums match, I will then run the Restore again.

    Am certain that the missing endpoint in the file is the cause of the problem, and the cause of the Restore job running forever.

    If after a successful MD5 comparison and the Restore still fails, what contingency? Are there some further logging options that I can turn on to help debug SQL Backup (during the restore process)?

    Adam
  • peteypetey Posts: 2,358 New member
    >> If after a successful MD5 comparison and the Restore still fails, what contingency?

    If the failure is due to an error in the way SQL Backup decompresses the data, the diagnostic tool will pick this up.

    If the failure is due to SQL Server, the error will be recorded in the SQL Server error logs and/or logged in the Windows event log, under sqlvdi.

    If the failure is due to other things, an exception error log (SQBCoreServer_bugreport.txt) will be generated in the SQL Backup Agent service installation folder.
    Peter Yeoh
    SQL Backup Consultant Developer
    Associate, Yohz Software
    Beyond compression - SQL Backup goodies under the hood, updated for version 8
  • Hi Peter,

    Success!

    The corrupted (missing) file endpoint was the culprit.
    After copying and verifying the MD5 Checksums from source to copy, I re-ran the Restore job in EM

    It completed quickly.

    A future enhancement for SQL Backup, would be a nice to have SQL Backup fail on a Restore job after ‘n’ lines of:
    Found terminating marker for device 0, position (incrementing number count) or whatever internal code you use to id this message.

    This would alert the DBA that a corrupted file has caused the failure and maybe even suggest in the logs to run the MD5 checksum against original and copied .sqb files.

    You truly have a fantastic product.
    I’m a fan and more so because you’re serious about support and constant improvement.

    Kind regards,
    Adam :D
Sign In or Register to comment.