Backups failing, network or resource error

kernelpanic · Posts: 5
edited September 24, 2012 9:05 AM in SQL Backup Previous Versions
Hello. First, here is the hardware of our Redgate server:

Model: HP Proliant DL380 G5
RAM: 32GB
CPU: 2 x 4-core Intel Xeon E5440 2.83GHz
OS: Windows 2008 R2 Standard 64 bit

We currently have 79 SQL servers, each with a different number of databases of varying sizes, backing up to this one server. The backup scheme is differential backups during the week and full backups at the weekend. My problem is that some, though not all, backup jobs are failing with the following errors:
8/09/2012 02:04:55: Warning 210: Thread 0 warning: 
WriteFile failed for file: \\REDGATE-SERVER\SQLBackupE\Data\SQL-CLIENT\LiveDatabase\FULL_(local)_LiveDatabase_20120907_232529.sqb at position: 1698694144
08/09/2012 01:31:59: WriteFile failed for file: \\REDGATE-SERVER\SQLBackupE\Data\NM-HADES\LiveDatabase\FULL_(local)_LiveDatabase_20120907_232529.sqb (121: The semaphore timeout period has expired.)
08/09/2012 01:31:59: CloseTargetFile.FlushFileBuffers error: The specified network name is no longer available.
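
Long destination file names come up later in the thread as a possible cause. As a quick sanity check (a hypothetical helper, not part of SQL Backup), the UNC paths from the log above can be measured against the classic Windows MAX_PATH limit of 260 characters, which includes the terminating null. This is only a sketch: it checks the path string as written, and the local folder that the share maps to on the Redgate server may add to the effective length.

```python
# Classic Win32 MAX_PATH is 260 characters including the terminating null,
# so a usable path can be at most 259 characters long.
MAX_PATH = 260


def within_max_path(path: str) -> bool:
    """Hypothetical helper: True if `path` fits under the classic MAX_PATH limit."""
    return len(path) < MAX_PATH


if __name__ == "__main__":
    # The two destination paths reported in the error log.
    failing = [
        r"\\REDGATE-SERVER\SQLBackupE\Data\SQL-CLIENT\LiveDatabase\FULL_(local)_LiveDatabase_20120907_232529.sqb",
        r"\\REDGATE-SERVER\SQLBackupE\Data\NM-HADES\LiveDatabase\FULL_(local)_LiveDatabase_20120907_232529.sqb",
    ]
    for p in failing:
        # Print the length, whether it is within the limit, and the path itself.
        print(len(p), within_max_path(p), p)
```

Both logged paths are roughly 100 characters long, so path length alone looks unlikely to explain these particular failures.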

Also, in the Redgate server's System event log I am seeing the following error (source: SRV, event ID 2012):

While transmitting or receiving data, the server encountered a network error. Occassional errors are expected, but large amounts of these indicate a possible error in your network configuration.  The error status code is contained within the returned data (formatted as Words) and may point you towards the problem.

What I would like to know is whether there are any recommended guidelines for the number of SQL clients backing up to one Redgate server. I ask because I suspect the server can't handle the amount of data being sent to it. Should we be using more than one Redgate server in our environment?

If the number of clients/databases is not the problem, does anybody have an idea what might be causing this? Although the errors obviously point to network problems, I want to be certain we are following the guidelines on number of clients and resource usage before I ask our network team to look into this.

Thanks for any replies.

Comments

  • I am the support engineer investigating the issue you posted, and I am responding via the support case that this forum thread created. From my initial investigation, this might be related to hardware resources.

    The first thing to confirm is whether the network connection is a bottleneck. If possible, also check whether the destination disk needs to be defragmented.

    Another possibility to consider is long file names at the destination, which might also cause this issue.

    One more thing I can suggest is to run the same backup command with the destination set to a UNC location on the same server, e.g. \\servername\diskdrivelocation. This way the network redirector is still used, but the data goes to a local disk. If the backup succeeds this way, that indicates the network link is the bottleneck.

    Thanks so much for your patience and feedback in this matter.
  • Thanks for the reply,

    1.) Disk defrag - the destination disks are actually SAN volumes, and I don't think my boss is too keen to run a defrag on them at the moment :)

    2.) Destination file names - are these not automatically generated? How can I change them so they are shorter?

    3.) Network redirector - unfortunately, many of our SQL servers don't have the disk space to create and store backups of the huge databases they host locally. This is why we can't follow the recommended practice of creating the backup locally and then copying it to the Redgate server.

    I should add that the backups mostly fail at the weekend, when the full backups run and much more data is being sent to the server. That is why I asked about Redgate's recommendations on the number of clients backing up to one server.

    We are going to adjust some NIC options on the Redgate server and will keep you updated.
  • I have tried disabling the TCP/UDP offload options for the NIC on the Redgate server (this was suggested in other forums for the chipset the NIC uses); however, it made no difference. We have now rescheduled the full backup jobs so they are spread out more over the weekend. I will update this thread next week with the results.
  • Rescheduling some of the backup jobs so they are more staggered over the weekend seems to have solved the problem. It looks like Windows 2008 and/or the NIC couldn't handle that much data arriving at once.
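
The fix that worked here was staggering the weekend full backups. A minimal sketch of that idea, assuming a hypothetical helper and evenly spaced start times across a fixed maintenance window, could look like the following. It ignores database size and job duration, so very large jobs may still overlap; in practice the spacing would be tuned per server.

```python
from datetime import datetime, timedelta


def stagger_start_times(servers, window_start, window_length):
    """Hypothetical helper: spread full-backup start times evenly across a window.

    Each server gets its own start time so no two jobs begin at the same
    moment. Database size and job duration are not taken into account.
    """
    if not servers:
        return {}
    gap = window_length / len(servers)  # timedelta divided by int gives a timedelta
    return {name: window_start + i * gap for i, name in enumerate(servers)}


if __name__ == "__main__":
    servers = [f"SQL-{n:02d}" for n in range(1, 80)]  # 79 hypothetical server names
    start = datetime(2012, 9, 7, 22, 0)  # hypothetical window start: Friday 22:00
    schedule = stagger_start_times(servers, start, timedelta(hours=48))
    for name, when in list(schedule.items())[:3]:
        print(name, when.strftime("%a %H:%M"))
```

With 79 servers over a 48-hour window, this spaces the start times roughly 36 minutes apart.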