Backups failing, network or resource error
kernelpanic
Posts: 5
Hello, first I'll specify the hardware of our Redgate server:
Model: HP Proliant DL380 G5
RAM: 32GB
CPU: 2 x 4-core Intel Xeon E5440 2.83GHz
OS: Windows 2008 R2 Standard 64 bit
We currently have 79 SQL servers, with various numbers of databases of various sizes, backing up to this one server. The backup scheme is differential during the week and full backups at the weekend. My problem is that some, though not all, backup jobs are failing with the following error:
8/09/2012 02:04:55: Warning 210: Thread 0 warning: WriteFile failed for file: \\REDGATE-SERVER\SQLBackupE\Data\SQL-CLIENT\LiveDatabase\FULL_(local)_LiveDatabase_20120907_232529.sqb at position: 1698694144
08/09/2012 01:31:59: WriteFile failed for file: \\REDGATE-SERVER\SQLBackupE\Data\NM-HADES\LiveDatabase\FULL_(local)_LiveDatabase_20120907_232529.sqb (121: The semaphore timeout period has expired.)
08/09/2012 01:31:59: CloseTargetFile.FlushFileBuffers error: The specified network name is no longer available.
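To see whether the failures cluster on particular clients or error codes, log lines like the ones above could be tallied with a short script. This is only a minimal sketch: the regular expression is modelled solely on the two WriteFile lines quoted here, and treating the folder after Data as the client name is an assumption drawn from those paths, not from Redgate documentation. The function name is made up for illustration.

```python
import re
from collections import Counter

# Modelled only on the "WriteFile failed" lines quoted in this post;
# other SQL Backup log lines may look different (e.g. paths with spaces).
WRITE_FAIL = re.compile(
    r"WriteFile failed for file: (?P<path>\\\\\S+?\.sqb)"
    r"(?: at position: (?P<position>\d+))?"
    r"(?: \((?P<code>\d+): (?P<message>[^)]*)\))?"
)

def summarise_write_failures(lines):
    """Return (failures per client folder, failures per Windows error code)."""
    per_client = Counter()
    per_code = Counter()
    for line in lines:
        match = WRITE_FAIL.search(line)
        if not match:
            continue
        parts = match.group("path").lstrip("\\").split("\\")
        # Assumed layout: server\share\Data\<client>\<database>\<file>
        client = parts[3] if len(parts) > 3 else "unknown"
        per_client[client] += 1
        if match.group("code"):
            per_code[int(match.group("code"))] += 1
    return per_client, per_code

sample_lines = [
    r"8/09/2012 02:04:55: Warning 210: Thread 0 warning: WriteFile failed for file: \\REDGATE-SERVER\SQLBackupE\Data\SQL-CLIENT\LiveDatabase\FULL_(local)_LiveDatabase_20120907_232529.sqb at position: 1698694144",
    r"08/09/2012 01:31:59: WriteFile failed for file: \\REDGATE-SERVER\SQLBackupE\Data\NM-HADES\LiveDatabase\FULL_(local)_LiveDatabase_20120907_232529.sqb (121: The semaphore timeout period has expired.)",
]
clients, codes = summarise_write_failures(sample_lines)
print(dict(clients), dict(codes))
```

Windows error 121 is ERROR_SEM_TIMEOUT, so a tally dominated by that code during the weekend full backups would be consistent with a network or load problem rather than a fault on one client.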
Also, in the Redgate server's System event log I am seeing the following error (source SRV, event ID 2012):
While transmitting or receiving data, the server encountered a network error. Occasional errors are expected, but large amounts of these indicate a possible error in your network configuration. The error status code is contained within the returned data (formatted as Words) and may point you towards the problem.
What I would like to know is whether there are any recommended guidelines for the number of SQL clients backing up to one Redgate server. I ask because I get the feeling the server can't handle the amount of data being thrown at it. Should we be using more than one Redgate server for our environment?
If the number of clients/databases is not the problem, does anybody have an idea as to what might be causing this? Although it obviously points to network problems, I just wanted to be certain we are following guidelines on number of clients and resource usage before I ask our network team to look into this.
Thanks for any replies.
Comments
The first thing to confirm is whether the network connection is a bottleneck. If possible, also check whether the destination is a disk that needs to be defragmented.
The other possibility to consider is the long filenames at the destination. This might cause the issue as well.
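The path length is quick to check. Below is a rough sketch, assuming the classic Windows MAX_PATH limit of 260 characters is the relevant one (SQL Backup may apply limits of its own that are not covered here). The path is the one quoted in the error log, and `path_headroom` is just a helper name made up for this example.

```python
MAX_PATH = 260  # classic Windows API limit, which includes the terminating NUL

def path_headroom(path, limit=MAX_PATH):
    """Return how many more characters `path` could grow before hitting `limit`."""
    return limit - len(path) - 1  # reserve one character for the NUL terminator

backup_path = (
    r"\\REDGATE-SERVER\SQLBackupE\Data\SQL-CLIENT\LiveDatabase"
    r"\FULL_(local)_LiveDatabase_20120907_232529.sqb"
)
print(len(backup_path), path_headroom(backup_path))
```

A negative or very small headroom for any real backup path would make the filename length worth pursuing; a large one would suggest looking elsewhere first.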
One more thing I can suggest is to try the same command with the destination set to a UNC location on the same server, for example \\servername\diskdrivelocation. That way the network redirector is still used, but it points to a local destination. If the backup then completes without errors, that would indicate that the network connection is the bottleneck.
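If running a full backup to a local UNC path is not practical, a plain timed file write to each destination can give a first comparison. This is only a sketch under assumptions: it writes zeros sequentially, which is not the same I/O pattern as SQL Backup, and the UNC paths in the usage comment are placeholders rather than real locations.

```python
import os
import time

def timed_write(path, total_bytes, chunk_size=1024 * 1024):
    """Write `total_bytes` of zeros to `path` in chunks and return MiB/s achieved."""
    chunk = b"\0" * chunk_size
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as handle:
        while written < total_bytes:
            size = min(chunk_size, total_bytes - written)
            handle.write(chunk[:size])
            written += size
        handle.flush()
        os.fsync(handle.fileno())  # push data to the target, not just the OS cache
    elapsed = time.perf_counter() - start
    return written / (1024 * 1024) / max(elapsed, 1e-9)

# Hypothetical usage (placeholder paths, adjust before running):
# local  = timed_write(r"\\servername\diskdrivelocation\test.bin", 2 * 1024**3)
# remote = timed_write(r"\\REDGATE-SERVER\SQLBackupE\Data\test.bin", 2 * 1024**3)
# print(f"local UNC: {local:.1f} MiB/s, remote UNC: {remote:.1f} MiB/s")
```

A write that stalls or raises an OSError against the remote share but not the local one would point towards the network path. Running it during the weekend full-backup window would be the more telling test, since that is when the failures occur.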
Thanks so much for your patience and feedback in this matter.
1.) Disk defrag - the destination disks are actually SAN volumes, and I don't think my boss is too keen to run a defrag on them at the moment.
2.) Destination file names - are these not automatically generated? How can I change them so they are shorter?
3.) Network redirector - unfortunately many of our SQL servers don't have the disk space to create and store local backups of the huge databases they host. This is why we can't follow the recommended practice of creating the backup locally and then copying it to the Redgate server.
It should be said the backups are mostly failing at the weekend when the full backups are run and there is obviously more data being thrown at the server. This was why I was asking about Redgate recommendations on the number of clients backing up to a server etc.
We are going to adjust some NIC options on the Redgate server and will keep you updated.