Timeout expired

davidspencer · January 26, 2007 7:50AM

Seems not an unfamilar story. Having installed (apparently successfully) I attached to our SQL Server (2000) and selected a database. I was presented with a couple of MS Backups to include following which I sat patiently with the "Analyse..." dialogue for several minutes until it timed out (I did this many, many times). I have trawled through all the FAQ and forum. I have checked that I have correct permissions, made sure the extended procedure is safely installed in the 'master' database, all to no avail. What is worrying is that 'sqlserv.exe' starts gobbling up shed loads of CPU time which persists even after the client progam is exited. This only resolves by stopping and restarting the SQL Server. The feature set in this software is extremely enticing and I can't wait to get my hands on it. Can any of you gurus throw some light on this, please?

Brian Donahue · January 27, 2007 7:00AM

Hi David,

I'm sorry you have run into this. It's probably the number one issue with this program and we haven't got a fix for it. The work of reading the transaction log file is done using an extended stored procedure on the SQL Server end. Extended Stored Procedures run in the SQL Server process space, and this would account for the high processor usage.

Reading the log can be a lengthy process, but the stored procedure should not time out. My guess is that there is a communication process between Log Rescue and the XP to keep the connection alive and this is what is failing.

I'll chase this up with the developer again and make sure we're working on it (and that I've got my facts straight!).

Brian Donahue · February 3, 2007 5:45AM

I put TracePlus/Winsock on Log Rescue. It looks like the socket has a timeout (SO_RCVTIMEO) of around 180 minutes. All of the data is the result of one request to run the xp_logrescue extended stored procedure, this streams all of the information it gets through the SQL connection without doing anything that I'd noticed to keep the connection alive. Note that the largest log I had on-hand takes ten minutes to process.

My assumption is that Log Rescue will throw this error, no matter what, if the log processing takes more than 179 minutes, because at least at the lowest level, the socket itself will close after this time period has elapsed.

I'd assume that, even if this happens, the program could be caching information out of the log even after the connection terminates, and this is why the CPU continues to spin even after the timeout. That would be the next thing for me to look into.

Brian Donahue · February 3, 2007 6:48AM

Just to clear up, that last bit of info applies only to the stage of the analysis where data is being extracted from the live log. When retrieving backups, the timeout value is much shorter than 180 minutes.

rick.sheeley · April 10, 2007 6:38PM

We are getting the timeouts as well.

This basically makes the product unusable Brain.

I see these comments were posted in February. Surely you have some kind of fix or workaround by now for this?

Robert C · April 11, 2007 3:57AM

Hi,

One thing to check - and this is a bit of a long shot - is that you don't have any transactions running that have acquired schema modification locks. These prevent SQL Log Rescue from being able to read the live log, and will cause it to time out.

Unfortunately our ability to track this problem down is greatly hampered by not being able to reproduce it - the only way that (I think) any of us have managed to recreate it internally has been to have one of the locks mentioned above open (so something like "BEGIN TRAN; CREATE TABLE foo;", then run SLR without committing or aborting the transaction).

rick.sheeley · April 12, 2007 1:33PM

Robert,

Thanks for responding back.

Unfortunately, the Log reader timeout problem has progressed far beyond the timeout issue. We have documented that the Timeout failure is actually leaving SQL threads running, and this has led to one of our production servers locking up twice in the past two days.

Here's the entire scenario:

It appears that when the Log Recovery process runs, it times out, but then it leaves one or two â€œorphanedâ€ SPIDâ€™s that cannot be killed.

So what we think happened is this:

We responded to Help Desk ticket #62475 about AT&T Contract Renewal not working
After running several traces, we found out that the SPâ€™s were failing because someone in Philippines had deleted codes from a table.
We tried to use the Log recovery tool to get these back, but the tools kept failing (which undoubtedly left the SPIDS running and chewing up CPU and resources)
Just as today, we started getting SQL errors in the Application log. At that time, we didnâ€™t know why this was occurring. We had a false message from a controller, but that now looks to be unimportant.
Don tried to just restart SQL (didnâ€™t work), so server was re-booted.
Everything came back up, so it seemed as if it was related to the ongoing firmware issues on the HP servers
I then went out and attempted to use the Redgate utility locally to see if we could find out who removed the records on ATT_CR.
Exact same thing happened there as when Sandy and I tried it from DB10-000. However, I had discovered the renegade threads this time, and tried to kill/rollback the SPIDâ€™s. This seemed to be making progress when I left last night. I check it later, and it showed 100% rollback; however, the SPIDâ€™s never did die. Found them in the same state this am.
This am (ticket # 62493):
Found that the renegade SPIDâ€™s were still active.
Monitored them, but they didnâ€™t seem to be growing much, and systems were ok
Planned with Don to re-boot server Wed. night, and to add the â€“g switch to help with memory leak
Lockup occurred at around 11:30am
We re-booted the server and added the â€“g parameter to help stabilize the memory leak.
We also removed .Net 1.1/2.0, and Log Rescue.
Restarted server and server seemed ok at that point.

So, the LogRescue failure is being enhanced/exploiting the SQL Server memory leaks.

We can no longer trust the Log Rescue product at this point, and have removed it from our systems. I have contacted our executive staff and let them know that the product is unacceptable at this point, and we will be looking to other vendors, unless Redgate fixes this issue this month.

Robert, I do not take this action lightly. I have seen with my own eyes, the Redgate product running exactly as I described. We put alot of faith in your producxts when we first started looking at Redgate, but this product is very, very disappointing!!

This posting will be forwarded to my bosses, and to our Redgate representative, and to the Redgate executive staff today.

Nick Warren · April 16, 2007 8:31AM

Rick - I've checked through our helpdesk system, but cannot find a support call relating to this issue for you or your company.

We are keen to help resolve this problem for you and others, but as I'm sure you understand, unless we can re-produce the problem, it's almost impossible for us to debug it and fix it. To date, despite substantial effort, we've not been able to reproduce the problem in-house.

If you would be prepared to help us to help you, then can I ask you to raise a formal support call by sending an email to support@red-gate.com with as much detail as you can possibly muster. We can then follow this up formally with some ideas about how best to debug the problems.

Thanks, Nick
____________
Nick Warren
Head of Customer Support
Red Gate Software

Timeout expired

Comments

Product Learning

Community Forums

Events & Friends

Simple Talk