6.2 UI Crash and other pains
PDinCA
Posts: 642 Silver 1
Looks like the UI Exception reported for 6.1 didn't make it...(http://www.red-gate.com/messageboard/viewtopic.php?t=9301) 6.2 crashes, which makes it a tool I am reluctant to go into. Right now, I can continue to use the UI despite the fact that I have an unresponded-to Exception dialog open from the latest crash.
What was I doing? Merely clicked on a server while all the server connections had yet to refresh - that's the first crash. I restarted and waited for every connection to refresh. I started the server component upgrades and I made one of my clicks while another server was refreshing - Crash! And that's the box I will not dismiss because I can't be bothered to keep restarting the UI!
I upgraded one of the servers - SS2K8, WS2K8. It took two tries. 1st time it failed at the version number check, so I hit "Retry" more out of hope than confidence, and it completed! But, all I now see is a big fat X for that box - and it has been 10 minutes or more since the upgrade and the other two virtuals that were created from the same image are all good after the upgrade. Surprise! If I right-click and poke it with a "Refresh" stick, it refreshes just fine
Tried to upgrade a cluster. No joy. Failed at step 4 - version comparison. This is a physical box running SS2000 under Windows 2003. Guess I'll have to remote in for that one... What is disturbing is that the "Run setup on server" completes successfully but the version comparison barfs - why? I asked the upgrader to handle both nodes, BTW. "Retry", (three times) did nothing this time. Active-passive configuration.
Tried to upgrade the newer cluster - foolish idea! Failed at step 4. This is a SS2008 box under Windows Server 2008. Asked for upgrade of both nodes. Active-passive configuration. Fails on Retry. More remote-in upgrading...
One of the virtuals has been red-X'd since I launched the UI. It likes being poked with a "Refresh" right-click stick :x. And that box took two cycles to upgrade successfully, like the first one. SS2K this time.
Here's the Exception info:
SQLBackup
Unable to open the database file
unable to open database file

WorkerExecutionException
   at RedGate.SQLBackup.Engine.ConfigurableThreadPool.a(Exception )
   at RedGate.SQLBackup.Engine.ConfigurableThreadPool.a.b()
   at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()
----------
SQLiteException
Unable to open the database file
unable to open database file
   at System.Data.SQLite.SQLite3.Reset(SQLiteStatement stmt)
   at System.Data.SQLite.SQLite3.Step(SQLiteStatement stmt)
   at System.Data.SQLite.SQLiteCommand.ExecuteNonQuery()
   at g.g(Server )
   at g.b(Object )
Having upgraded all the nodes individually, and having closed the UI, I restarted it. OOPS! The Exception dialog was still open when I closed the main UI. I was informed there was already a copy of SQLBackup running - not! Task Manager, nuke process tree...
Restarted UI. Expanded the first group, "Production", and all are good - upgrades all have the UI version. Expanded the second group, "Virtuals", IMMEDIATE CRASH :x Here's that exception:
SQLBackup
Insertion failed because the database is full
database or disk is full

WorkerExecutionException
   at RedGate.SQLBackup.Engine.ConfigurableThreadPool.a(Exception )
   at RedGate.SQLBackup.Engine.ConfigurableThreadPool.a.b()
   at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()
----------
SQLiteException
Insertion failed because the database is full
database or disk is full
   at System.Data.SQLite.SQLite3.Reset(SQLiteStatement stmt)
   at System.Data.SQLite.SQLite3.Step(SQLiteStatement stmt)
   at System.Data.SQLite.SQLiteCommand.ExecuteNonQuery()
   at g.g(Server )
   at g.b(Object )

I can assure you that I have 79 gigs free on my C: drive.
Jesus Christ: Lunatic, liar or Lord?
Decide wisely...
Comments
The error you are receiving is usually due to the cached history for the server becoming corrupt.
Can you please try the following:
1. Close the SQL Backup GUI.
2. Locate the following folder on the machine running the GUI - %USERPROFILE%\local settings\Application Data\Red Gate\SQL Backup\Server Data
3. Delete this folder and its contents.
4. Open the SQL Backup GUI.
Please confirm that this resolves your issue.
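For anyone who would rather script steps 2 and 3, here is a minimal Python sketch. It assumes the GUI is already closed and that the folder lives at the path quoted above. The function names and the idea of renaming the folder to "Server Data.bak" instead of deleting it outright are illustrative assumptions, not something Red Gate ships or recommends.

```python
import shutil
from pathlib import Path


def cache_dir(user_profile: str) -> Path:
    """Build the Server Data path from step 2, given a profile directory."""
    return (Path(user_profile) / "Local Settings" / "Application Data"
            / "Red Gate" / "SQL Backup" / "Server Data")


def reset_cache(user_profile: str, keep_backup: bool = True) -> bool:
    """Move aside (or delete) the cache folder.

    Returns False when the folder does not exist, True otherwise.
    Keeping a ".bak" copy is an assumption made here for safety.
    """
    target = cache_dir(user_profile)
    if not target.is_dir():
        return False
    if keep_backup:
        backup = target.with_name(target.name + ".bak")
        if backup.exists():
            shutil.rmtree(backup)  # drop an older backup first
        target.rename(backup)
    else:
        shutil.rmtree(target)
    return True

# Typical call on the GUI machine:
#   import os; reset_cache(os.environ["USERPROFILE"])
```

Renaming rather than deleting keeps the old cache files around in case support asks for them later.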
Product Support
Redgate Software Ltd.
E-mail: support@red-gate.com
Maybe you could give this a try and let me know how it goes.
ftp://support.red-gate.com/utilities/Re ... aStore.zip
Brian, when you say "server", I wonder which of the 8 you mean - all of them!? It doesn't seem to matter whether I expand the Production group first or the Virtuals, whichever is done second provokes the crash.
SQLBackup GUI is installed on my laptop and one other server. The GUI hasn't been used on that server in at least 3 months as it's a fallback install. Server components are on the monitored servers, including the server where the GUI is installed. Which .Net do I need - as in, is 3.5 the correct version to have to install up to? And, as we are DBA-challenged, what is the effect of the .Net install as in, will the clusters need rebooting, what other effects are there? Risk is not something I want to entertain right now.
Decide wisely...
I think that on the surface it's a client-side issue, because SQLite is the technology used to host the SQL Backup console's data cache. However, since you have deleted the data from there, I'd guess that the server-side data is corrupt. The evidence does not give us a clear conclusion, though. If you can, I think you should run the server-side utility to verify all of the SQL CE databases that SQL Backup uses on the server side. If you aren't allowed .NET on the SQL Server, then you can't do that.
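On the client side, a rough way to test a cache file for corruption is SQLite's own integrity check. The Python sketch below assumes the console's cache files are plain, unencrypted SQLite databases, which the stack traces suggest but this thread does not confirm. The function name is made up for illustration. It opens the file read-only, so it should not alter anything, and it does not touch the server-side SQL CE databases.

```python
import sqlite3
from pathlib import Path


def check_cache_file(path: Path) -> str:
    """Return 'ok' for a healthy SQLite file, otherwise a problem description."""
    # mode=ro opens the database read-only so the check never writes to it.
    uri = path.resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            row = conn.execute("PRAGMA integrity_check").fetchone()
        finally:
            conn.close()
    except sqlite3.Error as exc:
        # Covers missing files, locked files and files that are not SQLite.
        return f"error: {exc}"
    return row[0]
```

A result of "ok" only says the file is structurally sound at that moment. A transient lock held by another program would not show up unless it happens during the check.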
This leads me to believe that the Servers are fine and that the problem lies with the UI... Laptop is Windows XP Pro SP3 with every Windows Update applied as of this morning - BEFORE the latest reboot and crash sequence.
Any log files I can send, or anything I can run for you to capture activity? - more than happy to do so...
Please don't forget the upgrade issues where it says the server upgrade is OK but the version comparison fails.
Decide wisely...
A number of your valid concerns are being addressed. The unhandled exception you experience when expanding server groups may be related to a local cache issue that occurs when the file is locked (either by another application or perhaps by concurrent activity within the GUI).
You will need the .NET 2.0 framework to run the UI. This carries no risk.
Sadly, this just means that the installer was executed and not that the executable succeeded in its task. Remote install of the service only works in a subset of security scenarios. For clusters especially, running the manual installer is recommended. The messages that the install dialogue presents have been changed for the next release to be more helpful.
For your UI issues, the exception data and the description you provided should be enough for now, though if you encounter any more bugs or annoyances, more information would be appreciated.
Thanks,
Development
Red-Gate Software
I open the UI, get the "Upgrade the cache..." dialog, try EITHER the "Now" or the "Later" option and OK it, then I get to see the main window but either by just waiting or trying to expand a Group the UI dies without a message or exception - POOF!
I have deleted the Server cache files under \Server Data from C:\Documents and Settings\<<user>>\Local Settings\Application Data\Red Gate\SQL Backup
In the \SQL Backup folder there is a lockfile.pid 0KB file that's locked by some process or other. It is dated 1/11/2008... Should I attempt to unlock it and whack it?
I AM DEAD IN THE WATER!
Decide wisely...
The UI dies without message or exception? That is most disturbing.
You can enable logging by placing the following registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Red Gate\SQL Backup\Client "FullLogging" set to "1" (REG_SZ)
Which will hopefully shed some light.
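If editing the registry by hand is unappealing, the same setting can be expressed as a .reg file and imported with regedit. The Python sketch below only generates that file. It reads the instruction above as a string value named FullLogging under the Client key, which is an interpretation. The function names are illustrative, and removing the value as a way to switch logging off again is an assumption.

```python
KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Red Gate\SQL Backup\Client"


def full_logging_reg(enabled: bool = True) -> str:
    """Return .reg text that sets FullLogging to "1", or removes the value."""
    # '"Name"=-' is the .reg syntax for deleting a value.
    setting = '"FullLogging"="1"' if enabled else '"FullLogging"=-'
    return f"Windows Registry Editor Version 5.00\n\n[{KEY}]\n{setting}\n"


def write_reg_file(path: str, enabled: bool = True) -> None:
    """Write the .reg text as UTF-16 with CRLF line endings."""
    with open(path, "w", encoding="utf-16", newline="\r\n") as handle:
        handle.write(full_logging_reg(enabled))
```

Importing a file that targets HKEY_LOCAL_MACHINE normally needs administrator rights.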
Development
Red-Gate Software
Location of the log file(s) you need?
Decide wisely...
You should be able to find them here:
C:\Documents and Settings\[USERNAME]\Local Settings\Application Data\Red Gate\Logs\SQL Backup
Cheers,
Development
Red-Gate Software
Decide wisely...
Apologies for the untimely reply. I have been looking into this one and can reproduce the bug if the local cache files are locked by another application. (The lockfile.pid is used to ensure only one copy of the UI is running at a time. It should only be locked when the UI is running and is probably not related to what you're experiencing).
Can you please check if any other programs have a lock on the local cache files? These are named X.dat where X is an integer, and live in Local Settings\Application Data\Red Gate\SQL Backup\Server Data.
You can use Sysinternals Process Explorer to identify which programs have a file open (CTRL-F). So just before expanding the second server group, please quickly check which application has the lock.
I might also like to have a look at your localDataCache.dat file. This contains the mappings of instance names to X.dat filenames. If that's ok, please email it to me at (robin.anderson [at] red-gate.com).
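As a rough complement to Process Explorer, the Python sketch below lists the X.dat files in a Server Data folder and tries to open each one for read/write. On Windows that attempt normally fails while another program holds the file without sharing it. The function names are illustrative. The probe cannot say which program holds the lock, so Process Explorer is still needed for that, and on non-Windows systems everything will usually report as openable.

```python
import re
from pathlib import Path

# Cache files are named X.dat where X is an integer, as described above.
CACHE_NAME = re.compile(r"^\d+\.dat$")


def cache_files(server_data: Path) -> list:
    """Return the X.dat files in the folder, skipping journals and other files."""
    return sorted(p for p in server_data.iterdir()
                  if p.is_file() and CACHE_NAME.match(p.name))


def probe_locks(server_data: Path) -> dict:
    """Map each cache file name to 'openable' or 'blocked: <reason>'."""
    results = {}
    for path in cache_files(server_data):
        try:
            # r+b asks for read and write access without truncating the file.
            with open(path, "r+b"):
                results[path.name] = "openable"
        except OSError as exc:
            results[path.name] = f"blocked: {exc.strerror}"
    return results
```

Because the locks in question may be very short-lived, a single run can easily miss them. Running the probe in a loop just before expanding the second group gives a better chance of catching one.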
Development
Red-Gate Software
Email of local....dat incoming.
Decide wisely...
By the look of the errors you are getting the locks will only be held very temporarily. Are you running a virus scanner, does it have an on-access check? If that took an exclusive lock while it was checking the files then that could cause the errors you are seeing.
Would it be possible to exclude that directory from your virus scanner/temporarily disable it and see if you continue to see problems?
Thanks,
James
Head of DBA Tools
Red Gate Software Ltd
Perhaps a filemon/procmon trace would shed some light? Then we could see exactly what's happening when the crash occurs.
Development
Red-Gate Software
SO Close! I excluded every subfolder under Local Settings\Application Data\Red Gate\ from Symantec's Endpoint Protection.
I managed to expand the Production group and two of its servers, but when I expanded the Development Group - poof! No Exception, no message, just "gone"...
Do I need to do any excludes on the servers too?
Robin:
filemon/procmon are unfamiliar to me, so I'd need some assistance to set them up... Details?
When I first open SQL Backup, I get the "Upgrading Activity Cache" dialog every time - it never appears to complete, even having managed to keep the UI active for a few minutes. I usually choose the default "Upgrade Now" but even if I choose "Later" I get the error: "Some cache files could not be upgraded. It may initially take a few minutes to display activity history for some servers." I whacked the entire Server Data folder content and now, upon attempted expansion of the Development group:
localDataCache.dat
4.dat
4.dat-journal
6.dat
6.dat-journal
7.dat
8.dat
8.dat-journal
As the files are binary, I can't see the content of the %journal files... Do you want them?
ALL:
Took the sledgehammer approach and uninstalled 6.2, rebooted, reinstalled from fresh Toolbelt download and:
1. Added Production Group and each of three servers under it.
2. Added Dev Group and the first virtual OK.
3. Tried to add a SS2000 instance and the UI crashed with no exception.
4. When SQL Backup restarted, I received the "Upgrading Cache" dialog, chose "Now" and my list of groups and servers was EMPTY.
5. Re-added the Production Group and all 3 servers. OK.
6. Closed SQL Backup.
7. Restarted SQL Backup and my Group is intact.
8. Added the DEV Group and one server. OK.
9. Exit and restart - OK.
10. Tried to add a SS2008 instance - crash!
11. Went to the SS2000 and SS2008 "failed addition" servers and ran the SQB Install on both virtuals.
12. Tried to re-add the SS2008 instance - CRASH!
Full Logging is still turned on. What can I send you that may help?
Decide wisely...
I ran the utility on the SS2008 instance under Windows Server 2008 - the one that bombed in the post above and the utility exceptioned: I uninstalled the SQB Server Components and reinstalled and still it crashes.
Ideas?
Decide wisely...
If it still crashes, then we know the problem relates to something server-side and if it doesn't then it's something with the UI on certain hardware/software.
Development
Red-Gate Software
filemon was a tool by Microsoft Sysinternals that logs all file I/O, showing which applications did what to which file during the period it was collecting data.
It was replaced by procmon, which is the same but also records registry, network and other data. When you start it up, it automatically starts capturing data.
Click the magnifying glass button (blue arrow) to stop it collecting. Then make sure only the file button (red) is selected, since we're not interested in the other info it can collect. Then clear the display (white page with eraser button) and start collection again. Then make the UI crash and stop collection. File>Save the trace in the default format. This file should hopefully tell me what was happening at the time.
n.b. the trace file can get big quickly - if it's too large to email, procmon has filtering capability (can right click a row which you assume is not relevant and remove all similar entries).
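If the trace is too large to read by eye, one option is to also save it from procmon as CSV and summarise it with a short script. This is an addition to the steps above, which ask for the default format. The Python sketch below assumes the CSV has procmon's usual "Process Name", "Path" and "Result" columns. The function names and the trace.csv file name are illustrative.

```python
import csv
from collections import Counter
from typing import Iterable

# Lower-cased fragment of the cache location mentioned in this thread.
CACHE_FRAGMENT = r"\red gate\sql backup\server data"


def cache_events(lines: Iterable[str]) -> list:
    """Keep only the rows whose Path points into the Server Data folder."""
    reader = csv.DictReader(lines)
    return [row for row in reader
            if CACHE_FRAGMENT in (row.get("Path") or "").lower()]


def failures_by_process(events: list) -> Counter:
    """Count non-SUCCESS results per (process name, result) pair."""
    return Counter(
        (row.get("Process Name"), row.get("Result"))
        for row in events
        if row.get("Result") != "SUCCESS"
    )

# Typical use with a saved trace:
#   with open("trace.csv", encoding="utf-8-sig", newline="") as f:
#       print(failures_by_process(cache_events(f)))
```

Any process other than the SQL Backup UI that shows up against the cache files around the time of the crash would be worth a closer look.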
Development
Red-Gate Software
Decide wisely...
So you can get the UI to crash from multiple machines?
I have the server list from the file you supplied - but can you tell me if there is anything interesting or unusual about the instances in the list that might help me reproduce this?
Development
Red-Gate Software
The latest UI crash after the reinstall was when I tried to add mceSTGsql. I ran the server-side install on that box but it still crashes the UI. I also tried to add the mceDEVsql2K server, and after the crash also ran the server-side install. Still cannot add either server, but I have the mceDEVsql server up fine...
I just took a chance and added the mceUATsql server successfully! So I closed the UI and when I restarted it and expanded the dev group, the mceUATsql server showed the "red X" and then the UI crashed.
I just disabled Symantec's Endpoint Protection completely and Lavasoft's Ad-Aware Service is suspended.
I managed to add mceSTGsql successfully.
Adding mceDEVsql2K\ECOM crashed the UI - that's the one I ran the server-side install for yesterday.
Deleted the Dev Virtuals Group. Closed UI. Restarted. Added Dev Virtuals group. Added mceDEVsql - OK so far. Closed UI. Restarted. Tried to add the mceUATsql SS2008/WS2008 instance and crashed.
Now I've successfully added mceSTGsql again...
There really doesn't appear to be a clear pattern here... Any benefit in your remoting in?
Decide wisely...
Development
Red-Gate Software