Large data differences produce a long running script
joeWI
Posts: 4
I have two databases that have approximately 115,000 data differences (adds, updates and deletes) that are spread across 300 tables.
If I use SQL Data Compare, the data script contains one SQL batch. It takes 2 HOURS to execute the script in Query Analyzer.
If I use SQL Packager, the data script is broken into 2 MB XML files, each containing batches of SQL statements. If I produce a C# project, compile it (as-is), it can execute in 2 MINUTES.
Is there any way to get SQL Data Compare to produce batches of SQL statements so that the script is more efficient?
(I'm assuming the performance hit is caused by SQL Server having to parse the entire script (in this example, 40 MB), create an execution plan for the entire script and then execute it. I base this on the fact that nothing shows in the output area of Query Analyzer for the first hour.)
Thank you in advance for any assistance you can provide.
Joe
If I use SQL Data Compare, the data script contains one SQL batch. It takes 2 HOURS to execute the script in Query Analyzer.
If I use SQL Packager, the data script is broken into 2 MB XML files, each containing batches of SQL statements. If I produce a C# project, compile it (as-is), it can execute in 2 MINUTES.
Is there any way to get SQL Data Compare to produce batches of SQL statements so that the script is more efficient?
(I'm assuming the performance hit is caused by SQL Server having to parse the entire script (in this example, 40 MB), create an execution plan for the entire script and then execute it. I base this on the fact that nothing shows in the output area of Query Analyzer for the first hour.)
Thank you in advance for any assistance you can provide.
Joe
This discussion has been closed.
Comments
This is very interesting because the reason that SQL Packager breaks the script resources into 100MB assemblies is actually a workaround for a limitation in .NET that only allows you to have 100MB or resources inside of a namespace. This is a side-effect rather than by design.
The only way to get Data Compare to break up a large synchronization into smaller scripts would be to perform many comparisons with a selection of a few tables at a time. Either that, or break the script up manually.
I've heard that QA is not the most efficient tool for running large queries, so you may want to try running the synchronization from inside Data Compare, or perhaps consider writing your own ADO .NET-based program for executing SQL queries. ADO .NET technology seems to have some significant performance improvements compared to the older DMO-based programs like Query Analyzer.
You might also want to turn off transactions in the output script (you can do this from project options) as this will allow SQL Server to process your script more efficiently without having to worry about rolling back any changes.
- James
--
James Moore
Red Gate Software Ltd
Head of DBA Tools
Red Gate Software Ltd
Brian:
Per your suggestion, I ran the script inside Data Compare. It runs in a little over 1 minute. (Show off! )
We do have a database utility that backups, applies upgrades, etc. When it was running so long, I used Query Analyzer to see whether our utility was the performance bottleneck or SQL Server itself.
Our work around is to:
Read the data script line by line.
When we have accumulated 10 lines, we start looking for a logical place to break the script and submit what's been accumulated to SQL Server. Thus feeding it small pieces of the data script.
Because a SQL statement can be spread across multiple lines, we could not use a simple line count. (We have data that contains carriage returns. :roll:)
So once our threshold is met (i.e. 10 lines), as soon as we hit a line that begins with "INSERT INTO [dbo].[", "DELETE FROM [dbo].[" or "ALTER TABLE [dbo].[", we break and submit everything that has been previously accumulated to SQL Server. (We ignored update statements as those are rarer.)
So far, it seems to be working. The script running through our utility takes about a minute as well.
I can't think of a scenario where this rule would fail in a data script.
James:
Thanks for the suggestion. I had already tried that without any significant difference.
I think that possible you may be able to squeeze a bit more out of Query Analyzer if you insert more GO statements into the script. This should have the same effect as what you're doing, because it breaks the script up into more 'batches'.
Every so often, we regenerate the script (via Data Compare) and package the script with our database utility. Right now, it's 115,000 lines long and continues to grow as more data is gathered. Therefore manually editing the script each time is not practible for us. (I also worry about human error injecting bugs into the script.)
BTW, I was only using Query Analyzer to see whether there was something wrong with our database utility program, the script or SQL Server.
Unless you see a flaw in what we're doing (having our database utility automatically batch), I think that is our only viable solution at this point.
Thanks for all your assistance in this matter.
Joe