Data Compare
ewancampbell69
Posts: 5
Good day to you - I hope you can help.
Using SQL Data Compare version 6.1.1.308 Pro
Command line job to update a database on server B from the same database on server A (identical schemas).
PROBLEM - the scheduled job is taking over 2 hours to complete.
Network speed is good.
Not a lot of new data to update each night.
The database is large (~80GB), but it only has 6 tables.
Why is it taking so long?
Any help appreciated.
Thanks
Ewan
Comments
At a really quick calculation, that doesn't seem to be too bad - the comparison has to read both 80GB copies, and pulling 160GB of data over the wire in two hours averages out at about 180Mbps, which is about all you can hope for on Gigabit Ethernet to a single machine in my experience.
It's also possible that you're limited by hard disk speed on the machine performing the comparison. In that case, if it's a fast (in terms of CPU) machine, you could try enabling compression of temporary files. To do this, use /option:CompressTemporaryFiles on the command line.
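For example, a run with compression enabled might look something like this (the /server, /database and /sync switch names here are from memory, so do check them against the command line help for your version):

sqldatacompare /server1:ServerA /database1:MyDatabase /server2:ServerB /database2:MyDatabase /option:CompressTemporaryFiles /sync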
Hope that helps,
Robert
Red Gate
Another thought (courtesy of RichardJM) - if you know something more about the data that's likely to be new / changed, such as a timestamp that gets updated when rows are modified, you could maybe use a WHERE clause to limit the number of rows that are compared, which would obviously speed things up.
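For instance, if your tables have a ModifiedDate column (a made-up name for the example), a WHERE clause along these lines would cut the comparison down to the last day's changes:

WHERE ModifiedDate >= DATEADD(day, -1, GETDATE())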
Rob
Red Gate
1) Try using the /ignoreidentical option. This should stop any identical rows being written to disk during the comparison, so will speed up that bit of it. Obviously won't help if the network / SQL Server is the bottleneck, though.
2) Are you using clustered indexes as the comparison keys? We retrieve the rows in the order defined by the comparison key, so if you use one that matches a clustered index, SQL Server will be able to return the rows much more rapidly.
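As a sketch of point 2 (table and column names are hypothetical): a primary key declared as clustered means the comparison key matches the physical row order, so SQL Server can stream the rows straight off the index:

ALTER TABLE dbo.Orders
ADD CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID);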
Rob
Red Gate
Don't think the SQL Server spec is an issue - it's connected to SAN storage, with 8 CPUs and 32GB RAM.
Makes sense about the data transfer.
I was wondering whether there's a way to do the compare without transferring the whole DB across to server B first?
Good idea - I'll experiment with the /ignoreidentical command line switch.
Timestamp idea is also good - I'm sure we have one.
The WHERE clause won't really be feasible in production, I don't think - the DB I want to use this on has over 300 tables.
Yes - we use PK clustered index columns to do the compare.
Thanks again
Ewan
Something we are considering in the future is the ability to allow only certain columns to be compared, so you could define a row as being "equal" if a subset of those columns were equal. Would this be something that would be useful in your situation?
Other than that, I don't think there's any way of reducing the data we send over the network - if you're going to check it's equal, you need to get both copies to the same place somehow.
Rob
Red Gate
Thanks for the reply.
Very pleased with the products - we use the toolkit a lot.
Excellent for most of our DB synch tasks here.
Just wondering how feasible it will be to use Data Compare when our DB gets to terabytes of data (pretty soon).
I'm trying to avoid copying huge amounts of data across the network.
Was hoping to use Data Compare to only transfer rows that have changed across from server A to server B.
So there's no way to compare across, say, linked servers?
It may be possible for us to link both servers into the same SAN - have you had any experience of this setup?
Thanks
Ewan
Data Compare will work with TB sized databases, but I think you are going to hit the fundamental problem that reading that amount of data, in any way, is going to take a long time.
Linked servers aren't going to help you either, I'm afraid - all they'd do is effectively redirect all the traffic from server B through server A before it gets to Data Compare, so that would make things slower if anything.
You could probably improve performance somewhat by running SDC on one of the servers directly, so that you're only pulling half the data over the network (i.e. from one of the servers, rather than both).
Another possibility: if both servers are on the same SAN, and you're also doing backups to that SAN, you could run SDC against the local live server and a backup of the "remote" server stored on a "local" (i.e. much higher bandwidth than Ethernet) SAN disk.
However, even then, I don't think it'll be a quick process on that much data!
I think your best bet is definitely to go for as restrictive a WHERE clause as you can, at which point a much smaller number of rows actually need to be compared.
Rob
Red Gate
Hope you're well.
Looking to use SQL Data Compare on 2 databases that are on the same server.
My question is - is the data compare job going to generate another copy of the DB on this server (i.e. 3 DBs in total) to do the compare, or will it just use the 2 existing DBs?
Thanks
Ewan
We won't create a third database; in fact, we don't write any data to the SQL Server during the comparison at all, only when (or if) you choose to run the synchronisation script. At that point, it's just INSERT / UPDATE / DELETE statements run against the target database, so you're still just using the original two databases.
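To illustrate, the generated synchronisation script is just plain T-SQL along these lines (table and values invented for the example):

UPDATE dbo.Orders SET Amount = 42.50 WHERE OrderID = 1001;
INSERT INTO dbo.Orders (OrderID, CustomerID, Amount) VALUES (1002, 7, 19.99);
DELETE FROM dbo.Orders WHERE OrderID = 1003;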
Hope that helps,
Robert
Red Gate
It's computationally more intensive but much less data will flow.
Basically, Data Compare would generate a function/sproc for each table to combine the row data and generate the MD5 hash, then use that rather than the raw table as the comparison input - the rest remains the same.
Oh and I want a 0.0001p royalty for each row compared by this method
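A minimal T-SQL sketch of the sort of thing I mean (table and column names are made up; note that HASHBYTES caps its input at 8000 bytes on SQL Server 2005, and NULLable columns would need ISNULL wrapping in practice):

-- Per-row MD5 over the non-key columns; only rows whose hashes differ
-- between the two servers would need their full data transferred.
SELECT OrderID,
       HASHBYTES('MD5',
                 CAST(CustomerID AS VARCHAR(12)) + '|' +
                 CAST(Amount AS VARCHAR(32)) + '|' +
                 CONVERT(VARCHAR(23), ModifiedDate, 121)) AS RowHash
FROM dbo.Orders;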
Thanks for the suggestion - we have considered doing something along these lines, but we're obviously somewhat wary about needing to make schema changes on people's databases :-).
However, I can see something like this could be useful in the situation where the servers are linked by a (relatively) slow network. We'll certainly continue to consider it in the future.
Thanks,
Robert
Red Gate