Data Compare

ewancampbell69 Posts: 5
edited September 22, 2008 6:13AM in SQL Data Compare Previous Versions
Good day to you - I hope you can help.

Using SQL Data Compare version 6.1.1.308 Pro
Command-line job to update a database on server B from the same database on server A (identical schemas).

PROBLEM - the scheduled job is taking over 2 hours to complete.

Network speed is good.
There's not a lot of new data to update each night.
The databases are large (~80GB each) but only have 6 tables.

Why is it taking so long?

Any help appreciated.


Thanks
Ewan

Comments

  • Hi,

    At a really quick calculation, that doesn't seem to be too bad - to pull 160GB of data over the wire in two hours would average at about 200Mbps, which is about all you can hope for on Gigabit ethernet to a single machine in my experience.
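
    (Spelling out the arithmetic: two 80GB copies is 160GB, or 160 × 8 = 1,280 gigabits; spread over 7,200 seconds that averages roughly 180Mbps - the 200Mbps ballpark.)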

    It's also possible that you're limited by hard disk speed on the machine performing the comparison. In that case, if it's a fast (in terms of CPU) machine, you could try enabling compression of temporary files. To do this, use /option:CompressTemporaryFiles on the command line.
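
    For instance (server and database names are placeholders here, and the exact switches may vary by version), a nightly job might look something like:

        sqldatacompare /server1:ServerA /database1:MyDb /server2:ServerB /database2:MyDb /sync /option:CompressTemporaryFiles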

    Hope that helps,
    Robert
    Robert Chipperfield
    Red Gate
  • Hi again,

    Another thought (courtesy of RichardJM) - if you know something more about the data that's likely to be new / changed, such as a timestamp that gets updated when rows are modified, you could maybe use a WHERE clause to limit the number of rows that are compared, which would obviously speed things up.
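
    For example, with a "last modified" timestamp column (LastModified is just an invented name here), a WHERE clause along these lines would restrict the comparison to the last day's changes:

        LastModified >= DATEADD(day, -1, GETDATE())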

    Rob
    Robert Chipperfield
    Red Gate
  • And another couple (you've got us all thinking about this now!)...

1) Try using the /ignoreidentical option. This should stop identical rows being written to disk during the comparison, which will speed that part up. Obviously it won't help if the network / SQL Server is the bottleneck, though.

    2) Are you using clustered indexes as the comparison keys? We retrieve the rows in the order defined by the comparison key, so if you use one that matches a clustered index, SQL Server will be able to return the rows much more rapidly.
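
    To illustrate that second point (table and column names invented), a clustered primary key like the one below lets SQL Server stream rows back already in comparison-key order, with no extra sorting:

        CREATE TABLE dbo.Orders (
            OrderId      int      NOT NULL,
            CustomerId   int      NOT NULL,
            LastModified datetime NOT NULL,
            CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderId)
        );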

    Rob
    Robert Chipperfield
    Red Gate
  • Thanks for the prompt replies guys !

Don't think the SQL Server spec is an issue -
it's connected to SAN storage, with 8 CPUs and 32GB RAM.

Makes sense about the data transfer.
Is there a way to do the compare without transferring the whole DB across to server B first?

Good idea - I'll experiment with the /ignoreidentical command line switch.
The timestamp idea is also good - I'm sure we have one.
The WHERE clause won't really be feasible in production, I don't think - the DB I want to use it on has over 300 tables.
Yes - we use PK clustered index columns to do the compare.


    Thanks again
    Ewan
  • Hi,

Something we are considering for the future is the ability to compare only certain columns, so you could define a row as "equal" if a subset of its columns were equal. Would that be useful in your situation?

    Other than that, I don't think there's any way of reducing the data we send over the network - if you're going to check it's equal, you need to get both copies to the same place somehow.

    Rob
    Robert Chipperfield
    Red Gate
  • Afternoon Rob

    Thanks for the reply.

    Very pleased with the products - we use the toolkit a lot.
    Excellent for most of our DB synch tasks here.

Just wondering how feasible it will be to use Data Compare when our DB gets to terabytes of data (pretty soon).
I'm trying to avoid copying huge amounts of data across the network.
I was hoping to use Data Compare to only transfer the rows that have changed from server A to server B.
So there's no way to compare across, say, linked servers?
It may be possible for us to link both servers into the same SAN - have you had any experience of that setup?


    Thanks
    Ewan
  • Hi,

Data Compare will work with TB-sized databases, but I think you are going to hit the fundamental problem that reading that amount of data, in any way, is going to take a long time.

Linked servers aren't going to help you either, I'm afraid - all they'd do is effectively redirect all traffic from server B through server A before it gets to Data Compare, so that'd make it slower if anything.

You could probably improve performance somewhat by running SDC on one of the servers directly, so you're only pulling half the data over the network (i.e. from one server, rather than both).

Another possibility: if both servers are on the same SAN, and you're also doing backups to that SAN, you could run SDC against the local live server and against a backup of the "remote" server stored on a "local" (i.e. much higher bandwidth than Ethernet) SAN disk.

    However, even then, I don't think it'll be a quick process on that much data!

    I think your best bet is definitely to go for as restrictive a WHERE clause as you can, at which point a much smaller number of rows actually need to be compared.

    Rob
    Robert Chipperfield
    Red Gate
  • Afternoon !

    Hope you're well.

Looking to use SQL Data Compare on 2 databases that are on the same server.
My question is: is the data compare job going to generate another copy of the DB on this server (i.e. 3 DBs in total) to do the compare, or will it just use the 2 existing DBs?

    thanks
    Ewan
  • Hi Ewan,

    We won't create a third database; in fact, we don't write any data to the SQL Server during the comparison at all, only when (or if) you choose to run the synchronisation script. At that point, it's just INSERT / UPDATE / DELETE statements run against the target database, so you're still just using the original two databases.
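
    Purely for illustration (table and values invented), the synchronisation script is just ordinary DML along these lines, run against the target only:

        UPDATE dbo.Orders SET CustomerId = 8 WHERE OrderId = 42;
        INSERT INTO dbo.Orders (OrderId, CustomerId, LastModified) VALUES (43, 7, GETDATE());
        DELETE FROM dbo.Orders WHERE OrderId = 44;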

    Hope that helps,
    Robert
    Robert Chipperfield
    Red Gate
paulh Posts: 35 Bronze 2
Just a thought, but how about computing an MD5 hash of the rows on each server and then comparing those?

    It's computationally more intensive but much less data will flow.

    Basically Data Compare generates a function/sproc for each table to combine the data and generate the MD5 hash and then uses that rather than the raw table as the comparison bit - the rest remains the same.
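
    A rough T-SQL sketch of the idea (table and column names invented; HASHBYTES is built in from SQL Server 2005 onwards):

        -- Run the same query on each server; only the (key, hash) pairs
        -- then need to cross the network to be compared.
        SELECT OrderId,
               HASHBYTES('MD5',
                   CAST(CustomerId AS varchar(20)) + '|' +
                   CONVERT(varchar(30), LastModified, 126)) AS RowHash
        FROM dbo.Orders
        ORDER BY OrderId;  -- identical ordering on both sides so the hashes line up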

    Oh and I want a 0.0001p royalty for each row compared by this method :wink:
  • Hi Paul,

    Thanks for the suggestion - we have considered doing something along these lines, but we're obviously somewhat wary about needing to make schema changes on people's databases :-).

    However, I can see something like this could be useful in the situation where the servers are linked by a (relatively) slow network. We'll certainly continue to consider it in the future.

    Thanks,
    Robert
    Robert Chipperfield
    Red Gate