Optimizing Synchronization Performance
mohaaron
Posts: 46
I have a table that is now 800,000+ rows in size and is getting slow to compare. I have been thinking about possible ways of making this faster and would like to know if the following idea would work.
Would it be possible to use either a datetime or timestamp column as a base comparison column? Then the single local and remote column could be compared, and only if that one column value has changed would the rest of the columns in the table be compared. This way, if the local and remote column values are the same, the row comparison can be skipped.
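Roughly what I have in mind, as a sketch (the table, column, and database names here are just made up, and I'm assuming both databases are visible from one server):

    -- Hypothetical pre-filter: only rows whose timestamp differs need a full compare.
    SELECT l.Id
    FROM LocalDb.dbo.MyTable AS l
    INNER JOIN RemoteDb.dbo.MyTable AS r
            ON r.Id = l.Id
    WHERE l.LastModified <> r.LastModified;

Only the rows returned by something like that would then need the full column-by-column comparison.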
What do you think?
I also wanted to ask if I can use the WHERE clause filter to do something like this anyway. Can the WHERE clause filter be used to say WHERE Local.ColumnName <> Remote.ColumnName? That would also help, as the rows where the datetime value is the same could be ignored.
Comments
The WHERE clauses in Data Compare were only ever meant to apply to one table -- I don't think it's possible to use them to join to the table in the second database.
If I understand you correctly, I think Data Compare is using the method that you describe already. The comparison key column is compared in both tables before the rest of the columns, which are only analyzed if the comparison key column values match. All rows of data in which the comparison key column is outside the set of matching values are considered 'additional' or 'missing' records and in that case the rest of the columns would not be compared because it's unnecessary.
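Very roughly, you can think of that first pass as something like the query below. This is not the literal SQL the tool runs, and all the names are invented; it's just to show how rows get classified before any column-level work happens:

    -- Sketch of the key-matching pass: classify rows by comparison key first.
    SELECT COALESCE(l.Id, r.Id) AS Id,
           CASE WHEN r.Id IS NULL THEN 'missing in remote'
                WHEN l.Id IS NULL THEN 'additional in remote'
                ELSE 'keys match - compare remaining columns'
           END AS Classification
    FROM LocalDb.dbo.MyTable AS l
    FULL OUTER JOIN RemoteDb.dbo.MyTable AS r
                 ON r.Id = l.Id;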
So when the comparison key does match, the custom comparison column would then be checked. That way, even among the rows whose comparison keys match, only the ones with a differing custom value would get the full comparison.
The comparison key you are talking about is only used to find matches (what will become UPDATE statements), and not to find changed rows.
This won't help the discussion directly, but I just wanted to add that I have created a WHERE clause that references another table, and it seems to work fine to limit the rows considered the way we want. Here is my WHERE clause (contactsinfo is the table in focus, and the rep table is a separate, related table):
contactpk IN (SELECT ContactInfoFK FROM rep r WHERE r.contactinfofk = contactinfo.ContactPK)
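In effect it just keeps contacts that have at least one matching rep row. Written out in full it's equivalent to something like this (only the WHERE clause above is what I actually entered; the surrounding SELECT is my guess at what the tool ends up running):

    SELECT *
    FROM contactinfo
    WHERE contactpk IN (SELECT ContactInfoFK
                        FROM rep r
                        WHERE r.contactinfofk = contactinfo.ContactPK);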
Fred
Your discovery would be helpful in other situations though.
Regards,
Aaron
So when the additional comparison key is found, it's tested for a difference, and only then does the full row get compared to find out what changed. This must be faster than how it's done now, which I suspect is to simply check all columns/values in the table for all comparison keys even when there is no difference.
Thanks for the suggestion, but I think - and I admit I say this without trying it under a profiler - that it wouldn't speed things up in most cases. There are a couple of reasons for this:
Firstly, Data Compare is often "disk bound": the speed at which it can write to disk is the bottleneck. Speeding up the comparison operations themselves probably wouldn't help significantly where this applies.
Secondly, there's a question of latency: if I first retrieve the custom key results and then send a separate query for every row that needs checking, that involves a LOT more round trips to the server and back. It also means that the server has to evaluate the WHERE clause in the query every time, which is a lot more painful than just retrieving all the rows without a WHERE clause.
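To make that concrete, the two-phase approach would end up issuing something like this once per changed row (names invented for illustration):

    -- One round trip per changed row: this is the latency cost I mean.
    SELECT Id, Name, Notes, LastModified
    FROM MyTable
    WHERE Id = @ChangedId;

Each of those is a separate network round trip, versus a single streaming read of the whole table.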
I think there are some situations where your method would be faster. An obvious example that springs to mind would be a table with a large BLOB column, where most of the rows haven't changed according to the custom comparison key, so you avoid bringing back the BLOB for each unchanged row. Overall, though, I think it probably wouldn't help.
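For that BLOB case, the saving would come from a first pass that never touches the BLOB column at all; a sketch, again with invented names:

    -- First pass: cheap, never reads the BLOB column.
    SELECT l.Id
    FROM LocalDb.dbo.Documents AS l
    INNER JOIN RemoteDb.dbo.Documents AS r
            ON r.Id = l.Id
    WHERE l.LastModified <> r.LastModified;

    -- Second pass, run only for the Ids found above: now fetch the BLOB.
    SELECT Id, Content, LastModified
    FROM LocalDb.dbo.Documents
    WHERE Id = @ChangedId;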
I hope that helps,
Robert
Red Gate