Optimizing Synchronization Performance

I have a table that is now 800,000+ rows in size and is getting slow to compare. I have been thinking about possible ways of making this faster and would like to know if the following idea would work.

Would it be possible to use either a datetime or timestamp column as a base comparison column? The single local and remote column could be compared first, and the rest of the columns in the table compared only if that one column value has changed. This way, if the local and remote column values are the same, the row comparison can be skipped.

What do you think?

I also wanted to ask whether I can use the WHERE clause filter to do something like this anyway. Can the WHERE clause filter be used to say WHERE Local.ColumnName <> Remote.ColumnName? This would also help, as the rows where the datetime value is the same could be ignored.
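
Roughly what I have in mind is something like this (LocalDb/RemoteDb, MyTable, and the column names below are just placeholders for illustration, and it assumes both databases can be queried from the same connection):

-- Hypothetical filter: find only the keys whose change-tracking column differs.
SELECT L.PrimaryKeyCol
FROM LocalDb.dbo.MyTable AS L
INNER JOIN RemoteDb.dbo.MyTable AS R ON R.PrimaryKeyCol = L.PrimaryKeyCol
WHERE L.LastModified <> R.LastModified

Only the rows returned by something like this would then need the full column-by-column comparison.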

Comments

  • Brian Donahue
    Hello,

    The WHERE clauses in Data Compare were only ever meant to apply to one table -- I don't think it's possible to use them to join to the table in the second database.

    If I understand you correctly, I think Data Compare is using the method that you describe already. The comparison key column is compared in both tables before the rest of the columns, which are only analyzed if the comparison key column values match. All rows of data in which the comparison key column is outside the set of matching values are considered 'additional' or 'missing' records and in that case the rest of the columns would not be compared because it's unnecessary.
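
    As a rough illustration of that classification (the database, table, and key names below are only placeholders, not what the tool runs internally):

    -- Rows are first classified by the comparison key; 'missing' and 'additional'
    -- rows never need a column-by-column comparison.
    SELECT
        COALESCE(L.MyKey, R.MyKey) AS MyKey,
        CASE
            WHEN R.MyKey IS NULL THEN 'missing from remote'
            WHEN L.MyKey IS NULL THEN 'additional in remote'
            ELSE 'in both: remaining columns compared'
        END AS RowStatus
    FROM LocalDb.dbo.MyTable AS L
    FULL OUTER JOIN RemoteDb.dbo.MyTable AS R ON R.MyKey = L.MyKey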
  • I'm actually talking about having an additional custom comparison key.

    So when the comparison key does match, the custom comparison key is then used. This way, even when the comparison key matches, only the rows with a differing custom key value will be checked.

    The comparison key you are talking about is only used to find matches, or what will become update statements, and not to find changed rows.
  • Hi,

    This won't help the discussion directly, but I just wanted to add that I have created a WHERE clause that references another table, and it seems to work fine to limit the rows considered the way we want. Here is my WHERE clause (contactinfo is the table in focus, and rep is a separate, related table):

    ContactPK IN (SELECT ContactInfoFK FROM rep r WHERE r.ContactInfoFK = contactinfo.ContactPK)

    Fred
  • Thanks for the additional information. This is good to know, although what I would like to be able to do is compare the local table.column to the remote table.column.

    Your discovery would be helpful in other situations though.

    Regards,

    Aaron
  • OK, I'm back again and wondering if anyone from Red Gate has seen my post here about having an additional comparison key to further limit the rows that are checked for changes. I think a timestamp column would be perfect for this, though a datetime column could also be used. I would say that if there is a timestamp column on the table then it should be used to perform the additional comparison.

    So when the additional comparison key is found, it's tested for a difference, and only then does the full row get compared to find out what changed. This must be faster than how it's done now, which I suspect is to simply check all columns/values in the table for all comparison keys even when there is no difference.
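
    To sketch out what I mean (the database, table, and column names are made up, and I'm assuming a timestamp/rowversion or datetime column exists in both copies of the table):

    -- Step 1: compare only the cheap change-tracking column and remember the candidate keys.
    SELECT L.MyKey
    INTO #ChangedKeys
    FROM LocalDb.dbo.MyTable AS L
    INNER JOIN RemoteDb.dbo.MyTable AS R ON R.MyKey = L.MyKey
    WHERE L.RowVersion <> R.RowVersion;

    -- Step 2: pull the full rows only for those candidate keys to compare column by column.
    SELECT t.* FROM LocalDb.dbo.MyTable AS t JOIN #ChangedKeys k ON k.MyKey = t.MyKey;
    SELECT t.* FROM RemoteDb.dbo.MyTable AS t JOIN #ChangedKeys k ON k.MyKey = t.MyKey;

    Rows whose change-tracking value matches would never need the full comparison.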
  • I'm still very curious as to why no one has explained to me why my idea for speeding up the comparison process won't work. My original question continues to be ignored. Anyone? I would really like to know why this won't work.
  • Hi there,

    Thanks for the suggestion, but I think - and I admit I say this without trying it under a profiler - that it wouldn't speed things up in most cases. There are a couple of reasons for this:

    Firstly, Data Compare is often "disk bound", that is, the speed at which it can write to disk is often the bottleneck. Speeding up the comparison operations themselves probably wouldn't help significantly where this applies.

    Secondly, there's a question of latency: if I first
    SELECT onecolumn FROM table1
    
    retrieve the results, then individually send
    SELECT * FROM table1 WHERE mycomparisonkey=something
    
    for every row I then need to check individually, that involves a LOT more round trips to the server and back. It also means that the server has to evaluate the WHERE clause in the query every time, which is a lot more painful than just retrieving all the rows without a WHERE clause.

    I think there are some situations where your method would be faster - an obvious example that springs to mind would be a table with a large BLOB column, where most of the rows haven't changed according to the custom comparison key, so you'd remove the need to bring back the BLOB for each row - but overall, I don't think it would be.

    I hope that helps,
    Robert
    Robert Chipperfield
    Red Gate