Bulk update operations?
Ellis
Posts: 2 New member
I'm looking for information on speeding up the SQL Data Compare (SDC) update operation (bulk-operation options, statistics rebuilding, etc.).
We're migrating data from a system with no CDC columns, and one of our challenges is taking a snapshot of the data from October, comparing it against the data from May, and identifying the CRUD operations. We have tables in the schemas that range from thousands of records to ~5 million records.
The current process compares a nonclustered index (due to data quality issues) in both schemas. There is an update trigger that stamps a MODIFIED_DT column, and a GETDATE() default on a CREATED_DT column. I'm attempting to use SQL Data Compare to apply all inserts (reasonably fast) and updates (our slowest operation) to a copy of the older schema. So:
1. we load our May data into a MAY schema
2. then we load our October data into an OCTOBER schema
3. use the trigger and SDC to make May match October, identifying our inserts, updates, and deletes through these date columns
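For reference, the date-stamping setup in step 3 looks roughly like this; the table name, key column, and trigger name below are placeholders, not our actual schema:

```sql
-- Sketch of the change-tracking columns on an assumed dbo.Customer table
ALTER TABLE dbo.Customer
    ADD CREATED_DT  DATETIME NOT NULL
            CONSTRAINT DF_Customer_Created DEFAULT (GETDATE()),
        MODIFIED_DT DATETIME NULL;
GO

-- Statement-level trigger that stamps MODIFIED_DT on every update
CREATE TRIGGER trg_Customer_Modified ON dbo.Customer
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE c
    SET    MODIFIED_DT = GETDATE()
    FROM   dbo.Customer AS c
    JOIN   inserted AS i
           ON i.CustomerID = c.CustomerID;  -- assumed key column
END;
GO
```

Because the trigger fires once per UPDATE statement, a tool that applies updates row by row pays the trigger overhead on every single row.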
The update step is taking an extremely long time, and I'm wondering if there are any steps I can perform to optimize this. I've tried rebuilding statistics before every load and making sure we're indexing the compare columns, but we're still looking at days' worth of runtime for our updates. Any advice?
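Concretely, the per-run prep I've been doing looks something like this; the schema, table, and column names are illustrative, not our real ones:

```sql
-- Support the comparison key with a covering index on both snapshot schemas
CREATE INDEX IX_May_CompareKey
    ON MAY.Customer (BusinessKey1, BusinessKey2)
    INCLUDE (MODIFIED_DT);

CREATE INDEX IX_October_CompareKey
    ON OCTOBER.Customer (BusinessKey1, BusinessKey2);

-- Rebuild statistics before each compare/deploy run
UPDATE STATISTICS MAY.Customer      WITH FULLSCAN;
UPDATE STATISTICS OCTOBER.Customer  WITH FULLSCAN;
```

Even with that in place, the update phase dominates the runtime.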
Answers
Redgate Software
We get a set of data from a couple of different source systems, ABC and 123. Neither has reliable created or modified dates in its data. We then take a production copy (calling it the May snapshot; it has data from May 2016 to May 2018) of those two systems, interrelate the data, and then load it into a combined system for testing.
A few months later, we get the same set of tables from ABC and 123; this version has data from October back, including the already-loaded May snapshot.
We're trying to capture the slice of data between the May snapshot and the October snapshot and load it into our combined system as quickly as possible. We need to see every column that changed between the end of the first snapshot and the end of the second, and mark it for update or insert. This is where we're trying to leverage SQL Data Compare.
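In SQL terms, the slice we're after is roughly what a single set-based MERGE over the shared business key would produce; all table and column names here are illustrative:

```sql
-- Illustrative MERGE applying the October slice onto the May copy in one pass.
-- The update trigger stamps MODIFIED_DT; the CREATED_DT default fires on insert.
MERGE MAY.Customer AS tgt
USING OCTOBER.Customer AS src
    ON src.BusinessKey = tgt.BusinessKey
WHEN MATCHED AND (src.Col1 <> tgt.Col1 OR src.Col2 <> tgt.Col2) THEN
    UPDATE SET tgt.Col1 = src.Col1,
               tgt.Col2 = src.Col2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (BusinessKey, Col1, Col2)
    VALUES (src.BusinessKey, src.Col1, src.Col2)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
OUTPUT $action, inserted.BusinessKey, deleted.BusinessKey;  -- CRUD audit trail
```

Note the MATCHED predicate above is not NULL-safe as written; nullable compare columns would need an ISNULL/EXISTS-style comparison to avoid missed changes.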
Does that clear the requirements up a bit?