
Bulk update operations?

I'm looking for information on speeding up the SQL Data Compare (SDC) update operation (bulk operation options, statistics rebuilding, etc.).

We're migrating data from a system with no CDC columns, and one of our challenges is taking a snapshot of the data from October, comparing it against the May data, and identifying the insert, update, and delete operations. We have tables in the schemas that range from thousands of records to ~5 million records.

The current process compares a nonclustered index (used instead of a primary key due to data quality issues) in both schemas. There is a trigger on update that sets a MODIFIED_DT column, and a GETDATE() default on a CREATED_DT column. I'm attempting to use SQL Data Compare to apply all inserts (reasonably fast) and updates (our slowest operation) to a copy of the older schema. So:
1. we load our May data into a MAY schema
2. then we load our October data into an OCTOBER schema
3. use the trigger and SDC to make May match October, identifying our inserts, updates, and deletes through these date columns
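The date-column bookkeeping in the steps above could be set up roughly like this (a sketch; the table and column names other than CREATED_DT/MODIFIED_DT are illustrative):

```sql
-- Hypothetical table: CREATED_DT defaults to the load time.
CREATE TABLE MAY.Customer (
    CustomerId   INT           NOT NULL PRIMARY KEY,
    Name         NVARCHAR(200) NULL,
    CREATED_DT   DATETIME2     NOT NULL DEFAULT GETDATE(),
    MODIFIED_DT  DATETIME2     NULL
);
GO

-- Update trigger stamps MODIFIED_DT on every changed row.
CREATE TRIGGER MAY.trg_Customer_Modified
ON MAY.Customer
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE c
    SET    c.MODIFIED_DT = GETDATE()
    FROM   MAY.Customer c
    JOIN   inserted i ON i.CustomerId = c.CustomerId;
END;
```

After SDC applies its changes, rows with MODIFIED_DT set are the updates and rows with only CREATED_DT are the inserts.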

The update step is taking an extremely long time, and I'm wondering if there are any steps I can perform to optimize this. I've tried rebuilding statistics before every load and making sure we're indexing the compare columns, but we're still looking at days' worth of runtime for the updates. Any advice?
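For comparison, one set-based alternative to a row-by-row deployment script is to let the server find changed keys in bulk with EXCEPT and update only those rows in batches (a sketch under assumed names; in practice you would list every compared column, not just Name):

```sql
-- Keys present in both snapshots whose compared columns differ.
SELECT d.CustomerId
INTO   #changed
FROM (
    SELECT CustomerId, Name FROM OCTOBER.Customer
    EXCEPT
    SELECT CustomerId, Name FROM MAY.Customer
) d
JOIN MAY.Customer m ON m.CustomerId = d.CustomerId;

-- Apply the updates in batches to keep the transaction log manageable.
WHILE 1 = 1
BEGIN
    UPDATE TOP (50000) m
    SET    m.Name = o.Name          -- repeat for each compared column
    FROM   MAY.Customer m
    JOIN   OCTOBER.Customer o ON o.CustomerId = m.CustomerId
    JOIN   #changed c         ON c.CustomerId = m.CustomerId
    WHERE  m.MODIFIED_DT IS NULL OR m.MODIFIED_DT < o.CREATED_DT;

    IF @@ROWCOUNT = 0 BREAK;
END;
```

Batching like this (rather than one multi-million-row transaction) is a common way to keep log growth and lock escalation under control; the WHERE clause here is only an illustrative guard against re-updating already-processed rows.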
 

Answers

  • This looks like a re-implementation of temporal tables, a SQL Server 2016 feature, although I'm having trouble understanding some of your requirements. Would you be able to use that feature and let SQL Server do the heavy lifting for you?
    Software Developer
    Redgate Software
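For reference, a system-versioned temporal table looks roughly like this (illustrative names); SQL Server then maintains the history rows itself, so point-in-time comparisons come for free:

```sql
CREATE TABLE dbo.Customer (
    CustomerId INT           NOT NULL PRIMARY KEY,
    Name       NVARCHAR(200) NULL,
    ValidFrom  DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo    DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomerHistory));

-- Query the table as it stood at a point in time:
SELECT * FROM dbo.Customer
FOR SYSTEM_TIME AS OF '2018-05-31';
```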
  • Ellis Posts: 2 New member
    I'll head down that route to see if it'll resolve the issue, but one of the features we're trying to leverage is the ability to compare all of our tables and columns from a May snapshot with an October snapshot without maintaining a large set of merge procs. I can try to clear up the process; please let me know if I'm not explaining it well enough.

    We get a set of data from a couple of different source systems, ABC and 123. Neither has reliable created or modified dates in its data. We then take a production copy (calling it the May snapshot; it has data from May 2016 to May 2018) of those two systems, interrelate the data, and then load it into a combined system for testing.

    A few months later, we get the same set of tables from ABC and 123; this version has data from October back, including the already-loaded May snapshot.

    We're trying to capture the slice of data between the May snapshot and the October snapshot and load it into our combined system as quickly as possible. We need to see every column that changed between the end of the first snapshot and the end of the second, and mark it for update or insert. This is where we're trying to leverage SQL Data Compare.
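    That classification could be sketched in plain T-SQL like this (placeholder table and column names; SQL Data Compare does the equivalent across every table in the schema):

```sql
-- Keys in October but not in May => inserts.
SELECT o.CustomerId
FROM   OCTOBER.Customer o
LEFT JOIN MAY.Customer m ON m.CustomerId = o.CustomerId
WHERE  m.CustomerId IS NULL;

-- Keys in May but not in October => deletes.
SELECT m.CustomerId
FROM   MAY.Customer m
LEFT JOIN OCTOBER.Customer o ON o.CustomerId = m.CustomerId
WHERE  o.CustomerId IS NULL;

-- Keys in both whose compared columns differ => updates.
SELECT d.CustomerId
FROM (
    SELECT CustomerId, Name FROM OCTOBER.Customer
    EXCEPT
    SELECT CustomerId, Name FROM MAY.Customer
) d
JOIN MAY.Customer m ON m.CustomerId = d.CustomerId;
```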

    Does that clear the requirements up a bit?
