Best practice for re-loading Seed Data

catalogdb · October 23, 2019 12:29PM

Hello,

We are loading seed-data as given in documentation with a DELETE operation to avoid duplication.

Our seed-data often changes and we have to reload the data again in already deployed environments.
What should be the best practice to read-load the seed-data? In one environment we might have already loaded and in some environment it might have not loaded. Either case we just want to have the latest seed-data.

So, what is the best practice to re-load seed data?

Create a duplicate migration script that will delete and insert the data. (there is cance of loading data seed multiple times (without duplicates) unnecessarily when we just want only the latest to load)
Copy/clone migration script with the same delete and bulk insert script and delete the existing script (any risk?)
Change Migration Id (GUID) in existing migration script (effectively option 2, except any risk involved?)

Thanks

Kendra_Little · October 23, 2019 4:59PM

Hi there,

Pre- and post-deployment scripts are run on each deployment. Is it possible for you to write the seed-date-reload script in a way that it detects if you want to reload the seed data (in case there are some exceptions when you would not want to do so in a deployment), and then put that into a pre- or post- script?

Regarding the "delete" operation, I'm not sure what you mean in option one about the risk of loading seed data multiple times? Are you doing something like a truncate before you do the bulk insert?

If you are doing an actual DELETE operation, is there a reason you couldn't TRUNCATE? I ask because DELETE logs row by row, which can be slow. TRUNCATE is logged but much faster as it just unhooks the allocated data in the background. Just curious if you could optimize this in some way, no matter which way you are running it.

Kendra

catalogdb · October 23, 2019 5:52PM

Thanks for the reply Kendra. First of all we can't truncate because we load/reload only a subset based on key. So we delete data for that key and reload the data entirely.
Coming to the ability to detect changes. Yes, I did not think in that direction. We can do check the MIN and MAX value for the subset and do reload. Could you please help how to implement this condition in Post migration scripts?

Kendra_Little · October 23, 2019 5:57PM

Pre- and post- scripts are very similar to migration scripts, they are just executed each time you run them. I think this would be very similar to your current script, you would just have a "guard clause" detecting the conditions on which it would run.

You can have multiple pre- or post- scripts, each with an individual name, to help you organize and manage the code long term. Depending on the complexity of what you are doing, that might or might not be useful.

catalogdb · October 24, 2019 7:22AM

Post scripts sounds good. But it looks like its not an ideal solution in our case because we have multi-tenent db. So till we move final data to prod, we may have to reload the data multiple times in lower environments and we won't touch it once released to prod. So, after releasing to prod, it's unnecessary to check if it's loaded or not (from post deployment script)

So the need is to simply run the script only when data changes. Like I mentioned in my initial post, are there any risks of manually changing the migration id or delete old script and create a new one?

Only issue I see here is that the migration logs will have migration ids which won't be found in the SCA solution.

Best practice for re-loading Seed Data

Answers

Product Learning

Community Forums

Events & Friends

Simple Talk