Shuffle rule - taking too long
KimHoutmeyers
Posts: 5 New member
Hi,
I just created a shuffle rule which covers 15 columns and runs against the database. The system identifies that it needs to shuffle 116,000 rows, but it then runs for 20 minutes, after which I manually stop it.
Do I need to have more patience, or is there a way to make this run more smoothly?
Answers
Thank you for your post - 116,000 rows across 15 columns does seem like an awful lot to shuffle, so it may be that it's just a slightly heavier rule to run.
Can I just confirm:
- Why are you using a shuffle rule for so many columns on the table all at once? Is there a reason you're not using substitution rules?
- Have you tried leaving it longer to see if it completes?
- Where are you running Data Masker, and where does it sit in relation to the DB you're trying to mask?
- Are any of the columns you're shuffling present in any non-clustered indexes on the table? (There's a query sketched just after this list for checking that.)
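A rough way to check that last point is a catalog query along the lines of the one below - dbo.BankDetails and the column names are just placeholders for your own table and shuffled columns:

```sql
-- List non-clustered indexes on the target table that reference any of the shuffled columns.
-- dbo.BankDetails and the column names are placeholders - substitute your own.
SELECT  i.name                AS index_name,
        c.name                AS column_name,
        ic.is_included_column AS is_included
FROM    sys.indexes       AS i
JOIN    sys.index_columns AS ic ON ic.object_id = i.object_id
                               AND ic.index_id  = i.index_id
JOIN    sys.columns       AS c  ON c.object_id  = ic.object_id
                               AND c.column_id  = ic.column_id
WHERE   i.object_id = OBJECT_ID(N'dbo.BankDetails')
  AND   i.type_desc = 'NONCLUSTERED'
  AND   c.name IN (N'IBAN', N'BankName', N'BankAddress')  -- the shuffled columns
ORDER BY i.name, ic.key_ordinal;
```

Any rows that come back are indexes the shuffle has to maintain while it updates.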
Thank you very much!
Kindest
- These columns logically belong together. As there are no masking sets for IBAN numbers/bank account numbers, I was trying to disassociate them from the original records while keeping them as valid bank numbers. The remaining 14 columns are associated data (bank address, ...) so that the data logically stays together - roughly the idea sketched just after this list.
- I haven't tried leaving it to run longer than that... I'll try it.
- It sits on a different server; other rules did run fine.
- I haven't verified the indexes, but that might be the case.
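To illustrate what I'm after outside Data Masker (simplified to 3 of the 15 columns, with made-up table/column names), the effect I want is roughly this - each row ends up with a complete, internally consistent set of bank details taken from some other random row:

```sql
-- Rough illustration only: give each customer the bank-detail group from a random
-- other row, keeping the columns in sync with each other.
-- dbo.BankDetails, CustomerID and the bank columns are simplified placeholder names.
WITH numbered AS (
    SELECT CustomerID, IBAN, BankName, BankAddress,
           ROW_NUMBER() OVER (ORDER BY CustomerID) AS rn
    FROM dbo.BankDetails
),
shuffled AS (
    SELECT IBAN, BankName, BankAddress,
           ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn   -- random ordering
    FROM dbo.BankDetails
)
UPDATE n
SET    n.IBAN        = s.IBAN,
       n.BankName    = s.BankName,
       n.BankAddress = s.BankAddress
FROM   numbered AS n
JOIN   shuffled AS s ON s.rn = n.rn;
```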
Would you be able to provide a non-sensitive example of this just so that I'm understanding the requirement properly? That way I might be able to help better - feel free if you'd prefer to email the example to chris.unwin@red-gate.com
On the other notes, the only three things that stand out for me are:
1) I might expect a rule like this to take 30-40 minutes based on the numbers involved; anything much longer than that would obviously be too slow.
2) Non-clustered indexes _could_ be an issue here if they're all having to update at the same time as the rule runs (there's a sketch of one way around this just after this list).
3) There _might_ be a workaround in creating a new correlated dataset to use for replacement instead, but again I would need an example to best advise on this!
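On point 2, if any of your 15 columns do show up in non-clustered indexes, one thing you _could_ try (purely a sketch - IX_BankDetails_IBAN is a made-up index name) is disabling those indexes for the duration of the masking run and rebuilding them afterwards, so the shuffle isn't maintaining them row by row:

```sql
-- Hypothetical index/table names - use the names returned by the catalog query above.
-- Only disable NON-CLUSTERED indexes; disabling the clustered index makes the table unreadable.
ALTER INDEX IX_BankDetails_IBAN ON dbo.BankDetails DISABLE;

-- ... run the shuffle rule in Data Masker ...

ALTER INDEX IX_BankDetails_IBAN ON dbo.BankDetails REBUILD;
```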
Thank you very much,
Kindest
Video on creating DataSets is here: https://www.red-gate.com/hub/university/courses/data-masker/getting-started-with-data-masker-for-sql-server/masking-rules-and-concepts/datasets-can-i-make
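As a very rough illustration of the correlated-dataset idea (placeholder names again, and only a sketch), you could pull the distinct bank-detail combinations out in a random order and export the result (e.g. to a flat file) to use as the basis of a user-defined dataset:

```sql
-- Sketch only: gather the distinct, already-valid bank-detail combinations in a random
-- order so they can be exported and reused as a correlated replacement dataset.
-- dbo.BankDetails and the columns are placeholders for your own names.
SELECT d.IBAN, d.BankName, d.BankAddress
FROM (
    SELECT DISTINCT IBAN, BankName, BankAddress
    FROM dbo.BankDetails
) AS d
ORDER BY NEWID();
```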
I will wait to hear from you on how the run performs!
Kindest