Shuffle rule - taking too long
KimHoutmeyers
Posts: 5 New member
Hi,
I just created a shuffle rule which covers 15 columns and runs against the database. The system identifies that it needs to shuffle 116,000 rows, but it then runs for 20 minutes, after which I manually stop it.
Do I need to have more patience, or is there a way to make this run more smoothly?
Answers
Thank you for your post - 116,000 rows across 15 columns does seem like an awful lot to shuffle, so it may be that it's just a slightly heavier rule to run.
Can I just confirm:
- Why are you using a shuffle rule for so many columns on the table all at once? Is there a reason you're not using substitution rules?
- Have you tried leaving it longer to see if it completes?
- Where are you running Data Masker, and where does it sit in relation to the DB you're trying to mask?
- Are any of the columns you're shuffling present in any non-clustered indexes on the table? (There's a query sketched just after this list for checking that.)
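A rough way to check that last point is a catalog query along the lines of the one below - dbo.BankDetails and the column names are just placeholders for your own table and shuffled columns:

```sql
-- List non-clustered indexes on the target table that reference any of the shuffled columns.
-- dbo.BankDetails and the column names are placeholders - substitute your own.
SELECT  i.name                AS index_name,
        c.name                AS column_name,
        ic.is_included_column AS is_included
FROM    sys.indexes       AS i
JOIN    sys.index_columns AS ic ON ic.object_id = i.object_id
                               AND ic.index_id  = i.index_id
JOIN    sys.columns       AS c  ON c.object_id  = ic.object_id
                               AND c.column_id  = ic.column_id
WHERE   i.object_id = OBJECT_ID(N'dbo.BankDetails')
  AND   i.type_desc = 'NONCLUSTERED'
  AND   c.name IN (N'IBAN', N'BankName', N'BankAddress')  -- the shuffled columns
ORDER BY i.name, ic.key_ordinal;
```

Any rows that come back are indexes the shuffle has to maintain while it updates.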
Thank you very much!
Kindest
- These columns logically belong together. As there are no masking sets for IBAN numbers/bank account numbers, I was trying to disassociate them from the original records while keeping them as valid bank numbers. The remaining 14 columns are associated data (bank address, ...) so that the data logically stays together - roughly the idea sketched just after this list.
- I haven't tried leaving it to run longer than that... I'll try it.
- It sits on a different server; other rules did run fine.
- I haven't verified the indexes, but that might be the case.
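To illustrate what I'm after outside Data Masker (simplified to 3 of the 15 columns, with made-up table/column names), the effect I want is roughly this - each row ends up with a complete, internally consistent set of bank details taken from some other random row:

```sql
-- Rough illustration only: give each customer the bank-detail group from a random
-- other row, keeping the columns in sync with each other.
-- dbo.BankDetails, CustomerID and the bank columns are simplified placeholder names.
WITH numbered AS (
    SELECT CustomerID, IBAN, BankName, BankAddress,
           ROW_NUMBER() OVER (ORDER BY CustomerID) AS rn
    FROM dbo.BankDetails
),
shuffled AS (
    SELECT IBAN, BankName, BankAddress,
           ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn   -- random ordering
    FROM dbo.BankDetails
)
UPDATE n
SET    n.IBAN        = s.IBAN,
       n.BankName    = s.BankName,
       n.BankAddress = s.BankAddress
FROM   numbered AS n
JOIN   shuffled AS s ON s.rn = n.rn;
```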
Would you be able to provide a non-sensitive example of this just so that I'm understanding the requirement properly? That way I might be able to help better - feel free if you'd prefer to email the example to chris.unwin@red-gate.com
On the other notes, the only three things that stand out for me are:
1) I might expect a rule like this to take 30-40 minutes based on the numbers involved; anything much longer than that would obviously be too slow.
2) Non-clustered indexes _could_ be an issue here if they're all having to update at the same time as the rule runs (there's a sketch of one way around this just after this list).
3) There _might_ be a workaround in creating a new correlated dataset to use for replacement instead, but again I would need an example to best advise on this!
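On point 2, if any of your 15 columns do show up in non-clustered indexes, one thing you _could_ try (purely a sketch - IX_BankDetails_IBAN is a made-up index name) is disabling those indexes for the duration of the masking run and rebuilding them afterwards, so the shuffle isn't maintaining them row by row:

```sql
-- Hypothetical index/table names - use the names returned by the catalog query above.
-- Only disable NON-CLUSTERED indexes; disabling the clustered index makes the table unreadable.
ALTER INDEX IX_BankDetails_IBAN ON dbo.BankDetails DISABLE;

-- ... run the shuffle rule in Data Masker ...

ALTER INDEX IX_BankDetails_IBAN ON dbo.BankDetails REBUILD;
```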
Thank you very much,
Kindest
Video on creating DataSets is here: https://www.red-gate.com/hub/university/courses/data-masker/getting-started-with-data-masker-for-sql-server/masking-rules-and-concepts/datasets-can-i-make
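As a very rough illustration of the correlated-dataset idea (placeholder names again, and only a sketch), you could pull the distinct bank-detail combinations out in a random order and export the result (e.g. to a flat file) to use as the basis of a user-defined dataset:

```sql
-- Sketch only: gather the distinct, already-valid bank-detail combinations in a random
-- order so they can be exported and reused as a correlated replacement dataset.
-- dbo.BankDetails and the columns are placeholders for your own names.
SELECT d.IBAN, d.BankName, d.BankAddress
FROM (
    SELECT DISTINCT IBAN, BankName, BankAddress
    FROM dbo.BankDetails
) AS d
ORDER BY NEWID();
```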
I will wait to hear from you on how the run performs!
Kindest