Speed up rules?

SSimon · May 11, 2020 6:43AM

Hi, newbie here. Running with just one substitution rule (street address) on one big table took about three times longer than a SQL UPDATE statement on the same column with an SET that replaced existing vales keeping the first letter and adding a hard coded string.
Is this as expected or can I speed things up?

PlantBasedSQL · May 11, 2020 8:07AM

Thanks @SSimon, absolutely not a problem! We're here and happy to help make sure you get up and running properly! You actually have 3 options here

1) The solution from my first reply above (plus increasing the number of Workers at the top of Data Masker) should help speed it up and give you nice realistic values.

2) You can implement your UPDATE logic via a Row-Internal Sync rule (Video here: https://www.red-gate.com/hub/university/courses/sql-provision/sql-provision/synchronising-masked-information/row-internal-synchronisation-rules) will allow you to implement your logic, but still keep the NULL/Empty cell checks, so you still end up with a realistic distribution

3) If you would actually just like to implement a single UPDATE statement but have it run as part of your Data Masker rules, then you can do so using a command rule (which you'll find under Advanced Rules) and is just t-sql

My preference is always for 1 above, because of the split ranges and realistic values, but of course you have the complete freedom to choose what works best for you!

Let me know if t here's anything else I can do to help!
Thank you very much!

PlantBasedSQL · May 11, 2020 7:36AM

Hi @SSimon Thanks for your question!

First thing - may I ask which version of Data Masker you're running, how large the table is that you're running this rule over (row count) and how long the masking is taking please? If you're able to provide the rows/second figure as well that would be helpful!

I would always expect a masking rule to take longer than an UPDATE statement in isolation, this is due to the blank and null checks and also the selecting of the realistic values from within the data sets. I would say that the masking could be sped up majorly 1 or both of the following ways:

1) (My preference) If you go into your Substitution rule and go to "Change Managers" you should see the option to convert your rule to a split range manager rule. This allows you to run the same rule over the table but split the rules dynamically over an INT field to take advantage of parallel running. Documentation on this can be found here: https://documentation.red-gate.com/dms6/data-masker-help/masking-rules/about-range-splitter-rules

2) (One to check) If you go to the tables tab you should be able to right click and view all indexes on the table in question - an index could be interfering with the updating of the values if the field is large and present in one or more non-clustered indices, if so then you can use the Index Management rules in Data Masker to temporarily disable and re-enable those indexes at the beginning and end of the masking run.

I hope this helps, let me know if there's anything I can do to help further!
Thanks

SSimon · May 11, 2020 8:01AM

Thank you for the answer! This is just my first attempt, so I tried to keep the question generic. Is there a way to make a rule mimic a tsql update
" UPDATE tbl SET col = LEFT(col,1) + 'FixedStr' "
and would that perform more like running the UPDATE statement from outside Data Masker, or would there still be some overhead?

SSimon · May 11, 2020 9:05AM

Thanks again, I tried with a command rule and that gave similar times as running the tsql from ssms. So, I can choose different methods depending on the table and data structure.

Speed up rules?

Best Answer

Answers

Product Learning

Community Forums

Events & Friends

Simple Talk