Weighted List Generation
Jeffmn
Posts: 2
Can someone shed some light on how the weighting works in a weighted list generation?
I wrote a custom weighted list generator that imports a set of values from a .csv file and builds the generator XML. That .csv file has several thousand items that will ultimately be populated into a database containing many millions of rows.
My question is: how is the weighting computed? On one hand, the documentation says I need to express each item on the list as a percentage, with the values totalling 100%, but elsewhere in the documentation I found that those values are expressed as ratios.
I tried expressing each of the items as a percentage, but Data Generator sets the minimum value at 1, so that's out of the question given that I've got 1000 items in my list that I want to weight.
I then changed all the values to be different larger numbers ranging from 1 up to 20,000 or so. How does this translate when my 1000 row list is used to populate a 10 million row table? I'm trying to understand what the individual weight values mean in the 10 million row databases I'm populating.
Comments
A while ago we had a similar query about the weighted list generator, where even at a simple level it behaved unexpectedly. For instance, if you tried to generate 10 rows of values x, y and z on a 20, 20, 60 basis, you'd expect to get 2 x, 2 y and 6 z, but it would often not produce exactly this.
I queried it with the developers and apparently it's working as designed, in their words: "The values are generated at random using the weightings. Not generated in the weighted ratio then randomized."
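To illustrate what "generated at random using the weightings" would mean in practice, here is a minimal sketch in plain Python. It is not Data Generator's actual code; it assumes each row is drawn independently with probability weight / sum of all weights, which is the usual reading of that description. The function names `expected_counts` and `sample_rows` are made up for this example.

```python
import random
from collections import Counter


def expected_counts(weights, total_rows):
    """Average number of rows per value if each row is an independent draw.

    Assumption: probability of a value = its weight / sum of all weights.
    """
    total = sum(weights.values())
    return {value: total_rows * w / total for value, w in weights.items()}


def sample_rows(weights, total_rows, seed=None):
    """Draw total_rows values independently, using the weights as ratios."""
    rng = random.Random(seed)
    values = list(weights)
    return rng.choices(values, weights=[weights[v] for v in values], k=total_rows)


weights = {"x": 20, "y": 20, "z": 60}

# On average you get 2 x, 2 y and 6 z in 10 rows...
print(expected_counts(weights, 10))

# ...but any single run of 10 rows can deviate from that split.
print(Counter(sample_rows(weights, 10, seed=1)))
```

Under that assumption only the ratios matter: weights of 1, 1, 3 give the same expected split as 20, 20, 60. For a 10 million row table, an item's expected row count would be 10,000,000 × its weight / the sum of all weights in the list, and with that many rows the actual counts should land close to the expected ones.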
As for how it works, it seems both ratios and percentages should be feasible, as the popup help states:
The new version of Data Generator has an option to use a Python Script as a generator, and they were kind enough to produce a sample that would lead to a more predictable result, which I've pasted below. Hopefully it's of some use although I see you're actually working with a CSV file of values, so I'm not sure how easily you'll be able to convert it across.
Redgate Software
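For anyone wanting the "generated in the weighted ratio, then randomized" behaviour, the idea can be sketched roughly as follows. This is an independent illustration in plain Python, not the developers' sample and not tied to Data Generator's Python script generator interface. It assumes integer weights, and the function name `exact_weighted_rows` is hypothetical. Row counts are allocated in proportion to the weights (leftover rows go to the largest remainders), and the result is then shuffled.

```python
import random


def exact_weighted_rows(weights, total_rows, seed=None):
    """Return total_rows values whose counts follow the weight ratios exactly
    (up to rounding), in shuffled order. Assumes integer weights."""
    total = sum(weights.values())

    # Whole-number share and remainder for each value, using integer arithmetic.
    quotas = {v: divmod(total_rows * w, total) for v, w in weights.items()}
    counts = {v: q for v, (q, _) in quotas.items()}

    # Hand out rows lost to rounding, largest remainder first.
    leftover = total_rows - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda v: quotas[v][1], reverse=True)
    for v in by_remainder[:leftover]:
        counts[v] += 1

    # Build the rows in the fixed ratio, then randomize the order.
    rows = [v for v, c in counts.items() for _ in range(c)]
    random.Random(seed).shuffle(rows)
    return rows


rows = exact_weighted_rows({"x": 20, "y": 20, "z": 60}, 10, seed=0)
print(rows.count("x"), rows.count("y"), rows.count("z"))
```

With a CSV source, the `weights` dictionary could be built by reading the value and weight columns first. Note that this builds the whole list in memory, so for tens of millions of rows you would probably want to generate it in batches.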