Options

Generating valid data, beyond simple constraints

bgrittonbgritton Posts: 2 New member
edited December 22, 2016 10:22AM in SQL Data Generator
Hello.

I am struggling to get SQL Data Generator to obey data constraints that are enforced programmatically. Data Generator works fine generating records in the initial table. But when I get down further into the data model, it generates records which would break our data model.

Here is an example of what the generated data looks like.

Client Table:
ClientID_ _ClientName
1 _ _ _ _ _ Joe
2 _ _ _ _ _ John
3 _ _ _ _ _ Jane

Enrollment Table:
EnrollmentID_ _ ClientID
4_ _ _ _ _ _ _ _ 1
5_ _ _ _ _ _ _ _ 2
6_ _ _ _ _ _ _ _ 3

Services Table:
ServiceID_ EnrollmentID_ _ ClientID
7_ _ _ _ _ _4_ _ _ _ _ _ _ _ 3
8_ _ _ _ _ _5_ _ _ _ _ _ _ _ 1
9_ _ _ _ _ _6_ _ _ _ _ _ _ _ 2

The initial records in the Client and Enrollment tables are fine. The problem comes when populating the Services table. A Services record with an EnrollmentID of 4 and a ClientID of 3 is invalid, because EnrollmentID 4 is owned by ClientID 1. The Services record for EnrollmentID 6 is also bad, because ClientID 2 is not tied to EnrollmentID 6 in the Enrollment table.

I need to make sure when data is populated at lower levels of the data model, it obeys more than the strict foreign key constraints, which simply validate that the foreign record exists. How can I give Data Generator constraints like this to make sure it generates data that follows our data model?

Comments

  • Options
    Hi,

    This is a difficult scenario. The way I've done this is to use a Python script on the ClientID column for the services table to select the value straight from the Enrollment table. This way the relationship between the ClientID and EnrollmentID values can be preserved.
    def main(config):
        import System.Data.SqlClient
        connection = System.Data.SqlClient.SqlConnection(config["connection_string"])
        command = System.Data.SqlClient.SqlCommand("SELECT ClientID FROM Enrollment WHERE EnrollmentID = " + EnrollmentID, connection)
        connection.Open()
        value = command.ExecuteScalar()
        connection.Close()
        return value
    

    The way that your tables are set up seems to imply that there is some duplication by having the same values for EnrollmentID and ClientID in both the Enrollment and Services tables. Have you considered removing the ClientID column from the Services table and joining the Enrollment table on the EnrollmentID column whenever you query the Services table? Or alternatively, creating a view?
    Software Engineer
    Redgate Software
Sign In or Register to comment.