
Data Definition Model - Core Elements

Data Definition Models are fundamental to DATAMIMIC's test data generation capabilities. This document covers the essential elements; if you're new to DATAMIMIC, start here. For advanced features, see Advanced Data Definition Elements.

Overview

Data Definition Models specify how test data should be generated, transformed, or obfuscated. The core elements allow you to:

  • Define data generation tasks
  • Specify key fields and their values
  • Create and use variables
  • Generate structured data sets

Basic Elements

<setup>

The <setup> element is the root element for all data generation tasks. It contains one or more <generate> elements that define specific data generation operations. Learn more about its use in Configuration Models.

<setup>
    <generate name="users" count="100">
        <!-- Generation details go here -->
    </generate>
</setup>

<generate>

The <generate> element is the core of Data Definition Models. It defines a data generation task and includes attributes like name, count, and target. This element is used to create structured data based on the specified configurations.

Attributes:

  • name: Specifies the name of the generation task.
  • count: Specifies the number of records to generate.
  • source: Specifies the source of the data (e.g., data/active.ent.csv, mongo).
  • target: Specifies the target output (e.g., CSV, sqliteDB).
  • type: Specifies the type of data to generate.
  • cyclic: Enables or disables cyclic generation. Default is False.
  • selector: Specifies a database query for the generation.
  • separator: Specifies a separator for the generated data. Default is |.
  • sourceScripted: Enables or disables scripted source evaluation in the source file (e.g., example.ent.csv, example.json). Default is False.
  • pageSize: Specifies the page size for data generation.
  • storageId: Specifies the ID of object storage, defined by the <object-storage> element.
  • sourceUri: Specifies the URI of the datasource on object storage (e.g., datasource/employees.csv).
  • exportUri: Specifies the URI for exporting generated data on object storage (e.g., export/product.csv).
  • container: Specifies the container name for Azure Blob Storage.
  • bucket: Specifies the bucket name for AWS S3.
  • multiprocessing: Enables or disables multiprocessing for data generation. Default is False.
  • distribution: Specifies the distribution of data source iteration (e.g., random, ordered). Default is random.
  • converter: Specifies a converter to transform values.
  • variablePrefix: Configurable attribute that defines the prefix for variable substitution in dynamic strings (default is __).
  • variableSuffix: Configurable attribute that defines the suffix for variable substitution in dynamic strings (default is __).

Children:

  • <key>: Specifies key fields within the data generation task.
  • <variable>: Defines variables used in data generation.
  • <reference>: Defines references to other generated data.
  • <nestedKey>: Specifies nested key fields and their generation methods.
  • <list>: Defines lists of data items.
  • <condition>: Conditional element to include data based on certain conditions.
  • <array>: Defines arrays of data items.
  • <echo>: Outputs text or variables for logging or debugging purposes.

Example 1: Using Object Storage for Data Generation

<setup>
    <!-- Define object-storage with ID referring to the environment -->
    <object-storage id="aws"/>
    <!-- Write file to the object-storage -->
    <generate name="external_write" bucket="datamimic-01" storageId="aws" exportUri="/datamimic_exporting_result/" target="JSON, CSV, TXT, XML" count="100">
        <key name="id" generator="IncrementGenerator"/>
        <key name="name" type="string"/>
    </generate>
    <!-- Read file from object-storage -->
    <generate name="external_read" bucket="datamimic-01" sourceUri="datamimic_exporting_result/external_write.json" source="aws" />
</setup>

Example 2: Using selector with a Database

<generate name="CUSTOMER" source="mongodb" selector="find: 'CUSTOMER', filter: {'age': {'$lt': 30}}" >
    <key name="id" generator="IncrementGenerator"/>
    <key name="name" type="string"/>
</generate>

In this example:

  • The selector is used to query the MongoDB database to find all customers under 30 years old.
  • Because no target is specified, the data is output to the ConsoleExporter.

Example 3: Generating Data with MongoDB and Aggregation

<setup >
    <memstore id="mem"/>
    <mongodb id="mongodb"/>

    <!-- Clear collections before generating new data -->
    <generate name="delete_users" source="mongodb" selector="find: 'more_users', filter: {}" target="mongodb.delete"/>
    <generate name="delete_orders" source="mongodb" selector="find: 'more_orders', filter: {}" target="mongodb.delete"/>
    <generate name="delete_products" source="mongodb" selector="find: 'more_products', filter: {}" target="mongodb.delete"/>

    <!-- Generate orders, users, and products collections -->
    <generate name="more_orders" source="script/orders.json" target="mongodb"/>
    <generate name="more_users" source="script/users.json" target="mongodb"/>
    <generate name="more_products" source="script/products.json" target="mongodb"/>

    <!-- Perform an aggregation query to summarize user orders and spending -->
    <generate name="more_summary" count="20" >
        <variable name="result" source="mongodb"
                  selector='aggregate: "more_users",
                            pipeline: [
                              {
                                "$lookup": {
                                  "from": "more_orders",
                                  "localField": "user_id",
                                  "foreignField": "user_id",
                                  "as": "userOrders"
                                }
                              },
                              {
                                "$unwind": "$userOrders"
                              },
                              {
                                "$lookup": {
                                  "from": "more_products",
                                  "localField": "userOrders.order_item",
                                  "foreignField": "product_name",
                                  "as": "orderProducts"
                                }
                              },
                              {
                                "$unwind": "$orderProducts"
                              },
                              {
                                "$group": {
                                  "_id": "$user_id",
                                  "user_name": { "$first": "$user_name" },
                                  "order_items": { "$push": "$userOrders.order_item" },
                                  "quantities": { "$first": "$userOrders.quantity" },
                                  "total_spending": {
                                    "$sum": {
                                      "$multiply": ["$userOrders.quantity", "$orderProducts.price"]
                                    }
                                  }
                                }
                              }
                            ]'/>
        <nestedKey name="users_orders" script="result"/>
    </generate>

    <!-- Clear collections after generation -->
    <generate name="delete_users" source="mongodb" selector="find: 'more_users', filter: {}" target="mongodb.delete"/>
    <generate name="delete_orders" source="mongodb" selector="find: 'more_orders', filter: {}" target="mongodb.delete"/>
    <generate name="delete_products" source="mongodb" selector="find: 'more_products', filter: {}" target="mongodb.delete"/>
</setup>

Example 4: Generating Data with Kafka

<setup >
    <kafka-exporter id="kafkaLocal" environment="environment"/>
    <kafka-importer id="kafka_importer" system="kafkaLocal" enable.auto.commit="True" auto.offset.reset="earliest" group.id="datamimic" decoding="UTF-8" environment="environment"/>

    <!-- Reset Kafka topic by consuming all messages -->
    <generate name="reset" source="kafka_importer" type="kafka" count="100" target=""/>

    <!-- Generate data to export to Kafka and Console -->
    <generate name="exported_data" count="10" target="ConsoleExporter, kafkaLocal">
        <variable name="person" entity="Person"/>
        <key name="name" script="person.name"/>
        <key name="email" script="person.email"/>
    </generate>

    <!-- Import data from Kafka -->
    <generate name="imported_data" source="kafka_importer" type="kafka" count="20"  distribution="ordered"/>
</setup>

Example 5: Using Data from a CSV File

<setup defaultSeparator="|">
    <generate name="product1" source="data/products.ent.csv" separator=","  distribution="ordered"/>
    <generate name="product2" source="data/products_2.ent.csv"  distribution="ordered"/>
</setup>

In this example:

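  • Two generate tasks read data from CSV files and output it to the ConsoleExporter.
  • product1 overrides the separator with separator=",", while product2 falls back to the setup-level defaultSeparator="|".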
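The separator precedence in this example can be sketched in Python, assuming the files are plain delimited text with a header row. The read_entities helper, the DEFAULT_SEPARATOR constant, and the sample rows are hypothetical stand-ins, not DATAMIMIC internals; the sketch only shows a task-level separator taking priority over the setup-level defaultSeparator.

```python
import csv
import io

DEFAULT_SEPARATOR = "|"  # plays the role of defaultSeparator="|" on <setup>


def read_entities(text, separator=None):
    """Parse delimited text with a header row; fall back to the setup-level default."""
    delimiter = separator or DEFAULT_SEPARATOR
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Hypothetical contents standing in for data/products.ent.csv and data/products_2.ent.csv
products_csv = "id,name\n1,Pen\n2,Ink\n"
products_2_csv = "id|name\n3|Paper\n"

product1 = read_entities(products_csv, separator=",")  # task sets separator=","
product2 = read_entities(products_2_csv)               # task inherits "|"
print(product1)
print(product2)
```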

Example 6: Using cyclic with Data from Memory Store

<setup >
    <memstore id="mem"/>
    <generate name="product" count="15" target="mem">
        <key name="id" generator="IncrementGenerator"/>
        <key name="name" values="'Alice', 'Bob', 'Cameron'"/>
    </generate>

    <!-- Request 30 non-cyclic, 30 cyclic, and 100 cyclic products from the 15 products in memory -->
    <generate name="non-cyclic-product" type="product" count="30" cyclic="False" source="mem" target="" distribution="ordered"/>
    <generate name="cyclic-product" type="product" count="30" cyclic="True" source="mem" target="" distribution="ordered"/>
    <generate name="big-cyclic-product" type="product" count="100" cyclic="True" source="mem" target="" distribution="ordered"/>
</setup>

Example 7: Using 'sourceScripted' with JSON template

<setup>
    <generate name="json_data" source="script/data.json" sourceScripted="True" target="">
        <variable name="random_age" generator="IntegerGenerator(min=18, max=65)"/>
        <variable name="street_name" generator="StreetNameGenerator"/>
        <variable name="address_number" generator="IntegerGenerator"/>
    </generate>
</setup>

Contents of script/data.json:

[
    {
        "id": 1,
        "name": "Alice",
        "age": "{random_age}",
        "address": "__address_number__, __street_name__ St"
    },
    {
        "id": 2,
        "name": "Bob",
        "age": "{random_age}",
        "address": "__address_number__, __street_name__ St"
    },
    {
        "id": 3,
        "name": "Cameron",
        "age": "{random_age}",
        "address": "__address_number__, __street_name__ St"
    }
]

Result:

[
    {
        "id": 1,
        "name": "Alice",
        "age": 23,
        "address": "801538, Walnut Street St"
    },
    {
        "id": 2,
        "name": "Bob",
        "age": 51,
        "address": "680286, View Street St"
    },
    {
        "id": 3,
        "name": "Cameron",
        "age": 29,
        "address": "711086, Forest Street St"
    }
]

In this example:

  • The sourceScripted="True" attribute is used to evaluate the JSON template with embedded variables.
  • The JSON template contains placeholders for variables like random_age, street_name, and address_number.
  • If the whole JSON field value is a variable, enclose it in curly braces {} (e.g., "age": "{random_age}"). The substituted value keeps its type, so it can be a string, an integer, or any other type.
  • If a variable is embedded within a string, enclose it in double underscores __ (e.g., "address": "__address_number__, __street_name__ St"). The result is always a string. You can customize the substitution prefix and suffix with the variablePrefix and variableSuffix attributes on <generate>, or set them for the whole setup with defaultVariablePrefix and defaultVariableSuffix on <setup>. For example:
    <!-- Set the delimiters for every task in the setup -->
    <setup defaultVariablePrefix="-%" defaultVariableSuffix="%-">
        <generate name="json_data" source="script/data.json" sourceScripted="True" target="">
            <variable name="random_age" generator="IntegerGenerator(min=18, max=65)"/>
        </generate>
    </setup>
    <!-- Or set them on a single generate task -->
    <setup>
        <generate name="json_data" source="script/data.json" sourceScripted="True" target="" variablePrefix="-%" variableSuffix="%-">
            <variable name="random_age" generator="IntegerGenerator(min=18, max=65)"/>
        </generate>
    </setup>
    
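A minimal Python sketch of the two substitution rules above, assuming the template is parsed with the standard json module and the variables are already-evaluated Python values. The render function is hypothetical and only mirrors the documented behavior; how DATAMIMIC itself handles unknown names or other edge cases is not covered.

```python
import json
import re


def render(node, variables, prefix="__", suffix="__"):
    """Recursively substitute variables in a parsed JSON template (hypothetical helper)."""
    if isinstance(node, dict):
        return {key: render(value, variables, prefix, suffix) for key, value in node.items()}
    if isinstance(node, list):
        return [render(item, variables, prefix, suffix) for item in node]
    if isinstance(node, str):
        whole = re.fullmatch(r"\{(\w+)\}", node)
        if whole:
            # "{name}" as the entire value: return the variable with its own type
            return variables[whole.group(1)]
        # prefix + name + suffix inside a longer string: the result is always a string
        embedded = re.escape(prefix) + r"(\w+?)" + re.escape(suffix)
        return re.sub(embedded, lambda m: str(variables[m.group(1)]), node)
    return node  # numbers, booleans, and null pass through unchanged


template = json.loads(
    '{"id": 1, "age": "{random_age}", '
    '"address": "__address_number__, __street_name__ St"}'
)
row = render(template, {"random_age": 23, "address_number": 801538, "street_name": "Walnut Street"})
print(row)  # {'id': 1, 'age': 23, 'address': '801538, Walnut Street St'}
```

Passing prefix="-%" and suffix="%-" to render mimics the customized delimiters in the XML above.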

The <generate> element defines a data generation task. At its most basic, it requires:

  • name: Identifies the generation task
  • count: Specifies how many records to generate
  • target: (Optional) Specifies the output format (e.g., CSV, JSON)

Basic Example

<setup>
    <generate name="simple_users" count="10" target="CSV">
        <key name="id" generator="IncrementGenerator"/>
        <key name="name" type="string"/>
        <key name="age" type="int"/>
    </generate>
</setup>

Essential Attributes

  • name: Task identifier
  • count: Number of records to generate
  • target: Output format (e.g., CSV, JSON, ConsoleExporter)
  • source: (Optional) Input data source

<key>

The <key> element defines key fields within a data generation task and specifies their generation methods. These fields are crucial for creating unique identifiers or structured elements within the generated data. The <key> element allows for dynamic, constant, or conditional data generation and provides several attributes to customize its behavior.

Attributes:

  • name: Specifies the name of the key. This is mandatory and will be used as the field name in the generated data.
  • type: Defines the data type of the key (e.g., string, int, bool). This is optional when using script or generator.
  • source: Specifies the data source for the key (e.g., a database, a file).
  • separator: Specifies the separator for a CSV source.
  • values: Provides a list of static values for the key to choose from.
  • script: Defines a script for dynamically generating the key's value.
  • generator: Specifies a generator to automatically create values (e.g., RandomNumberGenerator, IncrementGenerator).
  • constant: Defines a constant value for the key.
  • condition: Specifies a condition to determine whether the key will be generated.
  • converter: Specifies a converter to transform the value (e.g., date conversion, format changes).
  • pattern: Defines a regex pattern to validate the value of the key.
  • inDateFormat / outDateFormat: Specifies input and output date formats for converting date values.
  • defaultValue: Provides a default value if the key’s value is null or not generated.
  • nullQuota: Defines the probability that the key will be assigned a null value. Default is 0 (never null).
  • database: Specifies the database used for generating values (e.g., SequenceTableGenerator).
  • string: Attribute to generate complex strings by embedding variables within the string using customizable delimiters (see the <variable> section for details).
  • variablePrefix: Configurable attribute that defines the prefix for variable substitution in dynamic strings (default is __).
  • variableSuffix: Configurable attribute that defines the suffix for variable substitution in dynamic strings (default is __).

Example 1: Generating Constant and Scripted Keys

<setup>
    <generate name="static_and_scripted_keys" count="5" >
        <key name="static_key" constant="fixed_value"/>
        <key name="dynamic_key" script="random.randint(1, 100)"/>
    </generate>
</setup>

In this example:

  • static_key is assigned a constant value of "fixed_value" for every record.
  • dynamic_key generates a random integer between 1 and 100 for each record using a script.

Example 2: Handling nullQuota for Nullable Fields

<setup>
    <generate name="nullable_keys" count="10">
        <key name="key_always_null" type="string" nullQuota="1"/> <!-- 100% null values -->
        <key name="key_never_null" type="string" nullQuota="0"/> <!-- 0% null values -->
        <key name="key_sometimes_null" type="string" nullQuota="0.5"/> <!-- 50% null values -->
    </generate>
</setup>

In this example:

  • key_always_null will always have a null value (nullQuota="1").
  • key_never_null will never have a null value (nullQuota="0").
  • key_sometimes_null will be null in roughly 50% of the records (nullQuota="0.5").
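nullQuota behaves like a per-record probability. A minimal Python sketch, assuming each record draws independently from a uniform random number; apply_null_quota is a hypothetical helper, not part of DATAMIMIC.

```python
import random


def apply_null_quota(value, null_quota, rng=random):
    """Return None with probability null_quota, otherwise return value unchanged."""
    # rng.random() is uniform on [0.0, 1.0): quota 0 never yields None, quota 1 always does
    return None if rng.random() < null_quota else value


rng = random.Random(42)  # seeded so the sketch is repeatable
rows = [
    {
        "key_always_null": apply_null_quota("text", 1, rng),
        "key_never_null": apply_null_quota("text", 0, rng),
        "key_sometimes_null": apply_null_quota("text", 0.5, rng),
    }
    for _ in range(10)
]
print(rows)
```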

Example 3: Using defaultValue for Fallback Values

<setup>
    <generate name="default_values" count="5">
        <key name="key_with_empty_string" script="" defaultValue="default_value"/> <!-- Fallback to default_value -->
        <key name="key_with_none" script="None" defaultValue="default_value"/> <!-- Fallback to default_value -->
        <key name="key_with_condition" script="" defaultValue="default_value" condition="False"/> <!-- Condition False, no generation -->
    </generate>
</setup>

Here:

  • The first two keys fall back to their defaultValue when the script generates an empty or None value.
  • The third key doesn’t generate any value since its condition is False.
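The fallback order in this example can be sketched in Python. resolve_key is hypothetical, and the sketch assumes that a key whose condition is False is left out of the record entirely; it only restates the three cases above.

```python
def resolve_key(script_result, default_value=None, condition=True):
    """Return (include, value) for one key of one record (hypothetical helper)."""
    if not condition:
        # condition="False": the key is not generated for this record
        return False, None
    if script_result is None or script_result == "":
        # Empty string or None from the script: use defaultValue instead
        return True, default_value
    return True, script_result


record = {}
for name, result, condition in [
    ("key_with_empty_string", "", True),
    ("key_with_none", None, True),
    ("key_with_condition", "", False),
]:
    include, value = resolve_key(result, "default_value", condition)
    if include:
        record[name] = value

print(record)  # {'key_with_empty_string': 'default_value', 'key_with_none': 'default_value'}
```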

Example 4: Conditional Key Generation

<setup>
    <generate name="conditional_keys" count="10">
        <key name="conditional_key" script="random.randint(1, 100)" condition="random.randint(1, 100) > 50"/>
        <key name="constant_key" constant="fixed_value" condition="True"/>
    </generate>
</setup>

In this example:

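  • conditional_key is included only when the condition script draws a number greater than 50, so it appears in roughly half of the records. The condition's random number is drawn independently of the key's own value.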
  • constant_key is always generated since its condition="True".
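A Python sketch of how the two conditions play out over many records, assuming that a key whose condition is False is simply omitted from that record. The loop and record layout are illustrative only, not DATAMIMIC's implementation.

```python
import random

rng = random.Random(7)  # seeded so the sketch is repeatable
records = []
for _ in range(1000):
    record = {}
    # condition="random.randint(1, 100) > 50" is a fresh draw for every record
    if rng.randint(1, 100) > 50:
        # script="random.randint(1, 100)" is a second, independent draw
        record["conditional_key"] = rng.randint(1, 100)
    # condition="True" always passes
    record["constant_key"] = "fixed_value"
    records.append(record)

share = sum("conditional_key" in r for r in records) / len(records)
print(f"conditional_key present in {share:.0%} of records")
```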

Example 5: Using pattern to Validate Keys

<setup>
    <generate name="pattern_matching" count="10">
        <key name="email" script="'john.doe@example.com'" pattern="^[\w\.-]+@[\w\.-]+\.\w+$"/>
        <key name="phone_number" script="'123-456-7890'" pattern="^\d{3}-\d{3}-\d{4}$"/>
    </generate>
</setup>

In this example:

  • The email key’s value must match the regex pattern for a valid email format.
  • The phone_number key’s value must match the regex pattern for a valid phone number format (123-456-7890).
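The two patterns can be checked with Python's re module. This sketch only reports whether a value matches; what DATAMIMIC does with a non-matching value is not described in this section, so no behavior for that case is assumed. The matches helper and the sample values are hypothetical.

```python
import re

# The same regexes as the pattern attributes in the XML example
PATTERNS = {
    "email": r"^[\w\.-]+@[\w\.-]+\.\w+$",
    "phone_number": r"^\d{3}-\d{3}-\d{4}$",
}


def matches(key, value):
    """Check a generated value against the key's pattern (hypothetical helper)."""
    return re.match(PATTERNS[key], value) is not None


print(matches("email", "john.doe@example.com"))  # True
print(matches("phone_number", "123-456-7890"))   # True
print(matches("phone_number", "1234567890"))     # False
```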

Example 6: Date Conversion Using inDateFormat and outDateFormat

<setup>
    <generate name="date_format_conversion" count="10">
        <key name="date_of_birth" script="'2023-10-12'" inDateFormat="%Y-%m-%d" outDateFormat="%d-%m-%Y"/>
    </generate>
</setup>

In this example:

  • The date_of_birth key uses the input date format (inDateFormat="%Y-%m-%d") to parse the date and converts it to the specified output format (outDateFormat="%d-%m-%Y").
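inDateFormat and outDateFormat use %-style directives such as %Y and %d. Assuming they follow Python's strptime/strftime conventions, the conversion in this example amounts to a parse followed by a format; convert_date is a hypothetical helper.

```python
from datetime import datetime


def convert_date(value, in_format, out_format):
    """Parse value with in_format, then format the result with out_format."""
    return datetime.strptime(value, in_format).strftime(out_format)


print(convert_date("2023-10-12", "%Y-%m-%d", "%d-%m-%Y"))  # 12-10-2023
```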

Example 7: Key Generation from a SequenceTableGenerator

<setup>
    <database id="sourceDB" system="postgres"/>
    <generate name="sequence_key_generation" count="10" >
        <key name="user_id" database="sourceDB" generator="SequenceTableGenerator"/>
    </generate>
</setup>

Here:

  • The user_id key is generated using a SequenceTableGenerator from a PostgreSQL database.
  • This generator ensures that unique, sequential values are pulled from the database.

Best Practices for Using <key>

  1. Leverage script for Dynamic Values: Use script to generate complex and dynamic values, such as random numbers, dates, or values based on calculations.
  2. Use nullQuota for Realistic Data: Use nullQuota to simulate real-world scenarios where some keys may have null values.
  3. Fallback with defaultValue: Use defaultValue to ensure that your keys always have a fallback value when a script produces an empty string or None.
  4. Pattern Matching for Validation: Use the pattern attribute to enforce specific formatting rules, such as email addresses or phone numbers.
  5. Control Key Generation with condition: Use the condition attribute to dynamically determine whether a key should be generated, allowing for more control in complex data generation scenarios.

<variable>

The <variable> element defines variables used in data generation tasks. Variables can be sourced from databases, datasets, or dynamically generated using scripts. They introduce flexibility in creating dynamic test data by controlling how the data is retrieved or iterated.

Attributes:

  • name: Specifies the name of the variable.
  • type: Defines the data type of the variable (optional).
  • source: Specifies the data source for the variable (e.g., a database or a file).
  • selector: Defines a query to retrieve data for the variable from a database (executed once).
  • iterationSelector: Executes a query on each iteration to retrieve dynamic data for the variable.
  • separator: Specifies a separator for the variable (e.g., for CSV sources).
  • cyclic: Enables or disables cyclic iteration of the data source.
  • entity: Defines the entity for generating data (e.g., a predefined model or object).
  • script: Specifies a script for dynamically generating the variable's value.
  • weightColumn: Specifies a column to weight data selection (typically used in CSV or database sources).
  • sourceScripted: Determines if the source is scripted.
  • generator: Defines a generator for the variable (e.g., RandomNumberGenerator, IncrementGenerator).
  • dataset: Specifies the dataset for the variable (usually a file path).
  • locale: Defines the locale used when generating data.
  • inDateFormat / outDateFormat: Specifies date format conversion for input and output.
  • converter: Defines a converter for transforming the variable's value.
  • constant: Sets a fixed constant value for the variable.
  • values: Provides a list of values for the variable to choose from.
  • defaultValue: Sets a default value when no data is available.
  • pattern: Defines a regex pattern for validating the variable's content.
  • distribution: Controls how data is distributed when selecting from a source (random, ordered).
  • database: Specifies the database used for generating data.
  • string: Attribute to generate complex strings by embedding variables within the string using customizable delimiters.
  • variablePrefix: Configurable attribute that defines the prefix for variable substitution in dynamic strings (default is __).
  • variableSuffix: Configurable attribute that defines the suffix for variable substitution in dynamic strings (default is __).

Example 1: Using generator for Incrementing Values

<setup>
    <generate name="sequential_ids" count="10" >
        <variable name="id" generator="IncrementGenerator"/>
        <key name="generated_id" script="id"/>
    </generate>
</setup>

In this example:

  • The id variable uses the IncrementGenerator, which generates sequential numbers.
  • The generated ID is then assigned to the generated_id key for each record.

Example 2: Sourcing Data from a CSV File with separator

<setup>
    <generate name="person_data" count="5" >
        <variable name="person" source="data/people.csv" separator="," distribution="ordered"/>
        <key name="person_id" script="person.id"/>
        <key name="person_name" script="person.name"/>
        <key name="person_age" script="person.age"/>
    </generate>
</setup>

In this example:

  • The person variable is sourced from a CSV file, with fields separated by a comma.
  • The distribution="ordered" ensures that records are processed in the order they appear in the file.

Example 3: Defining a constant Variable

<setup>
    <generate name="constant_value_example" count="3" >
        <variable name="country" constant="Germany"/>
        <key name="user_country" script="country"/>
    </generate>
</setup>

In this case:

  • The country variable is defined as a constant with the value "Germany".
  • Every generated record gets this value in its user_country key.

Example 4: Generating Dynamic Variables with script

<setup>
    <generate name="dynamic_variables" count="5" >
        <variable name="random_number" script="random.randint(1, 100)"/>
        <variable name="full_name" script="fake.name()"/>
        <key name="random_number_value" script="random_number"/>
        <key name="full_name_value" script="full_name"/>
    </generate>
</setup>

In this example:

  • The random_number variable generates a random integer between 1 and 100 using a script.
  • The full_name variable uses the fake library to generate random names.
  • These values are then assigned to the random_number_value and full_name_value keys of each record.

Example 5: Using cyclic Variables with a CSV Source

<setup>
    <generate name="cyclic_people" count="8" >
        <variable name="person" source="data/people.csv" cyclic="True" separator=","/>
        <key name="person_id" script="person.id"/>
        <key name="person_name" script="person.name"/>
    </generate>
</setup>

In this example:

  • The cyclic="True" attribute ensures that once all records from the CSV file are used, the data starts from the beginning again.
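A minimal Python sketch of cyclic versus non-cyclic iteration, assuming ordered iteration and assuming a non-cyclic source simply stops yielding once its rows are used up. The people rows and the iterate_source helper are hypothetical stand-ins for data/people.csv.

```python
from itertools import cycle, islice

# Hypothetical rows standing in for data/people.csv
people = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Cameron"},
]


def iterate_source(rows, count, cyclic):
    """Return up to count rows in order; wrap around only when cyclic is True."""
    source = cycle(rows) if cyclic else iter(rows)
    return list(islice(source, count))


print([p["name"] for p in iterate_source(people, 8, cyclic=True)])
# ['Alice', 'Bob', 'Cameron', 'Alice', 'Bob', 'Cameron', 'Alice', 'Bob']
print(len(iterate_source(people, 8, cyclic=False)))  # 3
```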

Example 6: Using distribution to Randomize Data Selection

<setup>
    <generate name="random_people" count="10" >
        <variable name="person" source="data/people.csv" separator="," distribution="random"/>
        <key name="person_id" script="person.id"/>
        <key name="person_name" script="person.name"/>
    </generate>
</setup>

Here:

  • The distribution="random" attribute ensures that records are selected randomly from the source CSV file.

Example 7: Iterating with iterationSelector

<setup>
    <generate name="iterate_selector" count="20" >
        <key name="iteration_count" generator="IncrementGenerator"/>
        <variable name="user" source="dbPostgres"
                  iterationSelector="SELECT id, name FROM users WHERE id = __iteration_count__"/>
        <key name="user_id" script="user[0].id"/>
        <key name="user_name" script="user[0].name"/>
    </generate>
</setup>

In this example:

  • The iterationSelector query retrieves data from a PostgreSQL database for each iteration using the iteration_count value, dynamically fetching user information.
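The per-iteration query can be sketched with Python's built-in sqlite3 module standing in for PostgreSQL. The table contents are hypothetical, and the sketch assumes IncrementGenerator starts at 1 and that __iteration_count__ is replaced with the key's current value as plain text before the query runs. Textual substitution is only safe here because the value is an integer produced by the generator; it is not a pattern for untrusted input.

```python
import sqlite3

# In-memory SQLite database standing in for dbPostgres (hypothetical data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob"), (3, "Cameron")],
)

ITERATION_SELECTOR = "SELECT id, name FROM users WHERE id = __iteration_count__"

records = []
for iteration_count in range(1, 4):  # assumed IncrementGenerator values 1, 2, 3
    # Re-run the query for every record with the current iteration_count substituted
    query = ITERATION_SELECTOR.replace("__iteration_count__", str(iteration_count))
    user = conn.execute(query).fetchall()  # list of result rows, indexed like user[0]
    # sqlite3 rows are tuples, so user[0][0] plays the role of user[0].id
    records.append({"user_id": user[0][0], "user_name": user[0][1]})

print(records)
```

By contrast, a plain selector would run its query once and reuse the result for every record.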

Example 8: Defining Weighted Variables with weightColumn

<setup>
    <generate name="weighted_people" count="10" >
        <variable name="people" source="data/people_weighted.csv" weightColumn="weight" separator=","/>
        <key name="person_id" script="people.id"/>
        <key name="person_name" script="people.name"/>
    </generate>
</setup>

Here:

  • The weightColumn="weight" controls how frequently each row is selected. Rows with higher weight values are more likely to be chosen.
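Weighted selection can be sketched with Python's random.choices, assuming the weights are proportional (a row with weight 6 is six times as likely as a row with weight 1) and that rows are drawn with replacement. The rows and the weighted_pick helper are hypothetical stand-ins for data/people_weighted.csv, not DATAMIMIC's implementation.

```python
import random
from collections import Counter

# Hypothetical rows standing in for data/people_weighted.csv
people = [
    {"id": 1, "name": "Alice", "weight": "1"},
    {"id": 2, "name": "Bob", "weight": "3"},
    {"id": 3, "name": "Cameron", "weight": "6"},
]


def weighted_pick(rows, weight_column, k, rng=random):
    """Draw k rows with replacement, each with probability proportional to its weight."""
    weights = [float(row[weight_column]) for row in rows]  # CSV values arrive as text
    return rng.choices(rows, weights=weights, k=k)


picks = weighted_pick(people, "weight", 10_000, random.Random(0))
counts = Counter(person["name"] for person in picks)
print(counts.most_common())  # roughly 60% Cameron, 30% Bob, 10% Alice
```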

Example 9: Combining Variables with Nested Keys

<setup>
    <generate name="customer_info" count="10" >
        <variable name="customer" source="data/customers.csv" cyclic="True"/>
        <variable name="notification" source="data/notifications.csv" cyclic="True"/>
        <key name="customer_id" script="customer.id"/>
        <key name="customer_name" script="customer.name"/>
        <nestedKey name="notifications" type="list" count="2">
            <key name="notification_type" script="notification.type"/>
            <key name="notification_message" script="notification.message"/>
        </nestedKey>
    </generate>
</setup>

In this case:

  • The customer and notification variables are both sourced from CSV files.
  • The nestedKey element generates two notifications for each customer, showcasing how variables can be combined with nested structures.

Example 10: Working with Entities and Locale-Specific Data

<setup>
    <generate name="localized_data" count="5" >
        <variable name="person" entity="Person" locale="de_DE"/>
        <key name="person_name" script="person.full_name"/>
        <key name="person_address" script="person.address"/>
    </generate>
</setup>

In this example:

  • The person variable is generated using the Person entity, with data localized to de_DE (Germany).
  • This can be used to generate locale-specific data like names, addresses, etc.

Example 11: Using string Attribute for dynamic and complex strings

<setup defaultVariablePrefix="%%" defaultVariableSuffix="%%">
    <generate name="query_generation" count="1">
        <variable name="collection" constant="'users'" />
        <key name="query" string="find: %%collection%%, filter: {'status': 'active'}" />
    </generate>
</setup>

In this example:

  • The string attribute allows dynamic insertion of the variable collection into the query.
  • The custom %% prefix and suffix replace the default __.

Example 12: Default variablePrefix and variableSuffix

<setup>
    <generate name="query_generation" count="1">
        <variable name="collection" constant="'users'" />
        <key name="query" string="find: __collection__, filter: {'status': 'active'}" />
    </generate>
</setup>

In this case:

  • The default __ delimiters are used for variable substitution.
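Under delimiter-based substitution, the custom %% example and the default __ example produce the same query text. A minimal Python sketch, assuming the constant 'users' is kept literally with its quotes and that text outside the delimiters is left untouched; render_string is a hypothetical helper, not DATAMIMIC's API.

```python
import re


def render_string(template, variables, prefix="__", suffix="__"):
    """Replace prefix + name + suffix with str(value); other text is left as-is."""
    pattern = re.escape(prefix) + r"(\w+?)" + re.escape(suffix)
    return re.sub(pattern, lambda m: str(variables[m.group(1)]), template)


variables = {"collection": "'users'"}  # constant="'users'", quotes kept (assumption)

custom = render_string("find: %%collection%%, filter: {'status': 'active'}", variables, "%%", "%%")
default = render_string("find: __collection__, filter: {'status': 'active'}", variables)
print(custom)  # find: 'users', filter: {'status': 'active'}
```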

Key Benefits of the string Attribute

  • Simplicity: Embedding variables directly within the string eliminates the need for manual string concatenation or escaping.
  • Readability: Dynamic strings are easier to read and maintain.
  • Flexibility: The variablePrefix and variableSuffix attributes allow customization of the delimiters used, providing more flexibility when working with different syntaxes or conventions.

Best Practices for Using <variable>

  1. Dynamic Data Generation: Use scripts in variables to create dynamic data like random numbers, names, and addresses using libraries like random and fake.
  2. Cyclic vs Non-Cyclic: Use cyclic variables when you want data to repeat once all values are used, while non-cyclic variables are exhausted after one pass.
  3. Weighting and Randomization: Use weightColumn to skew data generation toward certain records and distribution="random" to randomize data selection.
  4. Combining with Nested Keys: Use variables in combination with nestedKey to generate structured, hierarchical data.

Complete Basic Example

Here's a complete example combining the core elements:

<setup>
    <generate name="user_data" count="100" target="CSV">
        <!-- Define variables -->
        <variable name="person" entity="Person"/>

        <!-- Define keys -->
        <key name="id" generator="IncrementGenerator"/>
        <key name="first_name" script="person.given_name"/>
        <key name="last_name" script="person.family_name"/>
        <key name="age" type="int" generator="IntegerGenerator(min=18, max=80)"/>
        <key name="status" constant="active"/>
    </generate>
</setup>

This will generate 100 user records with consistent, structured data including IDs, names, ages, and a status field.

Next Steps

Once you're comfortable with these core elements, explore the Advanced Data Definition Elements for more complex features like:

  • Nested data structures
  • Conditional generation
  • Complex data patterns
  • Arrays and lists
  • Advanced variable usage