Data Definition Model - Advanced Elements

This document covers advanced features and elements in DATAMIMIC's data definition models. Make sure you're familiar with the Core Data Definition Elements before diving into these advanced features.

Complex Data Structures

<nestedKey>

The <nestedKey> element defines nested key fields and their generation methods within a data generation task. It allows you to structure complex data in a hierarchical format, such as dictionaries (dict) or lists (list), and control its content dynamically.

See the detailed reference.

<reference>

The <reference> element allows referencing other generated data.

Attributes

  • name: Specifies the name of the reference.
  • source: Specifies the source of the reference.
  • sourceType: Specifies the type of the source.
  • sourceKey: Specifies the key of the source.
  • unique: Ensures the reference is unique.

<list>

The <list> element defines a collection of data items, where each item can contain its own attributes, keys, and arrays. Lists are useful for representing structured data, such as rows in a table, or collections of objects with shared attributes. The <list> can contain multiple <item> elements that represent individual entries.

See the detailed list element reference.

<item>

See the detailed item element reference.

<array>

The <array> element defines arrays of data items that can be either statically defined or generated dynamically using scripts. Arrays are essential when you need to generate multiple items of the same data type, such as lists of values, and can be combined with other elements like <list>, <key>, and <nestedKey> to create complex data structures.

See the detailed array element reference.

Control Elements

<condition>

The <condition> element is used to execute a set of child tags (<if>, <else-if>, and <else>) based on specific logical conditions. It provides a way to control the data generation process by applying conditions that determine which elements will be included in the output.

Structure

  • The <if> tag is always the first child of a <condition> element and defines the primary condition.
  • Zero or more <else-if> tags can follow the <if>, each specifying additional conditions to evaluate if the previous conditions are not met.
  • The <else> tag is optional and provides a fallback action if none of the preceding conditions are met.

Rules

  • A <condition> element must have one <if> tag.
  • There can be zero or more <else-if> tags.
  • Only one <else> tag is allowed, and it must appear as the last child of the <condition> element.

Children

  • if: Defines the primary condition to evaluate.
  • else-if: Defines additional conditions to check if the previous conditions are false.
  • else: Fallback action if none of the conditions are met.

Attributes

  • condition: A Python-like expression that evaluates to True or False. The result determines whether the content inside the corresponding tag will be executed.
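
The branch order described above can be sketched in Python. This is a minimal illustration of first-match-wins selection, not DATAMIMIC's implementation: the function name select_branch, the use of eval, and the behavior when nothing matches and no <else> exists are all assumptions made for this sketch.

```python
def select_branch(branches, context):
    """Return the label of the first branch whose condition is true.

    `branches` is an ordered list of (label, condition) pairs mirroring
    <if>, <else-if>..., <else>; a condition of None stands for <else>.
    """
    for label, condition in branches:
        # <else> has no condition and matches whenever it is reached.
        if condition is None or eval(condition, {}, dict(context)):
            return label
    # Assumption: without an <else>, no branch content is produced.
    return None


branches = [
    ("if", "category_id == 'DRNK/ALCO'"),
    ("else-if", "category_id == 'FOOD/CONF'"),
    ("else", None),
]

print(select_branch(branches, {"category_id": "FOOD/CONF"}))  # else-if
print(select_branch(branches, {"category_id": "TOYS"}))       # else
```

Because evaluation stops at the first true condition, an <else-if> is only considered when every earlier condition was false.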

Example 1: Simple Condition with If-Else Logic

<condition>
    <if condition="category_id == 'DRNK/ALCO'">
        <echo>Category: DRNK/ALCO</echo>
    </if>
    <else-if condition="category_id == 'FOOD/CONF'">
        <echo>Category: FOOD/CONF</echo>
    </else-if>
    <else>
        <echo>Category not found</echo>
    </else>
</condition>

In this example, based on the value of category_id, the appropriate message is printed.

  • If category_id is 'DRNK/ALCO', the first <if> block is executed.
  • If category_id is 'FOOD/CONF', the <else-if> block is executed.
  • If neither condition is met, the <else> block is executed.

Example 2: Complex Condition with Nested Keys and Default Values

<setup>
    <generate name="group_name_not_override" count="10">
        <variable name="ifVar" generator="BooleanGenerator"/>
        <variable name="elseIfVar" generator="BooleanGenerator"/>

        <condition>
            <if condition="ifVar">
                <key name="if_true" constant="true"/> <!-- Generated if ifVar = True -->
            </if>
            <else-if condition="elseIfVar">
                <key name="else_if_true" constant="true"/> <!-- Generated if ifVar = False and elseIfVar = True -->
            </else-if>
            <else>
                <key name="else_true" constant="true"/> <!-- Generated if ifVar = False and elseIfVar = False -->
            </else>
        </condition>
    </generate>
</setup>

In this example, two variables (ifVar and elseIfVar) are generated using a Boolean generator, and depending on their values, one of the <key> elements (if_true, else_if_true, or else_true) will be generated:

  • If ifVar is True, if_true is generated.
  • If ifVar is False and elseIfVar is True, else_if_true is generated.
  • If both conditions are False, else_true is generated.

Example 3: Conditions with Nested Structures

<setup>
    <generate name="bike" count="10">
        <key name="id" type="int" generator="IncrementGenerator"/>
        <key name="year" type="int" values="1970, 2023"/>

        <!-- Conditional nested key generation -->
        <nestedKey name="condition_true" type="dict" condition="True">
            <key name="serial" type="int" condition="id % 2 == 1"/>
            <key name="count" type="int" generator="IncrementGenerator"/>
        </nestedKey>

        <condition>
            <if condition="True">
                <nestedKey name="if_true" type="dict">
                    <key name="id" type="int"/>
                </nestedKey>
            </if>
            <else-if condition="False">
                <nestedKey name="if_false" type="dict">
                    <key name="id" type="int"/>
                </nestedKey>
            </else-if>
        </condition>
    </generate>
</setup>

In this example:

  • The condition_true nested key block will always be executed because its condition is True.
  • The conditional block within <condition> executes the <if> block because its condition is True, while the <else-if> is ignored as it evaluates to False.

Example 4: Conditions with Default Values

<setup>
    <generate name="condition" count="10">
        <key name="id" generator="IncrementGenerator"/>

        <condition>
            <if condition="id == 1">
                <echo>Condition met: id is 1</echo>
                <key name="if_true" constant="1"/>
            </if>
            <else-if condition="id == 3">
                <echo>Condition met: id is 3</echo>
                <key name="else_if_3_true" constant="3"/>
            </else-if>
            <else-if condition="id == 4">
                <echo>Condition met: id is 4</echo>
                <key name="else_if_4_true" constant="4"/>
            </else-if>
            <else>
                <echo>Condition not met, proceeding with default values</echo>
                <key name="else_true" constant="else_value"/>
            </else>
        </condition>
    </generate>
</setup>

Here, the conditional logic checks the value of id:

  • If id is 1, the if_true key is generated.
  • If id is 3 or 4, the corresponding <else-if> block is executed.
  • If no conditions match, the else_true key is generated with a fallback value.

Example 5: Conditional Removal of Elements

<setup>
    <generate name="group_name_not_override" count="10">
        <key name="removeElement" script="" defaultValue="{}" condition="False"/>
    </generate>
</setup>

In this case, the removeElement key will not be generated because the condition is set to False.

Example 6: Using Default Values in Conditional NestedKeys

<setup>
    <generate name="bike" count="10">
        <nestedKey name="condition_false" type="dict" condition="False" defaultValue="None">
            <key name="serial" type="int" condition="id % 2 == 1"/>
        </nestedKey>
    </generate>
</setup>

In this example:

  • Since the condition is False, the condition_false nested key will take the defaultValue of None.

Best Practices for Using <condition>

  1. Use Conditions to Control Output: Conditions are an excellent way to control the generation of data elements dynamically based on the current state of variables.

  2. Fallbacks with Default Values: Use default values when you want to provide a fallback if a condition evaluates to False.

  3. Combine with Nested Structures: You can use conditions with nested keys, lists, and arrays to build complex logic-driven data models.

  4. Use else-if for Multiple Conditions: To handle multiple possible states, use a combination of <if>, <else-if>, and <else> to cover all scenarios.

<echo>

The <echo> element outputs text, variables, or expressions for logging, debugging, or monitoring purposes. This can be helpful in tracking the progress of data generation or inspecting the values of variables during runtime. It accepts dynamic content, including variables and expressions enclosed in {}.

Attributes

  • text: The static or dynamic content to be printed. Dynamic values are wrapped in {}, allowing you to output variables, expressions, or results of functions during execution.

Usage

  • Use <echo> to print out the values of variables or track the flow of the setup process.
  • It can output text to the console or log files, depending on the target defined in the setup.
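
As a rough illustration of how {} placeholders behave, the sketch below substitutes each braced expression with its evaluated value. It is an assumption-laden model, not DATAMIMIC's code: render_echo is a hypothetical helper, and evaluating the placeholder text with Python's eval is assumed for the sake of the example.

```python
import re


def render_echo(template, context):
    """Replace each {expression} in `template` with its evaluated value."""
    def substitute(match):
        # Evaluate the text between the braces against the known variables.
        return str(eval(match.group(1), {}, dict(context)))

    return re.sub(r"\{([^{}]+)\}", substitute, template)


context = {"random_number": 72}
print(render_echo("Generated random number: {random_number}", context))
# Generated random number: 72
print(render_echo("Doubled: {random_number * 2}", context))
# Doubled: 144
```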

Example 1: Basic Usage for Debugging

<setup>
    <variable name="user" source="dbPostgres" cyclic="False"
              selector="SELECT id, text FROM public.db_postgres_test_query_setup_context_variable"
              distribution="ordered"/>

    <variable name="all_users" source="dbPostgres" cyclic="False"
              iterationSelector="SELECT id, text FROM public.db_postgres_test_query_setup_context_variable"/>

    <echo>user is a DotableDict: {user}</echo>
    <echo>all_users is a list of dict: {all_users}</echo>
</setup>

In this example:

  • The <echo> tag prints the current value of the user variable, which is a DotableDict.
  • It also prints the all_users variable, which is a list of dictionaries retrieved from the database.

Example 2: Using Echo for Debugging Scripted Variables

<setup>
    <variable name="random_number" generator="RandomNumberGenerator(min=1, max=100)"/>
    <echo>Generated random number: {random_number}</echo>

    <key name="status" script="'High' if random_number > 50 else 'Low'"/>
    <echo>The status based on random_number is: {status}</echo>
</setup>

In this example:

  • A random number is generated and echoed for debugging.
  • The status is then calculated based on the random number, and the result is printed using <echo>.

Best Practices for Using <echo>

  1. Debug Complex Logic: Use <echo> to debug variables and complex expressions, especially when using scripts or database sources to generate data dynamically.
  2. Monitor Data Generation: Track the progress of your data generation by echoing values at key points in your setup.
  3. Combine with Variables: You can use variables and expressions within the text of <echo> to output dynamic content during the generation process.

<generator>

The <generator> element specifies custom generators for data generation.

Attributes

  • name: Specifies the name of the generator.
  • generator: Specifies your custom generator.

Example

<setup>
    <generator name="my_custom_date_gen" generator="DateTimeGenerator(min='2010-08-01', max='2020-08-31', input_format='%Y-%m-%d')"/>
    <generate name="product">
        <key name="product_name" type="string"/>
        <key name="import_date" generator="my_custom_date_gen"/>
        <key name="export_date" generator="my_custom_date_gen"/>
    </generate>
</setup>
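
The idea of defining a generator once under a name and reusing it for several keys can be sketched in Python. This is only an analogy: make_date_generator and the generators dictionary are hypothetical names, and the uniform random choice between the bounds is an assumption about how a date generator with min and max might behave.

```python
import random
from datetime import datetime, timedelta


def make_date_generator(min_date, max_date, input_format="%Y-%m-%d", seed=None):
    """Build a callable that returns random datetimes between two bounds."""
    rng = random.Random(seed)
    start = datetime.strptime(min_date, input_format)
    end = datetime.strptime(max_date, input_format)
    span_seconds = int((end - start).total_seconds())

    def generate():
        # Pick a uniformly random offset inside the allowed range.
        return start + timedelta(seconds=rng.randint(0, span_seconds))

    return generate


# Define the generator once under a name, then reuse it for several keys.
generators = {"my_custom_date_gen": make_date_generator("2010-08-01", "2020-08-31")}

product = {
    "import_date": generators["my_custom_date_gen"](),
    "export_date": generators["my_custom_date_gen"](),
}
print(product)
```

Both keys draw from the same configured generator, so the date range only has to be stated in one place.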

<element>

The <element> element specifies child elements within an XML node, making it useful for generating nested XML structures. The name of the <element> tag becomes an XML attribute, and the generated content becomes the attribute's value.

Attributes

  • name: Specifies the name of the child XML element or attribute. This is mandatory.
  • script: Specifies a script to dynamically generate the value of the element.
  • constant: A constant value for the element, if no dynamic generation is required.
  • values: A list of possible values to randomly select from, if applicable.

Example 1: Simple XML Generation

<setup>
    <generate name="generate_xml" count="2" target="XML">
        <variable name="person" entity="Person"/>
        <key name="author" script="None">
            <element name="name" script="person.name"/>
            <element name="gender" script="person.gender"/>
            <element name="birthdate" script="person.birthdate"/>
        </key>
    </generate>
</setup>

In this example:

  • The author key creates an XML node, and each <element> within defines attributes (such as name, gender, birthdate) populated with the values generated from the person entity.

Example 2: NestedKey with Elements and Attributes

<setup>
    <generate name="part_list" count="1" target="XML">
        <nestedKey name="book" type="list" count="4">
            <element name="title" values="'Book 1', 'Book 2', 'Book 3', 'Book 4'"/>
            <element name="language" values="'de', 'en'"/>
            <element name="pages" generator="IntegerGenerator(min=100,max=800)"/>
            <element name="release_date" generator="DateTimeGenerator(min='2020-01-01', max='2023-12-31', input_format='%Y-%m-%d')"/>
        </nestedKey>
    </generate>
</setup>

Here:

  • The book list generates multiple XML elements (title, language, pages, release_date), each containing attributes for various books.
  • The content for each element is dynamically generated based on values or generators.

Example 3: Generating XML with Nested Lists and Dictionaries

<setup>
    <generate name="part_dict" count="1" target="XML">
        <nestedKey name="book" type="dict">
            <key name="title" values="'Book 1', 'Book 2'">
                <element name="language" values="'de', 'en'"/>
            </key>
            <key name="pages" generator="IntegerGenerator(min=100,max=800)"/>
            <key name="release_date" generator="DateTimeGenerator(min='2020-01-01', max='2023-12-31', input_format='%Y-%m-%d')"/>
        </nestedKey>
        <nestedKey name="magazine" type="dict">
            <key name="title" values="'Magazine #1', 'Magazine #2'"/>
            <key name="language" values="'de', 'en'"/>
            <key name="pages" generator="IntegerGenerator(min=30,max=70)"/>
        </nestedKey>
    </generate>
</setup>

In this case:

  • Both books and magazines are generated as XML elements with dynamically generated attributes such as language, pages, and release_date.

Example 4: Generating XML with Arrays and Lists

<setup>
    <generate name="array_xml" count="2" target="XML">
        <array name="random_string" type="string" count="3"/>
        <array name="random_number" type="int" count="3"/>
    </generate>
</setup>

In this example:

  • Two arrays (random_string and random_number) are generated as XML elements, with each array containing 3 items.
  • This demonstrates how arrays can be incorporated into the XML generation process.

Example 5: Complex XML with Lists and NestedKeys

<setup>
    <generate name="list_xml" count="1" target="XML">
        <list name="detail">
            <item>
                <key name="number" type="int" constant="2"/>
            </item>
            <item>
                <key name="text" type="string"/>
            </item>
            <item>
                <nestedKey name="employees" type="list" count="2">
                    <key name="code" type="string"/>
                    <key name="age" values="25, 30, 28, 45"/>
                </nestedKey>
            </item>
        </list>
    </generate>
</setup>

This example shows how to generate a list of items, where each item can contain additional nested keys and sub-elements, demonstrating a hierarchical structure in XML.

Example 6: Generating XML from a Template

<setup>
    <generate name="product" source="data/user.template.xml" target="ConsoleExporter, XML">
        <nestedKey name="xga:dgu.gewerbemeldung.0230">
            <element name="new_key" constant="new_key_value"/>
            <nestedKey name="bn-g2g:nachrichtenkopf.g2g">
                <element name="new_key2" constant="new_key_value2"/>
                <key name="bn-g2g:identifikation.nachricht" constant="abc"/>
            </nestedKey>
        </nestedKey>
        <key name="additionalTag" constant="extra">
            <element name="new_key" constant="new_key_value"/>
            <element name="new_key1" constant="new_key_value1"/>
        </key>
        <list name="array">
            <item>
                <key name="additionalTag" constant="extra">
                    <element name="new_key" constant="new_key_value"/>
                </key>
            </item>
        </list>
    </generate>
</setup>

In this case:

  • The XML generation is based on a predefined template (data/user.template.xml), and the structure is dynamically extended using nestedKey, key, and list elements.
  • This demonstrates how to augment existing XML structures with new elements and attributes.

Best Practices for Using <element>

  1. Use <element> to Create Structured XML: The <element> tag is a flexible way to build structured XML documents where each element and its attributes can be dynamically generated or statically defined.
  2. Combine <element> with Other Elements: Use the <element> tag in combination with <list>, <array>, and <nestedKey> to create complex XML structures.
  3. Leverage Dynamic Scripts: Take advantage of the script attribute to dynamically generate values for the elements based on complex logic or external variables.
  4. Use constant for Fixed Values: For cases where an element's value should not change, use the constant attribute.

Data Control Elements

<sourceConstraints>

The <sourceConstraints> element filters unwanted records out of a source based on a set of rules. It is applied at the beginning of processing, before generation, so that only source data matching the rules is used. Used correctly, it improves the accuracy, quality, and reliability of the generated output.

Structure:

  • <sourceConstraints> is used together with <rule> child tags.
  • Each <rule> element is a direct child of <sourceConstraints>.
  • Each rule follows the <rule if="condition" then="action"> syntax.

Rules

  • <sourceConstraints> is supported ONLY as a direct child of <generate> or <nestedKey>.
  • A <generate> or <nestedKey> element can have only one <sourceConstraints> element.
  • A <sourceConstraints> element can contain many <rule> elements.
  • Every <rule> must have both the if and then attributes.
  • It is good practice to place <sourceConstraints> as the first child of <generate>.

Children

  • rule: Defines the rule to accept source data.

Attributes

  • <sourceConstraints> has no attributes.
  • <rule> has the if and then attributes.
  • if: A Python-like expression that evaluates to True or False. The result determines whether the rule check is executed.
  • then: A Python-like expression that evaluates to True or False. If the result is True, the record is kept; otherwise it is removed.

Example 1: Simple sourceConstraints for generate element

<setup>
    <generate name="synthetic_customers" count="10000" pageSize="1000"
              source="script/person_data.json" cyclic="True">
        <sourceConstraints>
          <rule if="credit_score &lt; 600" then="risk_profile == 'High'"/>
          <rule if="credit_score &gt;= 600 and credit_score &lt; 750" then="risk_profile == 'Medium'"/>
          <rule if="credit_score &gt;= 750" then="risk_profile == 'Low'"/>
        </sourceConstraints>
        <key name="id" generator="IncrementGenerator"/>
    </generate>
</setup>

Structure of a record in the person_data file:

{
  "firstname": "Charlie",
  "lastname": "Brown",
  "age": 61,
  "city": "New York",
  "credit_score": 707,
  "risk_profile": "Low"
}

In this example:

  • The model reads data from the source "script/person_data.json" to generate data.
  • <sourceConstraints> filters that source data before generation.
  • The first <rule> triggers when the credit_score field in the source data is less than 600. If the risk_profile field is "High", the record is kept; otherwise it is removed.
  • The second <rule> triggers when credit_score is greater than or equal to 600 and less than 750. The record is removed when risk_profile is not "Medium".
  • The third <rule> triggers when credit_score is greater than or equal to 750. The record is removed when risk_profile is not "Low".
  • As a result, the input data taken from the source satisfies every rule placed in <sourceConstraints>.
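
The filtering behavior of the three rules can be sketched in Python as follows. This is a simplified model rather than DATAMIMIC's implementation: passes_constraints is a hypothetical helper, eval is assumed as the expression evaluator, and the records for "Dana" and "Eli" are invented for illustration.

```python
def passes_constraints(record, rules):
    """Keep a record unless some rule's `if` holds while its `then` fails."""
    for if_expr, then_expr in rules:
        triggered = eval(if_expr, {}, dict(record))
        if triggered and not eval(then_expr, {}, dict(record)):
            return False
    return True


rules = [
    ("credit_score < 600", "risk_profile == 'High'"),
    ("credit_score >= 600 and credit_score < 750", "risk_profile == 'Medium'"),
    ("credit_score >= 750", "risk_profile == 'Low'"),
]

records = [
    {"firstname": "Charlie", "credit_score": 707, "risk_profile": "Low"},
    {"firstname": "Dana", "credit_score": 580, "risk_profile": "High"},  # hypothetical
    {"firstname": "Eli", "credit_score": 790, "risk_profile": "Low"},    # hypothetical
]

kept = [r["firstname"] for r in records if passes_constraints(r, rules)]
print(kept)  # ['Dana', 'Eli']
```

Under this reading, the sample record shown above (credit_score 707, risk_profile "Low") triggers the second rule and is removed, because its risk_profile is not "Medium".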

Example 2: Constraints in cascade structure

<setup>
    <generate name="container" count="1">
        <generate name="synthetic_customers" count="10000" pageSize="1000"
                  source="script/person_data.json" cyclic="True">
            <sourceConstraints>
              <rule if="credit_score &lt; 600" then="risk_profile == 'High'"/>
              <rule if="credit_score &gt;= 600 and credit_score &lt; 750" then="risk_profile == 'Medium'"/>
              <rule if="credit_score &gt;= 750" then="risk_profile == 'Low'"/>
            </sourceConstraints>
            <key name="id" generator="IncrementGenerator"/>
        </generate>
    </generate>
</setup>

In this case:

  • The <sourceConstraints> affects only its direct parent element: the <generate> named "synthetic_customers".

Example 3: Constraints for nestedKey element

<setup>
    <generate name="container" count="1">
        <nestedKey name="cyclic_true" source="script/person_data.json"
                   type="list" count="1000" cyclic="True">
            <sourceConstraints>
              <rule if="credit_score &lt; 600" then="risk_profile == 'High'"/>
              <rule if="credit_score &gt;= 600 and credit_score &lt; 750" then="risk_profile == 'Medium'"/>
              <rule if="credit_score &gt;= 750" then="risk_profile == 'Low'"/>
            </sourceConstraints>
            <key name="id" generator="IncrementGenerator"/>
        </nestedKey>
    </generate>
</setup>

In this case:

  • The <sourceConstraints> affects only its direct parent element: the <nestedKey> named "cyclic_true".
  • This structure is useful for filtering data within deeply nested lists or structures.

Best Practices for Using <sourceConstraints>

  1. Filter Unwanted Data Early: Use <sourceConstraints> to ensure only relevant or valid data is used from the source file. This helps reduce errors and improves data quality during generation.

<targetConstraints>

The <targetConstraints> element validates final values based on a set of rules. It is applied at the end of processing to validate and finalize the output data, and it can add additional attributes based on the generated values.

Usage

<targetConstraints> is used in the same way as <sourceConstraints>, but it affects the generated output data.

Example 1: targetConstraints for generate element

<setup>
    <generate name="container" count="1">
        <generate name="synthetic_customers" count="1000" pageSize="100"
            source="script/person_data.json" cyclic="True">
            <key name="id" generator="IncrementGenerator" />

            <!-- Target Constraints: Final validation -->
            <targetConstraints>
                <rule if="credit_limit &gt;= 25000 and interest_rate &lt;= 0.08" then="approval_status = 'Approved'" />
                <rule if="credit_limit &lt; 25000 or interest_rate &gt; 0.08" then="approval_status = 'Review'" />
                <rule if="credit_limit &lt;= 5000 and interest_rate &gt;= 0.12" then="approval_status = 'Denied'" />
            </targetConstraints>
        </generate>
    </generate>
</setup>

In this case:

  • The rules of <targetConstraints> affect the generated data: each matching rule changes the attribute named in then, or adds it if it does not exist, so that the output matches the then expression.
  • The output approval_status is always Approved if credit_limit >= 25000 and interest_rate <= 0.08. The other rules are applied in the same way.

<mapping>

The <mapping> element transforms data values based on a set of rules. It is applied during processing to transform selected records.

Usage

  • <mapping> is used in the same way as <sourceConstraints>, but it changes data values instead of filtering records.
  • When a record meets the if requirement, <mapping> applies the transformation given in then.

Example 1: mapping for generate element

<setup>
    <generate name="container" count="1">
        <generate name="synthetic_customers" count="1000" pageSize="100"
            source="script/person_data.json" cyclic="True">
            <!-- Mapping: Transform attributes based on source constraints -->
            <mapping>
                <rule if="risk_profile == 'High'" then="interest_rate = 0.15"/>
                <rule if="risk_profile == 'Medium'" then="interest_rate = 0.10"/>
                <rule if="risk_profile == 'Low'" then="interest_rate = 0.05"/>
                <!-- Additional rules for credit limits -->
                <rule if="income &gt; 100000" then="credit_limit = 50000" />
                <rule if="income &gt; 50000 and income &lt;= 100000" then="credit_limit = 25000" />
                <rule if="income &gt; 30000 and income &lt;= 50000" then="credit_limit = 10000" />
                <rule if="income &lt;= 30000" then="credit_limit = 5000" />
            </mapping>
        </generate>
    </generate>
</setup>

In this case:

  • When risk_profile is High, <mapping> adds a new attribute interest_rate with the value 0.15.
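
A Python sketch of the mapping rules from this example may help to see how each record is transformed. It is a simplified model under stated assumptions: apply_mapping is a hypothetical helper, each then value is assumed to be a simple "name = literal" assignment, eval is assumed as the condition evaluator, and the customer record (including its income field) is invented for illustration.

```python
import ast


def apply_mapping(record, rules):
    """Return a copy of `record` with the assignments of all matching rules."""
    result = dict(record)
    for if_expr, then_expr in rules:
        if eval(if_expr, {}, dict(result)):
            # Split "name = value" into the attribute name and a literal value.
            name, _, value = then_expr.partition("=")
            result[name.strip()] = ast.literal_eval(value.strip())
    return result


rules = [
    ("risk_profile == 'High'", "interest_rate = 0.15"),
    ("risk_profile == 'Medium'", "interest_rate = 0.10"),
    ("risk_profile == 'Low'", "interest_rate = 0.05"),
    ("income > 100000", "credit_limit = 50000"),
    ("income > 50000 and income <= 100000", "credit_limit = 25000"),
    ("income > 30000 and income <= 50000", "credit_limit = 10000"),
    ("income <= 30000", "credit_limit = 5000"),
]

customer = {"risk_profile": "High", "income": 42000}  # hypothetical record
print(apply_mapping(customer, rules))
# {'risk_profile': 'High', 'income': 42000, 'interest_rate': 0.15, 'credit_limit': 10000}
```

In this sketch a record gains both interest_rate and credit_limit, because one rule from each group matches; these are the attributes that the <targetConstraints> example then checks.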