Data Definition Model - Advanced Elements¶
This document covers advanced features and elements in DATAMIMIC's data definition models. Make sure you're familiar with the Core Data Definition Elements before diving into these advanced features.
Complex Data Structures¶
<nestedKey>¶
The <nestedKey> element defines nested key fields and their generation methods within a data generation task. It allows you to structure complex data in a hierarchical format, such as dictionaries (dict) or lists (list), and control its content dynamically.
See the detailed nestedKey element reference.
<reference>¶
The <reference> element allows referencing other generated data.
Attributes¶
- name: Specifies the name of the reference.
- source: Specifies the source of the reference.
- sourceType: Specifies the type of the source.
- sourceKey: Specifies the key of the source.
- unique: Ensures the reference is unique.
<list>¶
The <list> element defines a collection of data items, where each item can contain its own attributes, keys, and arrays. Lists are useful for representing structured data, such as rows in a table, or collections of objects with shared attributes. The <list> can contain multiple <item> elements that represent individual entries.
See the detailed list element reference.
<item>¶
See the detailed item element reference.
<array>¶
The <array> element defines arrays of data items that can be either statically defined or generated dynamically using scripts. Arrays are essential when you need to generate multiple items of the same data type, such as lists of values, and can be combined with other elements like <list>, <key>, and <nestedKey> to create complex data structures.
See the detailed array element reference.
Control Elements¶
<condition>¶
The <condition> element is used to execute a set of child tags (<if>, <else-if>, and <else>) based on specific logical conditions. It provides a way to control the data generation process by applying conditions that determine which elements will be included in the output.
Structure¶
- The <if> tag is always the first child of a <condition> element and defines the primary condition.
- Zero or more <else-if> tags can follow the <if>, each specifying additional conditions to evaluate if the previous conditions are not met.
- The <else> tag is optional and provides a fallback action if none of the preceding conditions are met.
Rules¶
- A <condition> element must have one <if> tag.
- There can be zero or more <else-if> tags.
- Only one <else> tag is allowed, and it must appear as the last child of the <condition> element.
Children¶
- if: Defines the primary condition to evaluate.
- else-if: Defines additional conditions to check if the previous conditions are false.
- else: Fallback action if none of the conditions are met.
Attributes¶
- condition: A Python-like expression that evaluates to True or False. The result determines whether the content inside the corresponding tag will be executed.
Example 1: Simple Condition with If-Else Logic¶
In this example, based on the value of category_id, the appropriate message is printed.
- If category_id is 'DRNK/ALCO', the first <if> block is executed.
- If category_id is 'FOOD/CONF', the <else-if> block is executed.
- If neither condition is met, the <else> block is executed.
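The branch selection described above follows a first-match rule. The Python sketch below models that rule only; it is not DATAMIMIC's implementation. The `select_branch` helper and the branch labels are hypothetical, and only the two `category_id` values come from the example.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model of a <condition> block: an ordered list of
# (label, predicate) pairs for <if>/<else-if>, plus an optional <else> label.
Branch = Tuple[str, Callable[[dict], bool]]


def select_branch(branches: List[Branch], context: dict,
                  else_label: Optional[str] = None) -> Optional[str]:
    """Return the label of the first branch whose predicate is true."""
    for label, predicate in branches:
        if predicate(context):
            return label
    # No <if>/<else-if> matched: fall back to <else> if one was given.
    return else_label


branches = [
    ("if", lambda ctx: ctx["category_id"] == "DRNK/ALCO"),
    ("else-if", lambda ctx: ctx["category_id"] == "FOOD/CONF"),
]

for category_id in ("DRNK/ALCO", "FOOD/CONF", "OTHER"):
    chosen = select_branch(branches, {"category_id": category_id}, "else")
    print(category_id, "->", chosen)
```

Because evaluation stops at the first match, the order of the <else-if> tags matters whenever two conditions can be true at the same time.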
Example 2: Complex Condition with Nested Keys and Default Values¶
In this example, two variables (ifVar and elseIfVar) are generated using a Boolean generator, and depending on their values, one of the <key> elements (if_true, else_if_true, or else_true) will be generated:
- If ifVar is True, if_true is generated.
- If ifVar is False and elseIfVar is True, else_if_true is generated.
- If both conditions are False, else_true is generated.
Example 3: Conditions with Nested Structures¶
In this example:
- The condition_true nested key block will always be executed because its condition is True.
- The conditional block within <condition> executes the <if> block because its condition is True, while the <else-if> is ignored as it evaluates to False.
Example 4: Conditions with Default Values¶
Here, the conditional logic checks the value of id:
- If id is 1, the if_true key is generated.
- If id is 3 or 4, the appropriate else_if block is executed.
- If no conditions match, the else_true key is generated with a fallback value.
Example 5: Conditional Removal of Elements¶
In this case, the removeElement key will not be generated because the condition is set to False.
Example 6: Using Default Values in Conditional NestedKeys¶
- Since the condition is False, the condition_false nested key will take the defaultValue of None.
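Examples 5 and 6 differ only in whether a default value is present. The Python sketch below contrasts the two outcomes. It is a simplified model, not DATAMIMIC code: the `apply_conditional_key` helper and the `_MISSING` sentinel are hypothetical names introduced for illustration.

```python
# Sentinel meaning "no defaultValue attribute was given" (hypothetical).
_MISSING = object()


def apply_conditional_key(record: dict, name: str, value, condition: bool,
                          default_value=_MISSING) -> dict:
    """Add `name` to a copy of `record` according to `condition`."""
    result = dict(record)
    if condition:
        result[name] = value
    elif default_value is not _MISSING:
        # Condition is false but a default was supplied: keep the key.
        result[name] = default_value
    # Otherwise the key is left out entirely.
    return result


# Example 5 style: condition is False and no default, so the key is dropped.
print(apply_conditional_key({}, "removeElement", "x", condition=False))

# Example 6 style: condition is False with a default of None, so the key stays.
print(apply_conditional_key({}, "condition_false", {"a": 1},
                            condition=False, default_value=None))
```

The sentinel is there because None is itself a legitimate default, so it cannot double as the "no default given" marker.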
Best Practices for Using <condition>¶
- Use Conditions to Control Output: Conditions are an excellent way to control the generation of data elements dynamically based on the current state of variables.
- Fallbacks with Default Values: Use default values when you want to provide a fallback if a condition evaluates to False.
- Combine with Nested Structures: You can use conditions with nested keys, lists, and arrays to build complex logic-driven data models.
- Use else-if for Multiple Conditions: To handle multiple possible states, use a combination of <if>, <else-if>, and <else> to cover all scenarios.
<echo>¶
The <echo> element outputs text, variables, or expressions for logging, debugging, or monitoring purposes. This can be helpful in tracking the progress of data generation or inspecting the values of variables during runtime. It accepts dynamic content, including variables and expressions enclosed in {}.
Attributes¶
- text: The static or dynamic content to be printed. Dynamic values are wrapped in {}, allowing you to output variables, expressions, or results of functions during execution.
Usage¶
- Use <echo> to print out the values of variables or track the flow of the setup process.
- It can output text to the console or log files, depending on the target defined in the setup.
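To make the {} substitution concrete, the Python sketch below renders a text with embedded expressions against a dictionary of variables. It only models the idea and is not how DATAMIMIC implements <echo>: the `render_echo` helper, the regular expression, and the sample variables are all assumptions.

```python
import re

# Matches one {expression} placeholder without nested braces.
_PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def render_echo(text: str, context: dict) -> str:
    """Replace each {expression} in `text` with its value in `context`."""
    def substitute(match) -> str:
        # Evaluate with no builtins; only names from `context` are visible.
        return str(eval(match.group(1), {"__builtins__": {}}, context))
    return _PLACEHOLDER.sub(substitute, text)


context = {"random_number": 42, "status": "active"}
print(render_echo("Number: {random_number}, doubled: {random_number * 2}", context))
print(render_echo("Status: {status}", context))
```

Text without placeholders passes through unchanged, which matches the description of static content.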
Example 1: Basic Usage for Debugging¶
In this example:
- The <echo> tag prints the current value of the user variable, which is a DotableDict.
- It also prints the all_users variable, which is a list of dictionaries retrieved from the database.
Example 2: Using Echo for Debugging Scripted Variables¶
In this example:
- A random number is generated and echoed for debugging.
- The status is then calculated based on the random number, and the result is printed using <echo>.
Best Practices for Using <echo>¶
- Debug Complex Logic: Use <echo> to debug variables and complex expressions, especially when using scripts or database sources to generate data dynamically.
- Monitor Data Generation: Track the progress of your data generation by echoing values at key points in your setup.
- Combine with Variables: You can use variables and expressions within the text of <echo> to output dynamic content during the generation process.
<generator>¶
The <generator> element specifies custom generators for data generation.
Attributes¶
- name: Specifies the name of the generator.
- generator: Specifies your custom generator.
Example¶
<element>¶
The <element> element specifies child elements within an XML node, making it useful for generating nested XML structures. The name of the <element> tag becomes an XML attribute, and the generated content becomes the attribute's value.
Attributes¶
- name: Specifies the name of the child XML element or attribute. This is mandatory.
- script: Specifies a script to dynamically generate the value of the element.
- constant: A constant value for the element, if no dynamic generation is required.
- values: A list of possible values to randomly select from, if applicable.
Example 1: Simple XML Generation¶
In this example:
- The author key creates an XML node, and each <element> within defines attributes (such as name, gender, birthdate) populated with the values generated from the person entity.
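The resulting XML shape can be sketched with Python's standard xml.etree.ElementTree module. This only illustrates the output structure described above, with each element name becoming an attribute on the author node. The `build_node` helper and the sample values are placeholders, not data from the person entity.

```python
import xml.etree.ElementTree as ET


def build_node(tag: str, elements: dict) -> ET.Element:
    """Create an XML node whose attributes come from name/value pairs."""
    node = ET.Element(tag)
    for name, value in elements.items():
        # Each <element name="..."> contributes one attribute on the node.
        node.set(name, str(value))
    return node


# Placeholder values standing in for data from the person entity.
author = build_node("author", {
    "name": "Jane Doe",
    "gender": "female",
    "birthdate": "1980-01-01",
})
print(ET.tostring(author, encoding="unicode"))
```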
Example 2: NestedKey with Elements and Attributes¶
Here:
- The book list generates multiple XML elements (title, language, pages, release_date), each containing attributes for various books.
- The content for each element is dynamically generated based on values or generators.
Example 3: Generating XML with Nested Lists and Dictionaries¶
In this case:
- Both books and magazines are generated as XML elements with dynamically generated attributes such as language, pages, and release_date.
Example 4: Generating XML with Arrays and Lists¶
In this example:
- Two arrays (random_string and random_number) are generated as XML elements, with each array containing 3 items.
- This demonstrates how arrays can be incorporated into the XML generation process.
Example 5: Complex XML with Lists and NestedKeys¶
This example shows how to generate a list of items, where each item can contain additional nested keys and sub-elements, demonstrating a hierarchical structure in XML.
Example 6: Generating XML from a Template¶
In this case:
- The XML generation is based on a predefined template (data/user.template.xml), and the structure is dynamically extended using nestedKey, key, and list elements.
- This demonstrates how to augment existing XML structures with new elements and attributes.
Best Practices for Using <element>¶
- Use <element> to Create Structured XML: The <element> tag is a flexible way to build structured XML documents where each element and its attributes can be dynamically generated or statically defined.
- Combine <element> with Other Elements: Use the <element> tag in combination with <list>, <array>, and <nestedKey> to create complex XML structures.
- Leverage Dynamic Scripts: Take advantage of the script attribute to dynamically generate values for the elements based on complex logic or external variables.
- Use constant for Fixed Values: For cases where an element's value should not change, use the constant attribute.
Data Control Elements¶
<sourceConstraints>¶
The <sourceConstraints> element validates and filters input source data based on specified conditions. It is applied at the beginning of the data generation process to ensure that only data meeting the required criteria is processed from the source file. This helps maintain data quality and prevents invalid or unwanted data from entering the generation pipeline.
Syntax¶
Rules¶
- <sourceConstraints> is supported only as a direct child of <generate> or <nestedKey>.
- Multiple <sourceConstraints> elements can be used within the same parent element.
- It is good practice to place <sourceConstraints> as the first child of <generate> or <nestedKey>.
Attributes¶
- if: A Python-like expression that evaluates to True or False. The result determines whether the source constraint applies.
- require: Expression that must evaluate to True to keep the source data. If False, the source data is filtered out.
Example 1: Simple sourceConstraints for generate element¶
Structure of person_data file's data:
In this example:
- The setup reads its source data from "script/person_data.json".
- <sourceConstraints> filters that source data before generation.
- The first <sourceConstraints> applies when the credit_score field in the source data is less than 600. It keeps a record only if risk_profile is "High".
- The second <sourceConstraints> applies when credit_score is between 600 and 750. It keeps a record only if risk_profile is "Medium".
- The third <sourceConstraints> applies when credit_score is 750 or higher. It keeps a record only if risk_profile is "Low".
- As a result, every source record that reaches generation satisfies all of the <sourceConstraints> rules.
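The three rules above can be modelled in Python as pairs of predicates, one for if and one for require. This is a sketch of the filtering semantics, not DATAMIMIC's implementation. The `filter_source` helper and the sample rows are made up, and the sketch assumes the middle band covers 600 up to, but not including, 750.

```python
from typing import Callable, Iterable, List, Tuple

Record = dict
# Each constraint pairs an `if` predicate with a `require` predicate.
Constraint = Tuple[Callable[[Record], bool], Callable[[Record], bool]]

constraints: List[Constraint] = [
    (lambda r: r["credit_score"] < 600, lambda r: r["risk_profile"] == "High"),
    # Assumed boundaries for "between 600 and 750".
    (lambda r: 600 <= r["credit_score"] < 750, lambda r: r["risk_profile"] == "Medium"),
    (lambda r: r["credit_score"] >= 750, lambda r: r["risk_profile"] == "Low"),
]


def filter_source(records: Iterable[Record], rules: List[Constraint]) -> List[Record]:
    """Keep records that pass `require` for every rule whose `if` applies."""
    kept = []
    for record in records:
        if all(require(record) for applies, require in rules if applies(record)):
            kept.append(record)
    return kept


# Made-up sample rows; the real file's contents are not reproduced here.
sample = [
    {"id": 1, "credit_score": 550, "risk_profile": "High"},
    {"id": 2, "credit_score": 550, "risk_profile": "Low"},
    {"id": 3, "credit_score": 700, "risk_profile": "Medium"},
    {"id": 4, "credit_score": 800, "risk_profile": "Medium"},
]
print([r["id"] for r in filter_source(sample, constraints)])
```

A rule whose if predicate is false has no effect on the record, so a record is dropped only when some applicable rule's require fails.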
Example 2: Constraints in cascade structure¶
In this case:
- The <sourceConstraints> affects only its direct parent element: the <generate> element named "synthetic_customers".
Example 3: Constraints for nestedKey element¶
In this case:
- The <sourceConstraints> affects only its direct parent element: the <nestedKey> element named "cyclic_true".
- This structure is useful for filtering data within deeply nested lists or structures.
Best Practices for Using <sourceConstraints>¶
- Filter Unwanted Data Early: Use <sourceConstraints> to ensure only relevant or valid data is used from the source file. This helps reduce errors and improves data quality during generation.
- Use Multiple Constraints: You can use multiple <sourceConstraints> elements to apply different filtering conditions based on various criteria.
- Place Constraints First: It's recommended to place <sourceConstraints> as the first child element of <generate> or <nestedKey> for optimal performance.
<targetConstraints>¶
The <targetConstraints> element filters out generated records that don't meet the required conditions. It is applied at the end of the generation process to validate and filter the output data.
Syntax¶
Parameters¶
- if: Boolean condition that determines when the target constraint applies.
- require: Expression that must evaluate to True to keep the generated record. If False, the record is filtered out from the final output.
Usage Examples¶
Example 1: Basic Filtering¶
Example 2: Complex Filtering Conditions¶
Example 3: Simple Boolean Check¶
In this case:
- Records with id <= 50 are only kept if targetConstraints == True.
- Since targetConstraints is set to True only when name == 'HARRY', only Harry records with id <= 50 are kept.
- Mark records with id <= 50 are filtered out because targetConstraints remains False.
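The same logic can be traced in a few lines of Python. This is a sketch of the behaviour described above, not DATAMIMIC code. The `generate` and `keep_record` helpers and the sample ids are hypothetical, and the sketch assumes that records with id above 50 pass through because the constraint does not apply to them.

```python
def generate(record_id: int, name: str) -> dict:
    # The flag is True only for HARRY and stays False for everyone else.
    return {"id": record_id, "name": name, "targetConstraints": name == "HARRY"}


def keep_record(record: dict) -> bool:
    """Target constraint: if id <= 50, require targetConstraints == True."""
    if record["id"] <= 50:
        return record["targetConstraints"] is True
    return True  # the constraint does not apply, so the record is kept


records = [generate(10, "HARRY"), generate(20, "MARK"), generate(60, "MARK")]
output = [r for r in records if keep_record(r)]
print([(r["id"], r["name"]) for r in output])
```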
<mapping>¶
The <mapping> element transforms data values based on a set of rules. It is applied during processing and transforms only the records that match a rule's condition.
Syntax¶
Parameters¶
- if: Boolean condition that determines when the mapping rule applies.
- set: Expression that can be:
    - Assignment expression (key = value)
    - Boolean expression (True/False)
    - Any valid Python expression
Usage Examples¶
Example 1: Basic Value Assignment¶
Example 2: Conditional Calculations¶
Example 3: Mathematical Expressions¶
Example 4: Sequential Mapping (Dependencies)¶
Example 5: Default Value Assignment¶
Example 6: Filtering with Boolean Results¶
Complete Example¶
In this case:
- When risk_profile is High, <mapping> adds a new attribute interest_rate with value 0.15.
- When income is greater than 100000, <mapping> sets credit_limit to 50000.
- Sequential mapping allows later rules to reference values set by earlier rules.
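Sequential mapping can be pictured as an ordered list of (if, set) pairs applied to one record. The Python sketch below models that behaviour under stated assumptions and is not DATAMIMIC's implementation. The `apply_mapping` helper is hypothetical, and the third rule with its `max_withdrawal` field is invented purely to show a rule that depends on an earlier one.

```python
from typing import Callable, List, Tuple

# A rule pairs an `if` predicate with a `set` action that mutates the record.
Rule = Tuple[Callable[[dict], bool], Callable[[dict], None]]

rules: List[Rule] = [
    (lambda r: r["risk_profile"] == "High", lambda r: r.update(interest_rate=0.15)),
    (lambda r: r["income"] > 100000, lambda r: r.update(credit_limit=50000)),
    # Hypothetical dependent rule: reads credit_limit set by the rule above.
    (lambda r: "credit_limit" in r,
     lambda r: r.update(max_withdrawal=r["credit_limit"] // 10)),
]


def apply_mapping(record: dict, mapping_rules: List[Rule]) -> dict:
    """Apply rules in order to a copy of `record`."""
    result = dict(record)
    for applies, action in mapping_rules:
        if applies(result):
            action(result)
    return result


print(apply_mapping({"risk_profile": "High", "income": 120000}, rules))
print(apply_mapping({"risk_profile": "Low", "income": 40000}, rules))
```

Swapping the second and third rules would stop the dependent rule from firing, which is why rule order matters.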
Execution Order¶
The constraint system executes in the following order:
- Source Constraints - Validate and filter input source data before processing begins
- Mapping - Transform data values if conditions are met during generation
- Target Constraints - Validate and filter generated output data before final export
Each layer operates on different stages of the data pipeline and can build upon the results of previous layers.
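A compact way to picture this ordering is a three-stage function. The Python sketch below is a hypothetical model of the stage order only. The `run_pipeline` and `add_budget` names, the `budget` field, and the sample rows are assumptions introduced for illustration.

```python
from typing import Callable, Iterable, List


def run_pipeline(rows: Iterable[dict],
                 source_ok: Callable[[dict], bool],
                 transform: Callable[[dict], dict],
                 target_ok: Callable[[dict], bool]) -> List[dict]:
    """Run the three stages in the documented order."""
    accepted = [row for row in rows if source_ok(row)]     # 1. source constraints
    mapped = [transform(dict(row)) for row in accepted]    # 2. mapping (on copies)
    return [row for row in mapped if target_ok(row)]       # 3. target constraints


def add_budget(row: dict) -> dict:
    # Hypothetical mapping rule in the style of an income * 0.4 calculation.
    row["budget"] = row["income"] * 0.4
    return row


rows = [{"income": None}, {"income": 40000}, {"income": 90000}]
result = run_pipeline(
    rows,
    source_ok=lambda r: r["income"] is not None,
    transform=add_budget,
    target_ok=lambda r: r["budget"] >= 20000,  # reads a value set by the mapping stage
)
print(result)
```

The mapping stage never sees the row with a missing income, and the target stage can rely on budget being present because mapping has already run.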
Key Features¶
Expression Support¶
All three constraint elements support:
- Boolean expressions (True, False, comparisons)
- Assignment expressions (key = value)
- Mathematical calculations (income * 0.4)
- String operations and concatenation
- Complex conditional logic with and, or, not
- Function calls and method invocations
Sequential Processing¶
- Source constraint rules are processed sequentially
- Mapping rules are processed sequentially
- Later rules can reference values set by earlier rules
- Target constraints can reference all previously set values
Common Behavior¶
- All three elements check the if condition first
- Only when the if condition is True is the action (set/require) executed
- Source constraints validate and filter input source data before processing
- Mapping transforms data values during the generation process
- Target constraints validate and filter generated output data before final export
Error Handling¶
- Invalid expressions are logged as warnings by default
- Set ERROR_CONFIG.fail_fast = True to raise exceptions on errors
- Type conversion is automatically handled for numeric strings
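The difference between the default behaviour and fail-fast mode can be sketched in Python as follows. This is an illustration only. The `evaluate_rule` helper and its local `fail_fast` parameter are hypothetical stand-ins and do not use DATAMIMIC's ERROR_CONFIG object. Returning None for a failed rule is an assumption.

```python
import logging

logger = logging.getLogger("constraints_sketch")


def evaluate_rule(expression: str, record: dict, fail_fast: bool = False):
    """Evaluate `expression` against `record`; warn or raise on failure."""
    try:
        # No builtins are exposed; only the record's fields are visible.
        return eval(expression, {"__builtins__": {}}, dict(record))
    except Exception as exc:
        if fail_fast:
            raise  # fail-fast mode: surface the original exception
        logger.warning("Invalid expression %r: %s", expression, exc)
        return None  # assumed outcome for a skipped rule


print(evaluate_rule("income > 10", {"income": 20}))
print(evaluate_rule("income >", {"income": 20}))  # logs a warning
```

In a test or CI run, fail-fast is usually the safer setting because a mistyped expression stops the run instead of being skipped.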
Performance Considerations¶
- Expressions are evaluated using Python's eval() function with restricted globals
- Complex expressions may impact performance on large datasets
- Consider using simpler conditions for better performance
Best Practices¶
- Use descriptive variable names for better readability
- Order mapping rules logically - simple conditions first, complex ones later
- Test expressions thoroughly before deploying to production
- Use target constraints sparingly for final validation only
- Document complex business rules with comments in XML
- Validate data types when performing mathematical operations
- Consider performance impact of complex expressions on large datasets