Data Definition Model - Core Elements¶
Data Definition Models are fundamental to DATAMIMIC's test data generation capabilities. This document covers the essential elements - if you're new to DATAMIMIC, start here. For advanced features, see Advanced Data Definition Elements.
Overview¶
Data Definition Models specify how test data should be generated, transformed, or obfuscated. The core elements allow you to:
- Define data generation tasks
- Specify key fields and their values
- Create and use variables
- Generate structured data sets
Expression Syntax: {expr} vs {{ expr }}¶
DATAMIMIC supports two forms of expression syntax with different caching behavior:
Syntax Comparison¶
| Syntax | Behavior | Caching |
|---|---|---|
| `{expr}` | CACHED - Evaluated once per record iteration, then cached | Faster for repeated references |
| `{{ expr }}` | DYNAMIC - Evaluated every time it's referenced | Fresh value each time |
When to Use Each Form¶
Use {expr} (cached) when:
- The expression result should be consistent within a single record
- You reference the same expression multiple times
- Performance matters for complex expressions
Use {{ expr }} (dynamic) when:
- You need fresh values each time (e.g., timestamps with precision)
- Random values should be unique per usage
- The expression has side effects
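The difference between the two forms can be modelled in a few lines of plain Python. This is a conceptual sketch only, not DATAMIMIC's implementation: `RecordContext`, `cached`, `dynamic`, and the counter that stands in for a timestamp or random call are hypothetical names introduced for illustration.

```python
import itertools


class RecordContext:
    """Toy evaluation context (hypothetical; models the documented behavior)."""

    def __init__(self):
        self._cache = {}

    def start_record(self):
        # The cache only lives for a single record iteration.
        self._cache.clear()

    def cached(self, name, func):
        # {expr}-style: evaluate on first reference, reuse afterwards.
        if name not in self._cache:
            self._cache[name] = func()
        return self._cache[name]

    def dynamic(self, func):
        # {{ expr }}-style: evaluate on every reference.
        return func()


counter = itertools.count(1)


def next_value():
    # Stands in for a timestamp or random call: a new value on every call.
    return next(counter)


ctx = RecordContext()
ctx.start_record()
cached_pair = (ctx.cached("n", next_value), ctx.cached("n", next_value))
dynamic_pair = (ctx.dynamic(next_value), ctx.dynamic(next_value))

ctx.start_record()  # a new record resets the cache
next_record = ctx.cached("n", next_value)
```

Both cached references within a record see the same value, each dynamic reference sees a new one, and the cached form is re-evaluated once the next record starts.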
Examples¶
Timestamp Behavior¶
Random Value Behavior¶
Recommendation Table¶
| Use Case | Recommended | Reason |
|---|---|---|
| Static/deterministic expressions | `{expr}` | Better performance |
| Time-sensitive (timestamps) | `{{ expr }}` | If precision matters between fields |
| Random values needing uniqueness | `{{ expr }}` | Fresh value each evaluation |
| Complex expressions referenced multiple times | `{expr}` | Consistency + performance |
| Expressions with side effects | `{{ expr }}` | Ensure side effects execute |
Note on targetEntity¶
For targetEntity in single-file exporters, {expr} and {{ expr }} behave identically because the expression cache is reset for each record and the value is evaluated once per record. See Dynamic targetEntity for details.
Setup-Time vs Runtime Attributes¶
Some attributes are evaluated at setup time (before generation starts), while others support runtime evaluation (during each record iteration). This distinction affects which expression syntax you can use:
| Attribute | Evaluation Time | `{expr}` | `{{ expr }}` |
|---|---|---|---|
| `source` | Setup | Allowed | Not allowed |
| `target` | Setup | Allowed | Not allowed |
| `uri` (execute) | Setup | Allowed | Not allowed |
| `sourceUri` | Setup | Allowed | Not allowed |
| `exportUri` | Setup | Allowed | Not allowed |
| `targetEntity` | Runtime | Allowed | Allowed |
| `count` | Runtime | Allowed | Allowed |
| `selector` | Runtime | Allowed | N/A |
Setup-time attributes require a resolved value before generation begins. Using {{expr}} (DYNAMIC) for these attributes raises error I870.
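The rule above can be expressed as a small validation routine. The following Python sketch is an illustration under stated assumptions, not DATAMIMIC's own checker: `check_attribute` and `SetupTimeExpressionError` are hypothetical names, and only the attribute list and the error code I870 come from this section.

```python
# Setup-time attributes as listed in the table above.
SETUP_TIME_ATTRIBUTES = {"source", "target", "uri", "sourceUri", "exportUri"}


class SetupTimeExpressionError(ValueError):
    """Hypothetical exception type carrying the documented error code."""

    code = "I870"


def check_attribute(name, value):
    """Reject {{ expr }} on attributes that must resolve before generation starts."""
    stripped = value.strip()
    is_dynamic = stripped.startswith("{{") and stripped.endswith("}}")
    if name in SETUP_TIME_ATTRIBUTES and is_dynamic:
        raise SetupTimeExpressionError(
            f"{SetupTimeExpressionError.code}: '{name}' is evaluated at setup time; "
            "use {expr} instead"
        )
    return value


check_attribute("source", "{base_path}")  # accepted: cached form
check_attribute("count", "{{ n }}")       # accepted: runtime attribute
```

Calling `check_attribute("source", "{{ base_path }}")` raises the error, because `source` must already be resolved when generation begins.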
Example: Dynamic Source Path¶
Example: Dynamic Target¶
Warning
Inline interpolation (e.g., `source="data/{version}/file.csv"`) is not supported. The entire attribute value must be wrapped in a single `{expr}`.
Basic Elements¶
<setup>¶
The <setup> element is the root element for all data generation tasks. It contains one or more <generate> elements that define specific data generation operations. Learn more about its use in Configuration Models.
<generate>¶
The <generate> element is the core of Data Definition Models. It defines a data generation task and includes attributes like name, count, and target. This element is used to create structured data based on the specified configurations.
Note
For nested <generate> blocks, variable lookup is scope-sensitive. If a variable is declared inside the current nested generate, prefer this.variableName instead of relying on an unqualified name. See Variable Scoping in Nested Generates.
Attributes¶
- name: Specifies the name of the generation task.
- count: Specifies the number of records to generate.
- source: Specifies the source of the data (e.g., `data/active.ent.csv`, `mongo`).
- target: Specifies the target output (e.g., `CSV`, `sqliteDB`).
- type: Specifies the type of data to generate.
- cyclic: Enables or disables cyclic generation. Default is `False`.
- selector: Specifies a database query for the generation.
  - For top-level `<generate>`, selector behavior belongs to the datasource/loader read path, not to the variable setup-cache contract.
- separator: Specifies a separator for the generated data. Default is `|`.
- sourceScripted: Enables or disables scripted source evaluation in the source file (e.g., `example.ent.csv`, `example.json`). Default is `False`.
- pageSize: Specifies the page size for data generation.
  - For top-level selector reads, `pageSize` affects generate/source paging and downstream batching as implemented by the loader path.
  - It is not a universal selector semantic for all selector forms in the DSL.
- storageId: Specifies the object-storage client ID, defined by the `<object-storage>` element. Applies to file exporters (CSV/JSON/XML/Template) and is not the same as `targetClient`.
- sourceUri: Specifies the URI of the datasource on object storage (e.g., `datasource/employees.csv`).
- exportUri: Specifies only the path/prefix for exporting generated data (e.g., `export/`). The filename is derived from `name` or `targetEntity`, and the extension is determined by the exporter.
- container: Specifies the container name for Azure Blob Storage.
- bucket: Specifies the bucket name for AWS S3.
- distribution: Specifies the distribution of data source iteration (e.g., `random`, `ordered`). Default is `random`.
- converter: Specifies a converter to transform the value.
- variablePrefix: Defines the prefix for variable substitution in dynamic strings (default is `__`).
- variableSuffix: Defines the suffix for variable substitution in dynamic strings (default is `__`).
- numProcess: Defines the number of processes for multiprocessing; can be propagated from the parent element `<setup>`. Default is 1.
- mpPlatform: Defines the multiprocessing platform to use. Accepted values are `multiprocessing` and `ray`. Default is `multiprocessing`.
Overview (Target vs Storage vs Source)¶
| Goal | Attribute(s) | Notes |
|---|---|---|
| Choose exporter type | `target` | e.g., CSV, JSON, XML, Template, mongodb |
| Override DB/Kafka/Warehouse client | `targetClient` | Client-based exporters only; not object storage |
| Select object-storage client for file exports | `storageId` | References `<object-storage id="...">` |
| Set export path/prefix | `exportUri` | Path only; filename from `name`/`targetEntity`, extension from exporter |
| Read from object storage | `source` + `sourceUri` | `source` = client id; `sourceUri` = object key |
| Set bucket/container | `bucket` / `container` | Optional; defaults from client config |
Children¶
- `<key>`: Specifies key fields within the data generation task.
- `<variable>`: Defines variables used in data generation.
- `<reference>`: Defines references to other generated data.
- `<nestedKey>`: Specifies nested key fields and their generation methods.
- `<list>`: Defines lists of data items.
- `<condition>`: Conditional element to include data based on certain conditions.
- `<array>`: Defines arrays of data items.
- `<echo>`: Outputs text or variables for logging or debugging purposes.
Example 1: Using Object Storage for Data Generation¶
Object storage reads (source + sourceUri)¶
Example 2: Using selector with a Database¶
In this example:
- The `selector` is used to query the MongoDB database to find all customers under 30 years old.
- The data is output to the `ConsoleExporter`.
Selector contract notes¶
For top-level <generate> statements:
| Form | Execution timing | Cache scope | `pageSize` effect | Notes |
|---|---|---|---|---|
| `selector="SQL or find: ..."` | Generate read path | No setup-cache contract | Loader-owned paging/export batching | Do not infer variable-style cache-once behavior |
| `selector="aggregate: ..."` | Generate read path | No setup-cache contract | Backend-specific; do not assume find-style paging | Mongo aggregate is its own selector kind |
Important notes:
- Root-level `<generate iterationSelector>` is not supported.
- Parent/global placeholders may be used in selector text when the generate contract allows them. Child-scope placeholders are not visible to a parent selector.
- If you need setup-cached reference data semantics, use a database-backed `<variable selector="...">` and not a top-level generate selector.
Example 3: Generating Data with MongoDB and Aggregation¶
Example 4: Generating Data with Kafka¶
Example 5: Using Data from a CSV File¶
In this example:
- Two `generate` tasks are created that source data from CSV files and output it to the `ConsoleExporter`.
Example 6: Using cyclic with Data from Memory Store¶
Example 7: Using 'sourceScripted' with JSON template¶
In this example:
- The `sourceScripted="True"` attribute is used to evaluate the JSON template with embedded variables.
- The JSON template contains placeholders for variables like `random_age`, `street_name`, and `address_number`.
- If the whole JSON field value is a variable, it should be enclosed in curly braces `{}` (e.g., `"age": "{random_age}"`). The returned value can be a string, integer, or any other type.
- If a variable is embedded within a string, it should be enclosed in double underscores `__` (e.g., `"address": "__address_number__, __street_name__ St"`). The returned value will be a string.

You can also customize the prefix and suffix for variable substitution using the `variablePrefix`, `variableSuffix`, `defaultVariablePrefix`, and `defaultVariableSuffix` attributes. For example:

```xml
<setup defaultVariablePrefix="-%" defaultVariableSuffix="%-">
    <generate name="json_data" source="script/data.json" sourceScripted="True" target="">
        <variable name="random_age" generator="IntegerGenerator(min=18, max=65)"/>
    </generate>
</setup>

<setup>
    <generate name="json_data" source="script/data.json" sourceScripted="True" target="" variablePrefix="-%" variableSuffix="%-">
        <variable name="random_age" generator="IntegerGenerator(min=18, max=65)"/>
    </generate>
</setup>
```
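The two placeholder rules (whole-value `{var}` keeps the native type, embedded `__var__` always yields a string) can be mimicked in plain Python. This is a minimal sketch of the described behavior for a flat JSON object, not DATAMIMIC's template engine; `render_value` and the sample variable values are hypothetical.

```python
import json
import re


def render_value(value, variables, prefix="__", suffix="__"):
    """Resolve one JSON field value against a dict of variables."""
    if not isinstance(value, str):
        return value
    whole = re.fullmatch(r"\{(\w+)\}", value)
    if whole:
        # Whole-value placeholder: keep the variable's native type.
        return variables[whole.group(1)]
    embedded = re.compile(re.escape(prefix) + r"(\w+?)" + re.escape(suffix))
    # Embedded placeholders: substituted as text, so the result is a string.
    return embedded.sub(lambda m: str(variables[m.group(1)]), value)


template = json.loads(
    '{"age": "{random_age}", "address": "__address_number__, __street_name__ St"}'
)
variables = {"random_age": 42, "address_number": 12, "street_name": "Main"}
record = {key: render_value(val, variables) for key, val in template.items()}

# Custom delimiters, analogous to variablePrefix="-%" and variableSuffix="%-".
custom = render_value("Age: -%random_age%-", variables, prefix="-%", suffix="%-")
```

Here `record["age"]` stays an integer while `record["address"]` becomes the string `"12, Main St"`.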
Example 8: Using multiprocessing platform ray¶
The <generate> element defines a data generation task. At its most basic, it requires:
- name: Identifies the generation task
- count: Specifies how many records to generate
- target: (Optional) Specifies the output format (e.g., CSV, JSON)
Basic Example¶
Essential Attributes¶
- name: Task identifier
- count: Number of records to generate
- target: Output format (e.g., `CSV`, `JSON`, `ConsoleExporter`)
- source: (Optional) Input data source
<key>¶
The <key> element defines key fields within a data generation task and specifies their generation methods. These fields are crucial for creating unique identifiers or structured elements within the generated data. The <key> element allows for dynamic, constant, or conditional data generation and provides several attributes to customize its behavior.
Attributes¶
- name: Specifies the name of the key. This is mandatory and will be used as the field name in the generated data.
- type: Defines the data type of the key (e.g., `string`, `int`, `bool`). This is optional when using `script` or `generator`.
- source: Specifies the data source for the key (e.g., a database, a file).
- separator: Specifies a separator for a CSV source.
- values: Provides a list of static values for the key to choose from.
- script: Defines a script for dynamically generating the key's value.
- generator: Specifies a generator to automatically create values (e.g., `RandomNumberGenerator`, `IncrementGenerator`).
- constant: Defines a constant value for the key.
- condition: Specifies a condition to determine whether the key will be generated.
- converter: Specifies a converter to transform the value (e.g., date conversion, format changes).
- pattern: Defines a regex pattern to validate the value of the key.
- inDateFormat / outDateFormat: Specifies input and output date formats for converting date values. After `outDateFormat` is applied, downstream expressions see a string, not a raw datetime.
- defaultValue: Provides a default value if the key’s value is null or not generated.
- nullQuota: Defines the probability that the key will be assigned a null value. Default is `0` (never null).
- database: Specifies the database used for generating values (e.g., `SequenceTableGenerator`).
- string: Generates complex strings by embedding variables within the string using customizable delimiters (read more in the variable section).
- variablePrefix: Defines the prefix for variable substitution in dynamic strings (default is `__`).
- variableSuffix: Defines the suffix for variable substitution in dynamic strings (default is `__`).
Example 1: Generating Constant and Scripted Keys¶
In this example:
- `static_key` is assigned a constant value of `"fixed_value"` for every record.
- `dynamic_key` generates a random integer between 1 and 100 for each record using a script.
Example 2: Handling nullQuota for Nullable Fields¶
In this example:
- `key_always_null` will always have a null value (`nullQuota="1"`).
- `key_never_null` will never have a null value (`nullQuota="0"`).
- `key_sometimes_null` will have a null value 50% of the time (`nullQuota="0.5"`).
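The probability semantics of `nullQuota` can be sketched in Python as follows. This is an illustrative model, not DATAMIMIC's code: `apply_null_quota` is a hypothetical helper, and the fixed seed is only there to make the sample reproducible.

```python
import random


def apply_null_quota(value, null_quota, rng=random):
    """Return None with probability null_quota, otherwise the value."""
    # rng.random() lies in [0.0, 1.0), so a quota of 0 never nulls
    # and a quota of 1 always does.
    return None if rng.random() < null_quota else value


rng = random.Random(7)  # seeded only for reproducibility of this sketch
samples = [apply_null_quota("x", 0.5, rng) for _ in range(10_000)]
null_share = samples.count(None) / len(samples)
```

With a quota of 0.5, `null_share` comes out close to one half over a large sample.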
Example 3: Using defaultValue for Fallback Values¶
Here:
- The first two keys fall back to their `defaultValue` when the script generates an empty or `None` value.
- The third key doesn’t generate any value since its `condition` is `False`.
Example 4: Conditional Key Generation¶
In this example:
- `conditional_key` is generated only when a random number greater than 50 is produced by the `condition` script.
- `constant_key` is always generated since its `condition="True"`.
Example 5: Using pattern to Validate Keys¶
In this example:
- The `email` key’s value must match the regex pattern for a valid email format.
- The `phone_number` key’s value must match the regex pattern for a valid phone number format (123-456-7890).
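To see what such pattern checks look like in practice, here is a small Python sketch. The two regular expressions are assumptions chosen for illustration (they are not taken from DATAMIMIC), and whether DATAMIMIC applies a full match or a partial match is likewise an assumption; `matches_pattern` is a hypothetical helper.

```python
import re

# Illustrative patterns only; real projects may need stricter rules.
EMAIL_PATTERN = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
PHONE_PATTERN = r"\d{3}-\d{3}-\d{4}"  # e.g. 123-456-7890


def matches_pattern(value, pattern):
    """True when the whole value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


email_ok = matches_pattern("jane.doe@example.com", EMAIL_PATTERN)
phone_ok = matches_pattern("123-456-7890", PHONE_PATTERN)
phone_bad = matches_pattern("1234567890", PHONE_PATTERN)
```

Using `re.fullmatch` rather than `re.search` ensures that trailing or leading junk does not slip through.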
Example 6: Date Conversion Using inDateFormat and outDateFormat¶
In this example:
- The `date_of_birth` key uses the input date format (`inDateFormat="%Y-%m-%d"`) to parse the date and converts it to the specified output format (`outDateFormat="%d-%m-%Y"`).
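The format strings use the same directives as Python's `datetime` module, so the conversion can be reproduced directly. In this sketch, `convert_date` and the sample date are hypothetical; only the two format strings come from the example above.

```python
from datetime import datetime


def convert_date(text, in_format, out_format):
    """Parse text with in_format, then render it with out_format."""
    return datetime.strptime(text, in_format).strftime(out_format)


converted = convert_date("1990-05-17", "%Y-%m-%d", "%d-%m-%Y")
print(converted)  # 17-05-1990
```

Note that the result is a string, which matches the remark that downstream expressions no longer see a raw datetime once an output format is applied.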
Example 7: Keep A Raw Datetime For Arithmetic¶
In this example:
- `issue_at_raw` stays a raw datetime for downstream arithmetic.
- `issue_date` is formatted for output.
- `due_date` uses the raw datetime value and formats only at the end.
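The reason for keeping a raw value is easiest to see in plain Python: arithmetic works on `datetime` objects but not on formatted strings. The sketch below reuses the key names from the example; the concrete date and the 30-day term are assumptions made for illustration.

```python
from datetime import datetime, timedelta

issue_at_raw = datetime(2024, 1, 15, 9, 30)  # raw datetime, kept for arithmetic
issue_date = issue_at_raw.strftime("%Y-%m-%d")  # formatted copy for output

# Arithmetic first, formatting second.
due_date = (issue_at_raw + timedelta(days=30)).strftime("%Y-%m-%d")

# The formatted string cannot take part in date arithmetic.
try:
    issue_date + timedelta(days=30)
    string_arithmetic_works = True
except TypeError:
    string_arithmetic_works = False
```

Here `due_date` is `"2024-02-14"`, while adding a `timedelta` to the already formatted `issue_date` raises a `TypeError`.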
Example 8: Key Generation from a SequenceTableGenerator¶
Here:
- The `user_id` key is generated using a `SequenceTableGenerator` from a PostgreSQL database.
- This generator ensures that unique, sequential values are pulled from the database.
Best Practices for Using <key>¶
- Leverage `script` for Dynamic Values: Use `script` to generate complex and dynamic values, such as random numbers, dates, or values based on calculations.
- Use `nullQuota` for Realistic Data: Use `nullQuota` to simulate real-world scenarios where some keys may have null values.
- Fallback with `defaultValue`: Use `defaultValue` to ensure that your keys always have a fallback value if a script fails or produces `None`.
- Pattern Matching for Validation: Use the `pattern` attribute to enforce specific formatting rules, such as email addresses or phone numbers.
- Control Key Generation with `condition`: Use the `condition` attribute to dynamically determine whether a key should be generated, allowing for more control in complex data generation scenarios.
- Keep Raw Datetimes Separate From Formatted Output: If you need datetime arithmetic later, keep a raw variable or unformatted value and format only on the final key.
<variable>¶
The <variable> element defines variables used in data generation tasks. Variables can be sourced from databases, datasets, or dynamically generated using scripts. They introduce flexibility in creating dynamic test data by controlling how the data is retrieved or iterated. New in this release is the storage attribute for explicit control while keeping full backward compatibility.
Attributes¶
- name: Specifies the name of the variable.
- type: Defines the data type of the variable (optional). For DB/file sources, use the table/collection/entity name.
- source: Specifies the data source for the variable (e.g., a database or a file path).
- selector: Defines a database query for the variable. By default, it executes once and exposes the result through the variable's storage/value behavior.
- iterationSelector: Executes a database query on each iteration to retrieve dynamic data for the variable.
- paged: Optional DB-selector behavior switch. When `paged="true"`, the selector follows the current parent `<generate>` page window instead of the default setup-cached once-per-run behavior.
- storage: Controls how the variable stores/serves data. Options:
  - `value`: single static value (default for generators/constants; first row if a query)
  - `data`: complete data list in memory (random access)
  - `iterator`: cursor/iterator over rows; respects `cyclic`
- separator: Specifies a separator for the variable (e.g., for CSV sources).
- cyclic: Enables or disables cyclic iteration of the data source (relevant for `storage="iterator"`).
- entity: Defines the entity for generating data (e.g., a predefined model or object).
- script: Specifies a script for dynamically generating the variable's value.
- weightColumn: Specifies a column to weight data selection (typically used in CSV or database sources).
- sourceScripted: Enables per-row template evaluation for file-backed sources (CSV/JSON, weighted sources).
- generator: Defines a generator for the variable (e.g., `RandomNumberGenerator`, `IncrementGenerator`).
- dataset: Specifies the dataset for the variable (usually a file path).
- locale: Defines the locale used when generating data.
- inDateFormat / outDateFormat: Specifies date format conversion for input and output. After `outDateFormat` is applied, downstream expressions see a string value, not a raw datetime.
- converter: Defines a converter for transforming the variable's value.
- constant: Sets a fixed constant value for the variable.
- values: Provides a list of values for the variable to choose from.
- defaultValue: Sets a default value when no data is available.
- pattern: Defines a regex pattern for validating the variable's content.
- distribution: Controls how data is distributed when selecting from a source (`random`, `ordered`).
- database: Specifies the database used for generating data.
- string: Generates complex strings by embedding variables within the string using customizable delimiters (see examples on `<key>`).
- variablePrefix / variableSuffix: Define the prefix/suffix for variable substitution in dynamic strings (default is `__`). Can be set globally on `<setup>` via `defaultVariablePrefix`/`defaultVariableSuffix` and overridden per element.
Storage Modes¶
Use storage for explicit, predictable behavior:
- `value`: single static value. Ideal for configuration values, `selector` scalar queries, constants, generators.
- `data`: full list materialized in memory. Useful for analytics, random access, or joining in scripts. Be mindful of size.
- `iterator`: efficient row-by-row iteration. Honors `cyclic`. Best for large tables/files.
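As a rough mental model, the three modes correspond to three familiar Python access patterns. The sketch below is an analogy under stated assumptions, not DATAMIMIC's internals: the helper names `as_value`, `as_data`, and `as_iterator` and the sample rows are hypothetical.

```python
import itertools

rows = [{"id": 1}, {"id": 2}, {"id": 3}]  # stands in for a query or file result


def as_value(rows):
    # value: a single static value (first row of a query result).
    return rows[0]


def as_data(rows):
    # data: the full list in memory, available for random access.
    return list(rows)


def as_iterator(rows, cyclic=False):
    # iterator: row by row; restarts when cyclic, otherwise exhausts.
    return itertools.cycle(rows) if cyclic else iter(rows)


cyclic_ids = [row["id"] for row in itertools.islice(as_iterator(rows, cyclic=True), 5)]
one_pass_ids = [row["id"] for row in as_iterator(rows)]
```

Requesting five rows from the cyclic iterator wraps around to the start, while the non-cyclic iterator yields each row once and then stops.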
Selector and Paging Contract¶
Selector behavior on <variable> is intentionally split into distinct contracts:
| Form | Execution timing | Cache scope | `pageSize` effect | Notes |
|---|---|---|---|---|
| `selector="..."` | Once by default | Run-local cached result | No generic source-paging contract | Result exposure still depends on storage/value behavior |
| `selector="..." paged="true"` | Once per parent page window | Page-scoped cache | Follows the current parent generate page window | Use when the variable must track generate paging |
| `iterationSelector="..."` | Per iteration | No setup-cache contract | No generic `pageSize` contract | Dynamic per-iteration lookup |
Important notes:
- `selector` and `iterationSelector` always drive the query when present; `type` and `sourceEntity` do not override them.
- Parent and global placeholders can be resolved in selector text. Child-scope placeholders are not visible to a parent selector.
- Mongo `aggregate:` is a selector-kind exception. Do not assume it shares the same paging semantics as SQL or Mongo `find:` selectors.
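The practical difference between the three contracts is how often the query runs. The following Python sketch counts executions for each form; it is a simplified model of the timing described in the table, not DATAMIMIC code, and `CountingQuery`, `run`, and the mode names are hypothetical.

```python
class CountingQuery:
    """Stands in for a database call and counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return self.calls


def run(records, page_size, mode):
    """Return how many times the query executes under a given contract."""
    query = CountingQuery()
    cached = None
    for index in range(records):
        if mode == "iteration":
            query()  # iterationSelector: every record
        elif mode == "paged":
            if index % page_size == 0:
                cached = query()  # paged selector: once per page window
        elif cached is None:
            cached = query()  # plain selector: once per run
    return query.calls


executions = {mode: run(10, 4, mode) for mode in ("selector", "paged", "iteration")}
```

For 10 records and a page size of 4, the plain selector runs once, the paged selector three times (one per page window), and the iteration selector ten times.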
Legacy Automatic Behavior (Backward Compatible)¶
If storage is omitted, DATAMIMIC applies the legacy rules:
- Selector-based variables → behave as static single value.
- Table/collection variables → cycle over rows (iterator semantics).
- Generator/constant variables → single value.
Context Levels¶
- Root-level variables (declared directly under `<setup>`): loaded once, shared across the run, stable in multiprocessing.
- Nested variables (declared inside `<generate>`): created per generation scope; can exhaust when `cyclic="False"`.
Multiprocessing Notes¶
- `storage="data"`: each worker receives the same snapshot list.
- `storage="iterator"`: each worker advances its own cursor. For globally partitioned traversal, partition upstream (e.g., by ID ranges).
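Partitioning by ID ranges, as suggested for globally partitioned traversal, could look like the following Python sketch. `partition_id_range` is a hypothetical helper that assumes contiguous integer IDs; how each range is then handed to a worker (for example through a selector filter) is left open here.

```python
def partition_id_range(first_id, last_id, workers):
    """Split [first_id, last_id] into contiguous, non-overlapping ranges."""
    total = last_id - first_id + 1
    base, extra = divmod(total, workers)
    ranges, start = [], first_id
    for worker in range(workers):
        # The first `extra` workers take one additional ID each.
        size = base + (1 if worker < extra else 0)
        if size == 0:
            continue  # more workers than IDs: skip empty partitions
        ranges.append((start, start + size - 1))
        start += size
    return ranges


partitions = partition_id_range(1, 10, 3)
print(partitions)  # [(1, 4), (5, 7), (8, 10)]
```

Each worker then iterates only over its own range, so no row is visited twice across workers.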
Example 1: Using generator for Incrementing Values¶
In this example:
- The `id` variable uses the `IncrementGenerator`, which generates sequential numbers.
- The generated ID is then assigned to the `generated_id` key for each record.
Example 2: Sourcing Data from a CSV File with separator¶
In this example:
- The `person` variable is sourced from a CSV file, with fields separated by a comma.
- The `distribution="ordered"` setting ensures that records are processed in the order they appear in the file.
Example 3: Defining a constant Variable¶
In this case:
- The `country` variable is defined as a constant with the value "Germany".
- This value is applied to every record generated in the `user_country` key.
Example 4: Generating Dynamic Variables with script¶
In this example:
- The `random_number` variable generates a random integer between 1 and 100 using a script.
- The `full_name` variable uses the `fake` library to generate random names.
- These dynamically generated values are then printed for each record.
Example 5: Using cyclic Variables with a CSV Source¶
In this example:
- The `cyclic="True"` attribute ensures that once all records from the CSV file are used, the data starts from the beginning again.
Example 6: Using distribution to Randomize Data Selection¶
Here:
- The `distribution="random"` attribute ensures that the records are selected randomly from the source CSV file.
Example 7: Iterating with iterationSelector¶
In this example:
- The `iterationSelector` query retrieves data from a PostgreSQL database for each iteration using the `iteration_count` value, dynamically fetching user information.
Example 8: Using paged="true" with a database selector¶
In this example:
- The selector no longer behaves like a setup-cached once-per-run lookup.
- It reloads once for each parent generate page window and reuses that page-local result inside the page.
Example 9: Preserve A Raw Datetime Variable For Arithmetic¶
In this example:
- `created_at` remains a raw datetime variable.
- The key performs arithmetic first and formatting second.
Example 10: Defining Weighted Variables with weightColumn¶
Here:
- The `weightColumn="weight"` setting controls how frequently each row is selected. Rows with higher weight values are more likely to be chosen.
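Weighted selection of this kind can be reproduced with Python's `random.choices`. The sketch below illustrates the effect only and is not DATAMIMIC's sampling code: the rows, their weights, the `pick_weighted` helper, and the fixed seed are all assumptions.

```python
import random
from collections import Counter

# Hypothetical source rows with a weight column.
rows = [
    {"name": "gold", "weight": 1},
    {"name": "silver", "weight": 3},
    {"name": "bronze", "weight": 6},
]


def pick_weighted(rows, weight_column, k, rng):
    """Draw k rows, each with probability proportional to its weight."""
    weights = [row[weight_column] for row in rows]
    return rng.choices(rows, weights=weights, k=k)


rng = random.Random(42)  # seeded only for reproducibility of this sketch
counts = Counter(row["name"] for row in pick_weighted(rows, "weight", 10_000, rng))
```

Over a large sample, `bronze` is drawn most often and `gold` least often, roughly in the 6:3:1 ratio of the weights.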
Example 10: Combining Variables with Nested Keys¶
In this case:
- The `customer` and `notification` variables are both sourced from CSV files.
- The `nestedKey` element generates two notifications for each customer, showcasing how variables can be combined with nested structures.
Example 11: Working with Entities and Locale-Specific Data¶
In this example:
- The `person` variable is generated using the `Person` entity, with data localized to `de_DE` (Germany).
- This can be used to generate locale-specific data like names, addresses, etc.
Example 12: Using string Attribute for Dynamic and Complex Strings¶
In this example:
- The `string` attribute allows dynamic insertion of the variable `collection` into the query.
- The custom `%%` prefix and suffix replace the default `__`.
Example 13: Default variablePrefix and variableSuffix¶
In this case:
- The default `__` delimiters are used for variable substitution.
Example 14: Explicit Iterator Storage with Cycling Control¶
Example 15: Storing the Complete Dataset in Memory¶
Example 16: Forcing Single-Value Behavior¶
Best Practices for Using <variable>¶
- Dynamic Data Generation: Use scripts in variables to create dynamic data like random numbers, names, and addresses using libraries like `random` and `fake`.
- Cyclic vs Non-Cyclic: Use `cyclic` variables when you want data to repeat once all values are used, while non-cyclic variables are exhausted after one pass.
- Weighting and Randomization: Use `weightColumn` to skew data generation toward certain records and `distribution="random"` to randomize data selection.
- Combining with Nested Keys: Use variables in combination with `nestedKey` to generate structured, hierarchical data.
- Pick the right `storage`: Use `value` for scalars/config, `iterator` for large sources, `data` for small datasets you need to index.
- Prefer explicit `type` for DB sources: Avoid relying on variable names to infer tables/collections.
- Mind multiprocessing: `data` is shared as a snapshot; iterators advance per worker.
Storage Mode Summary¶
| Variable Pattern | Storage Mode | Behavior | Use Case |
|---|---|---|---|
| `selector="..."` | auto → value | Single value | Config, max values, constants |
| `selector="..." paged="true"` | page-aware | Reloads per parent page window | Page-local DB lookups |
| `type="table_name"` | auto → iterator | Cycles through data | Entity relationships |
| `storage="value"` | Explicit | Single value | Force static behavior |
| `storage="data"` | Explicit | Complete list | Calculations, random access |
| `storage="iterator"` | Explicit | Cycles/exhausts by `cyclic` | Controlled iteration |
<ml-train>¶
The <ml-train> element is used to train machine learning models with input data. These trained models can then be used as sources in <generate> elements to enrich original data.
The <ml-train> element is a child element of <setup>.
Attributes¶
- name: Specifies the name of the trained model. This is mandatory and is used to reference the model in other elements.
- source: Specifies the source of the data (e.g., data/active.ent.csv, mongo).
- type: Specifies the type of data to generate.
- mode: Specifies the training mode. Supported values are 'default' and 'persist'. With 'default', the model is removed after all tasks have finished; with 'persist', the model is kept after all tasks have finished.
- maxTrainingTime: Specifies the maximum time allowed for model training, in minutes (e.g., 1, 5, 10).
- separator: Specifies the separator used in the source data file (e.g., ',' for CSV files).
Example 1: Basic Model Training¶
In this example:
- The model named "customer_csv_gen" is trained using data from "data/customers.csv".
- No "mode" is specified, so the default applies and the "customer_csv_gen" model is removed after all tasks have finished.
- The CSV file uses a comma as the separator.
- The "generate" element uses the trained "customer_csv_gen" model as its source to create new data.
Example 2: Training with persist mode¶
In this example:
- "mode" is set to "persist", so the model is kept even after all tasks have finished.
- The model can later be used without retraining.
For the Database View -> ML Generator View lifecycle and project-level reuse flow (including source="ml://..." usage), see ML Generator from Database Metadata and ML Generator View.
Complete Basic Example¶
Here's a complete example combining the core elements:
This will generate 100 user records with consistent, structured data including IDs, names, ages, and a status field.
Next Steps¶
Once you're comfortable with these core elements, explore the Advanced Data Definition Elements for more complex features like:
- Nested data structures
- Conditional generation
- Complex data patterns
- Arrays and lists
- Advanced variable usage