First Steps
In DATAMIMIC, every project begins with an XML-based main model. These models can be auto-generated from connected databases, JSON, XML, or other file types.
The simplest DATAMIMIC model is a `<setup>` root node that contains a `<generate>` node describing the records to create.
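A minimal sketch of such a model follows. It assumes that a `<key>` node names its generator through a `generator` attribute; this attribute name, the dataset name `my_dataset` and the count `10` are hypothetical and chosen for illustration only. `BooleanGenerator` and the empty `target` are taken from the Basic Script demo described below.

```xml
<setup>
    <!-- Hypothetical dataset name and count; empty target: no output format, the UI still shows a preview subset -->
    <generate name="my_dataset" count="10" target="">
        <!-- Assumed syntax: the key "bool" is filled by the BooleanGenerator -->
        <key name="bool" generator="BooleanGenerator"/>
    </generate>
</setup>
```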
Let's dive into the model by starting a demo project in the DATAMIMIC UI.
Example DATAMIMIC model (Basic Script)
In this example, we start with the Basic Script demo to illustrate how DATAMIMIC models work and which files may be part of a basic DATAMIMIC project.
- Log in, then click Clone on the 'Basic Script' tile in the Demo Store to create a DATAMIMIC project.
The Editor view contains the Project Bar at the top, the FileTree on the left, and the Editor on the right. The Editor switches between modes depending on the file you select in the FileTree.
Models::datamimic
This main model ties the project together: it loads shared configuration, creates an in-memory store, and includes two further models.
- `<setup>`: The `<setup>` node is the root of any DATAMIMIC model and encloses all configuration for data generation and handling. It allows for advanced configurations to meet specific project requirements.
- `<include uri="conf/base.properties"/>`: This directive includes an external configuration file named "base.properties". This lets you centralize configuration settings and reuse them across multiple DATAMIMIC models, which keeps your models modular and maintainable. You can see the referenced file in the Config section of the FileTree.
- `<memstore id="mem"/>`: The `<memstore>` node creates a memory store with the identifier "mem". Memory stores temporarily hold data during the data generation process so that later steps can read and manipulate it, for example for data transformations and calculations.
- `<include uri="1_prepare.xml"/>` and `<include uri="2_generate.xml"/>`: These directives include the further models "1_prepare.xml" and "2_generate.xml". This modular approach lets you split your data generation logic into separate files for better organization and reusability.
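Put together, the nodes described above form the main model. The following sketch is reconstructed from those descriptions only; the file in the demo project may differ in details such as comments or formatting.

```xml
<setup>
    <!-- Shared settings, e.g. the customer_count used by 2_generate -->
    <include uri="conf/base.properties"/>
    <!-- In-memory store: 1_prepare writes to it, 2_generate reads from it -->
    <memstore id="mem"/>
    <!-- Sub-models; 2_generate relies on the data that 1_prepare stores in "mem" -->
    <include uri="1_prepare.xml"/>
    <include uri="2_generate.xml"/>
</setup>
```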
Let us now look at the model 1_prepare:
Models::1_prepare
This model uses the `<iterate>` node to read data from an external source and store it for later use.
- `<iterate name="CARS" source="data/car.ent.csv" target="mem"/>`: Within the `<setup>` node, the `<iterate>` node is defined with the name "CARS". Its attributes specify the following:
    - `name="CARS"`: Assigns a name to the iteration, allowing you to refer to it within the model.
    - `source="data/car.ent.csv"`: Specifies the data source for the iteration, an entity file shown in the FileTree under Entity Data as car.
    - `target="mem"`: Indicates that the data read during this iteration is stored in the memory store "mem". This allows further manipulation and processing of the data within the DATAMIMIC model.
This example demonstrates how DATAMIMIC iterates through an external data source and stores its data in memory for further use.
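Based on the description above, the complete 1_prepare model can be sketched as follows, assuming that, like the other models, it is wrapped in a `<setup>` root node.

```xml
<setup>
    <!-- Read all records of the car entity file and keep them in "mem" under the name CARS -->
    <iterate name="CARS" source="data/car.ent.csv" target="mem"/>
</setup>
```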
Next, let us look at the model 2_generate:
Models::2_generate
- `<generate name="customer" count="{customer_count}" target="">`: Within the `<setup>` node, the `<generate>` node defines a dataset named "customer". The key features of this generation configuration are:
    - `name="customer"`: Assigns a name to the generated dataset, allowing you to reference it within the model.
    - `count="{customer_count}"`: Specifies the number of customer records to generate. The value is read from the base file in the Config section, so you can change the count without editing the model.
    - `target=""`: The target formats for the generated data are left empty, so no output format is defined in this example. However, DATAMIMIC always creates a limited subset of the data for the DATAMIMIC UI Preview.
- `<variable name="cars_data" type="CARS" source="mem" cyclic="True"/>`: This variable, named "cars_data", is linked to the type "CARS" and retrieves its data from the memory store "mem", which was filled during the execution of 1_prepare. The "cyclic" attribute is set to "True", so records from the CARS set are reused once all of them have been consumed. The entity file car contains only 3 items; without cyclic reuse, DATAMIMIC could not create more than 3 customer records.
- `<variable name="person" entity="Person(min_age=21, max_age=67, female_quota=0.5)"/>`: The "person" variable is associated with the "Person" entity and generates individual profiles aged between 21 and 67 with a female quota of 0.5.
- `<variable name="company" entity="Company"/>`: The "company" variable is linked to the "Company" entity and generates company-related data.
- `<key>`: Multiple `<key>` elements define the attributes of each customer record. For example:
    - "bool" is generated using the "BooleanGenerator".
    - "tc_creation" is generated using the "IntegerGenerator" with a maximum value constraint.
    - "car" is scripted to extract the car brand from "cars_data.brand".
    - "first_name" and "last_name" are scripted from person-related data.
    - "birthDate" is scripted and converted to a specific date format.
    - "superuser" has the predefined values "True" and "False".
    - "email" is scripted from company data.
    - "active" is sourced from an external CSV file with a specified separator.
This example shows how to combine variables, entities, generators, and file-based data sources in a single model.
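The following sketch assembles the elements described above into one model. Only the `<generate>` and `<variable>` lines and the script expression `cars_data.brand` are taken from the text. The `<key>` attribute names (`generator`, `script`, `converter`, `values`, `source`, `separator`), the integer limit `1000`, the Person and Company field names, the `DateFormat` converter and its pattern, and the file name `data/active.csv` are hypothetical placeholders; check the DATAMIMIC model reference for the exact syntax before reusing them.

```xml
<setup>
    <generate name="customer" count="{customer_count}" target="">
        <!-- cyclic="True": car.ent.csv holds only 3 records, so they are reused -->
        <variable name="cars_data" type="CARS" source="mem" cyclic="True"/>
        <variable name="person" entity="Person(min_age=21, max_age=67, female_quota=0.5)"/>
        <variable name="company" entity="Company"/>
        <!-- Hypothetical below: attribute names, limit, field names, date pattern and file name are assumptions -->
        <key name="bool" generator="BooleanGenerator"/>
        <key name="tc_creation" generator="IntegerGenerator(max=1000)"/>
        <key name="car" script="cars_data.brand"/>
        <key name="first_name" script="person.given_name"/>
        <key name="last_name" script="person.family_name"/>
        <key name="birthDate" script="person.birthdate" converter="DateFormat('%d.%m.%Y')"/>
        <key name="superuser" values="True, False"/>
        <key name="email" script="company.email"/>
        <key name="active" source="data/active.csv" separator="|"/>
    </generate>
</setup>
```

The three variables are declared before the keys so that every key script can refer to them by name.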
- Start processing and creating the data by clicking the Generate button in the project bar.
- Click 'Previews' to see a preview once processing has completed.
- Or click 'Logs' to get detailed insights into the task, such as its processing speed and throughput.
- Or navigate to 'Tasks' to get an overview of all task executions of your project and their status. In the 'Tasks' view you have various controls over running and past task executions.
- In the 'Tasks' view you can also click the 'Artifact' icon and select the generated file(s) for download.
Recap
- Know the XML-based DATAMIMIC model and its most important nodes `<setup>`, `<generate>`, `<variable>` and `<key>`.
- Check out and use a project from the Demo Store.
- Use the views Editor, Jobs, Tasks and Preview.
- Get to know the DATAMIMIC file types Model, Entity Data, Weighting Data and Config.