First Steps

In DATAMIMIC, every project begins with an XML-based main model. These models can be auto-generated from connected databases, JSON, XML, or other file types.

The simplest DATAMIMIC model could look like this:

<setup>
  <generate name="output" count="10" target="CSV">
    <key name="counter" generator="IncrementGenerator"/>
  </generate>
</setup>
This model creates the following output in a CSV file:

counter
1
2
3
4
5
6
7
8
9
10
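
To make the behaviour of this model concrete, here is a minimal Python sketch of the same idea: ten records, each with one incrementing key, written as CSV. It only imitates the result and does not show how DATAMIMIC works internally; the function names generate_counter_rows and to_csv are hypothetical.

```python
import csv
import io
from itertools import count


def generate_counter_rows(n):
    """Yield n records, each holding one counter value starting at 1."""
    counter = count(start=1)
    for _ in range(n):
        yield {"counter": next(counter)}


def to_csv(rows, fieldnames):
    """Serialize the records to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Mirrors count="10" and the single key named "counter" from the model.
csv_text = to_csv(generate_counter_rows(10), ["counter"])
print(csv_text)
```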

Let's dive into the model by starting a demo project in DATAMIMIC UI.

Example DATAMIMIC model (Basic Script)

In this example, we'll start with the Basic Script demo to illustrate how DATAMIMIC's model works and which files may be part of a basic DATAMIMIC project.

  • Log in, then click Clone on the 'Basic Script' tile in the Demo Store to create a DATAMIMIC project.

The Editor view contains the Project Bar at the top, the FileTree on the left, and the Editor on the right. The Editor switches dynamically between modes depending on the file you select in the FileTree.

Models::datamimic

<setup>
    <include uri="conf/base.properties"/>
    <memstore id="mem"/>

    <include uri="1_prepare.xml"/>
    <include uri="2_generate.xml"/>
</setup>

This main model loads a shared configuration file, creates a memory store, and includes two further models.

  • <setup>: The <setup> node is the root of every DATAMIMIC model. All configuration, data generation, and data handling nodes are nested inside it.

  • <include uri="conf/base.properties"/>: This directive includes the external configuration file "conf/base.properties". Centralizing settings in one file lets you reuse them across multiple DATAMIMIC models, which improves modularity and maintainability. You can find the referenced file in the Config section of the FileTree.

  • <memstore id="mem"/>: The <memstore> node creates a memory store with the identifier "mem". Memory stores hold data temporarily during the generation process so that later steps can read, transform, and combine it.

  • <include uri="1_prepare.xml"/> and <include uri="2_generate.xml"/>: These directives include the further models "1_prepare.xml" and "2_generate.xml". Splitting the data generation logic into separate files keeps it organized and reusable.
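
The idea of a central configuration file can be sketched in Python as follows. This is an illustration under assumptions, not DATAMIMIC's implementation: it assumes the properties file uses plain key=value lines, the file content and the value 100 are hypothetical, and parse_properties and resolve are made-up helper names. The property name customer_count is the one used by the 2_generate model later in this guide.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(attribute, props):
    """Replace a {name} placeholder with the matching property value."""
    if attribute.startswith("{") and attribute.endswith("}"):
        return props[attribute[1:-1]]
    return attribute


# Hypothetical content; the real conf/base.properties is not shown in this guide.
base_properties = """
# number of customer records to generate
customer_count=100
"""

props = parse_properties(base_properties)
customer_count = int(resolve("{customer_count}", props))
print(customer_count)
```

Keeping the value in one place means that every model referring to the placeholder picks up a change without being edited itself.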

Let us now look into the model 1_prepare:

Models::1_prepare
<setup>
  <iterate name="CARS" source="data/car.ent.csv" target="mem"/>
</setup>

This model reads records from an external data source and stores them in memory.

  • <iterate name="CARS" source="data/car.ent.csv" target="mem"/>: Within the <setup> node, the <iterate> node is defined with the name "CARS". It specifies the following:
    • name="CARS": Assigns a name to the iteration so that you can refer to it within the model.
    • source="data/car.ent.csv": Specifies the data source for the iteration, here the entity file "car" listed under Entity Data in the FileTree.
    • target="mem": Stores the records read during this iteration in the memory store "mem", where they remain available for further processing within the DATAMIMIC model.

This example demonstrates how DATAMIMIC iterates through an external data source and stores its data in memory for later use.
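
Conceptually, this step resembles reading a CSV file row by row and keeping the rows under a name in a dictionary. The Python sketch below shows that idea only. The CSV content is hypothetical: the guide states that the entity file holds 3 items and that a "brand" field exists, but the "model" column, the comma separator, and all values are assumptions, as is the helper name iterate_into_store.

```python
import csv
import io

# Hypothetical stand-in for data/car.ent.csv (3 items, as in the guide).
CAR_CSV = """brand,model
Alpha,A1
Beta,B2
Gamma,C3
"""


def iterate_into_store(store, name, csv_text):
    """Read every CSV row as a dict and keep the list in the store under name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    store[name] = [dict(row) for row in reader]
    return store[name]


mem = {}  # plays the role of the memory store with id "mem"
iterate_into_store(mem, "CARS", CAR_CSV)
print(len(mem["CARS"]), mem["CARS"][0]["brand"])
```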

We move further and look into the next model 2_generate:

Models::2_generate
<setup>
    <generate name="customer" count="{customer_count}" target="">
        <variable name="cars_data" type="CARS" source="mem" cyclic="True"/>
        <variable name="person" entity="Person(min_age=21, max_age=67, female_quota=0.5)"/>
        <variable name="company" entity="Company"/>
        <key name="bool" generator="BooleanGenerator"/>
        <key name="tc_creation" generator="IntegerGenerator(max=999999999)"/>
        <key name="car" script="cars_data.brand"/>
        <key name="first_name" script="person.given_name"/>
        <key name="last_name" script="person.family_name"/>
        <key name="birthDate" script="person.birthdate" converter="DateFormat('%d.%m.%Y')"/>
        <key name="superuser" values="True, False"/>
        <key name="email" script="'info@' + company.short_name.replace(' ', '-') + str(tc_creation) + '.de'"/>
        <key name="active" source="data/active.wgt.csv" separator="|"/>
    </generate>
</setup>
This model defines the generation of customer records. It combines variables, entities, and generators to produce realistic customer data.

  • <generate name="customer" count="{customer_count}" target="">: Within the <setup> node, the <generate> node creates a dataset named "customer". Its attributes are:
    • name="customer": Assigns a name to the generated dataset so that you can reference it within the model.
    • count="{customer_count}": Specifies the number of customer records to generate. The value is a parameter read from the base file in the Config section, so the count can be changed without editing the model.
    • target="": The target is left empty, so no output format is defined in this example. A limited subset is nevertheless always created for the DATAMIMIC UI Preview.

  • <variable name="cars_data" type="CARS" source="mem" cyclic="True"/>: The variable "cars_data" has the type "CARS" and reads from the memory store "mem", which was filled by the preceding 1_prepare model. Because cyclic is set to "True", records from the CARS set are reused once all of them have been consumed. The entity file car contains only 3 items, so without this setting DATAMIMIC could not create more than 3 customer records.

  • <variable name="person" entity="Person(min_age=21, max_age=67, female_quota=0.5)"/>: The "person" variable uses the "Person" entity to generate individual profiles, constrained here by a minimum age, a maximum age, and a female quota.

  • <variable name="company" entity="Company"/>: The "company" variable uses the "Company" entity to generate company-related data.

  • <key>: Multiple <key> elements define attributes of each customer record. For example:

    • "bool" is generated using the "BooleanGenerator".
    • "tc_creation" is generated using the "IntegerGenerator" with a maximum value constraint.
    • "car" is scripted to read the car brand from "cars_data.brand".
    • "first_name" and "last_name" are scripted from the "person" variable.
    • "birthDate" is scripted and converted to a specific date format.
    • "superuser" takes one of the predefined values "True" and "False".
    • "email" is scripted from company data and the "tc_creation" key.
    • "active" is sourced from an external CSV file with a specified separator.
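
The following Python sketch mirrors three of the mechanisms above: cyclic reuse of a small record set, the string expression behind "email", and the date pattern used for "birthDate". It is an illustration, not DATAMIMIC code. All sample values (car brands, company short name, number, date) are hypothetical, and it assumes that the DateFormat pattern follows Python's strftime codes, which the '%d.%m.%Y' syntax suggests.

```python
from datetime import date
from itertools import cycle, islice

# Hypothetical CARS records; the guide only states that there are 3 items.
cars = [{"brand": "Alpha"}, {"brand": "Beta"}, {"brand": "Gamma"}]

# cyclic="True": start again at the first record once the set is exhausted,
# so 5 customers can be served from 3 car records.
brands = [car["brand"] for car in islice(cycle(cars), 5)]

# The "email" script, with hypothetical stand-ins for company.short_name
# and the generated tc_creation value.
short_name = "Example Corp"
tc_creation = 12345
email = "info@" + short_name.replace(" ", "-") + str(tc_creation) + ".de"

# The pattern '%d.%m.%Y' applied to a hypothetical birthdate.
birth_date = date(1990, 4, 7).strftime("%d.%m.%Y")

print(brands)
print(email)
print(birth_date)
```

With these sample values, the fourth and fifth customers receive the first and second car brand again, the email address contains no spaces, and the date is rendered as day, month, and four-digit year separated by dots.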

This example shows how to combine variables, entities, generators, and file-based data sources.
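
The "active" key draws its values from a Weighting Data file. The file content is not shown in this guide, so the Python sketch below assumes a hypothetical layout of one value and one weight per line, separated by "|", and illustrates weighted selection with the standard library. The helper name parse_weighted and the weights 80 and 20 are assumptions.

```python
import random

# Hypothetical stand-in for data/active.wgt.csv: value|weight per line.
WEIGHTED = "True|80\nFalse|20\n"


def parse_weighted(text, separator="|"):
    """Split each non-empty line into a value and its numeric weight."""
    values, weights = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        value, weight = line.split(separator)
        values.append(value.strip())
        weights.append(float(weight))
    return values, weights


values, weights = parse_weighted(WEIGHTED)

# A seeded generator keeps the sketch reproducible.
rng = random.Random(42)
sample = rng.choices(values, weights=weights, k=1000)
print(sample.count("True") / len(sample))
```

Values with a higher weight appear proportionally more often in the sample, which is the effect a weighted source is meant to achieve.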

  • Start processing and creating the data by clicking the Generate button in the Project Bar.
  • Click 'Previews' to see a preview once processing has completed.
  • Or click 'Logs' for detailed insights into the task, such as its processing speed and throughput.
  • Or navigate to 'Tasks' for an overview of all task executions in your project and their status. The 'Tasks' view gives you various controls over running and past task executions.
  • In the 'Tasks' view you can also click the 'Artifact' icon and select the generated file(s) for download.

Recap

  • Know the XML-based DATAMIMIC model and its most important nodes: <setup>, <generate>, <variable>, and <key>.
  • Check out and use a project from the Demo Store.
  • Use the views Editor, Jobs, Tasks and Preview.
  • Experience the DATAMIMIC file types Model, Entity Data, Weighting Data and Config.