ML Generator from Database Metadata

This guide walks through the end-to-end flow: creating ML training DSL artifacts from database metadata, executing training, reviewing persisted model versions in ML Generator View, and reusing the trained models in DATAMIMIC DSL.

Prerequisites

Configure a database environment in Environments.
Run a metadata scan for that environment.
Open Database View and confirm that the tables and columns you need are available.

1. Build an ML training artifact in Database View

In Database workbench → Planning, select your scope.
Run Plan subset, which applies preset-based closure and relationship validation.
Verify the preflight state before continuing.
Under Generate artifacts, click Create ML.
Set naming options and create the model artifact.
Execute the generated DSL model (with <ml-train> nodes) to start training and persist model versions.

For planning and scope details, see Auto-Generate Model from Database.

Note

Training duration depends on table size, feature complexity, and the selected runtime. Larger training workloads can require more CPU and RAM and, depending on the environment setup, GPU resources (for example, NVIDIA cards).

2. Inspect trained model quality and versions in ML Generator View

Open ML-Gen from the top navigation.

This view lists versions and statistics only for models whose training runs have completed and been persisted.

You can:

search for and select generators in the left pane,
switch between versions in the version list,
set the default version,
delete outdated versions,
inspect status, utility/privacy KPIs, and top drift indicators,
open the full report, export, and model metadata.

ML Generator View: model quality overview, version selection, and version actions

3. Reuse ML generators in DATAMIMIC DSL

After models are trained and persisted, reference them in a <generate> node by setting its source attribute to ml://....

<setup>
    <generate name="accounts_stage" count="1000" pageSize="200" source="ml://accounts_model" target="mapping">
        <key name="account_id" generator="IncrementGenerator(start=1)"/>
    </generate>
</setup>

DATAMIMIC DSL using ML generators — DSL flow: train per-table models and reuse them as ML sources in generate nodes
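
Before running a DSL file, it can help to list which ML models its <generate> nodes depend on, so each one can be checked in ML Generator View. The following is a minimal sketch using only Python's standard library, not a DATAMIMIC API. The helper name `find_ml_sources` is hypothetical, and it assumes the DSL is plain XML without namespaces, as in the example above.

```python
import xml.etree.ElementTree as ET

ML_PREFIX = "ml://"


def find_ml_sources(dsl_text: str) -> dict[str, str]:
    """Map each <generate> node name to the ML model named in its source.

    Hypothetical helper: it only reads the XML text and does not
    contact DATAMIMIC or verify that the model has been trained.
    """
    root = ET.fromstring(dsl_text.strip())
    sources: dict[str, str] = {}
    # iter() walks the whole tree, so nested <generate> nodes are included.
    for node in root.iter("generate"):
        source = node.get("source", "")
        if source.startswith(ML_PREFIX):
            name = node.get("name", "<unnamed>")
            sources[name] = source[len(ML_PREFIX):]
    return sources


EXAMPLE_DSL = """
<setup>
    <generate name="accounts_stage" count="1000" pageSize="200" source="ml://accounts_model" target="mapping">
        <key name="account_id" generator="IncrementGenerator(start=1)"/>
    </generate>
</setup>
"""

if __name__ == "__main__":
    for generate_name, model_name in find_ml_sources(EXAMPLE_DSL).items():
        print(f"{generate_name} -> {model_name}")
```

Nodes whose source does not start with ml:// are ignored, so the result contains only the models that must have a persisted version before the DSL is executed.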

Operational Notes

Use stable subset planning inputs for reproducible training behavior.
Keep one validated default version per generator for downstream DSL use.
Retrain when the schema drifts or the data distribution changes materially.
Review utility/privacy status before promoting a version to default.
Treat generated recommendations and models as baseline output that still requires project-specific tuning.
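
One way to decide whether a retrain is due is to compare the column metadata captured at training time with the current metadata. The sketch below is illustrative only: the `{table: {column: type}}` snapshot format and the `schema_drift` function are assumptions made for this example and do not reflect how DATAMIMIC stores its metadata scan results. It covers schema drift only, not changes in data distribution.

```python
Snapshot = dict[str, dict[str, str]]  # hypothetical format: {table: {column: type}}


def schema_drift(trained: Snapshot, current: Snapshot) -> dict[str, list[str]]:
    """List per-table differences between two metadata snapshots.

    Tables without differences are omitted from the result.
    """
    drift: dict[str, list[str]] = {}
    for table in sorted(set(trained) | set(current)):
        old = trained.get(table, {})
        new = current.get(table, {})
        changes: list[str] = []
        # Columns present now but not at training time.
        changes += [f"added column {c}" for c in sorted(new.keys() - old.keys())]
        # Columns the model was trained on that no longer exist.
        changes += [f"removed column {c}" for c in sorted(old.keys() - new.keys())]
        # Columns kept under the same name but with a different type.
        changes += [
            f"type of {c}: {old[c]} -> {new[c]}"
            for c in sorted(old.keys() & new.keys())
            if old[c] != new[c]
        ]
        if changes:
            drift[table] = changes
    return drift


if __name__ == "__main__":
    trained = {"accounts": {"account_id": "INTEGER", "balance": "NUMERIC"}}
    current = {
        "accounts": {
            "account_id": "BIGINT",
            "balance": "NUMERIC",
            "currency": "VARCHAR",
        }
    }
    for table, changes in schema_drift(trained, current).items():
        print(table, changes)
```

An empty result means the two snapshots describe the same columns and types. Any reported change is a prompt to review the affected model, not an automatic trigger for retraining.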