ML Generator from Database Metadata

This guide shows the end-to-end flow for creating ML training DSL artifacts from database metadata, executing training, reviewing persisted versions in ML Generator View, and reusing them in DATAMIMIC DSL.

Prerequisites

  1. Configure a database environment in Environments.
  2. Run a metadata scan for that environment.
  3. Open Database View and ensure your tables/columns are available.

1. Build an ML training artifact in Database View

  1. In Database workbench → Planning, select your scope.
  2. Run Plan subset (preset-based closure + relationship validation).
  3. Verify preflight state.
  4. Click Create ML in Generate artifacts.
  5. Set naming options and create the model artifact.
  6. Execute the generated DSL model (with <ml-train> nodes) to start training and persist model versions.

For planning and scope details, see Auto-Generate Model from Database.

Note

Training duration depends on table size, feature complexity, and selected runtime. Larger training workloads can require more CPU/RAM and, depending on environment setup, GPU resources (for example NVIDIA cards).

2. Inspect trained model quality and versions in ML Generator View

Open ML-Gen from the top navigation.

This view lists versions and statistics only for models whose training runs have completed and been persisted.

You can:

  • search for and select generators in the left pane,
  • switch versions in the version list,
  • set the default version,
  • delete outdated versions,
  • inspect status, utility/privacy KPIs, and top drift indicators,
  • open the full report, export, and model metadata.

Figure: ML Generator View (model quality overview, version selection, and version actions)

3. Reuse ML generators in DATAMIMIC DSL

After models are trained and persisted, reference them in a <generate> node by using ml://... as the source.

<setup>
    <generate name="accounts_stage" count="1000" pageSize="200" source="ml://accounts_model" target="mapping">
        <key name="account_id" generator="IncrementGenerator(start=1)"/>
    </generate>
</setup>

Figure: DATAMIMIC DSL using ML generators. DSL flow: train per-table models and reuse them as ML sources in generate nodes.
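
Before running a DSL file, it can help to check which <generate> nodes depend on ML generators, so that each one can be compared against the models listed in ML Generator View. The following is a minimal Python sketch of such a check, not a DATAMIMIC feature: the helper name find_ml_sources is hypothetical, and it assumes that the text after ml:// is the generator name as shown in ML Generator View.

```python
import xml.etree.ElementTree as ET

# The DSL snippet from this guide, inlined so the sketch is self-contained.
DSL_TEXT = """
<setup>
    <generate name="accounts_stage" count="1000" pageSize="200" source="ml://accounts_model" target="mapping">
        <key name="account_id" generator="IncrementGenerator(start=1)"/>
    </generate>
</setup>
"""

ML_PREFIX = "ml://"


def find_ml_sources(dsl_text: str) -> dict[str, str]:
    """Map each <generate> node name to the model named in its ml:// source.

    Hypothetical helper: it only reads the XML and does not contact DATAMIMIC.
    """
    root = ET.fromstring(dsl_text)
    found = {}
    # iter() walks the whole tree, so nested <generate> nodes are included.
    for node in root.iter("generate"):
        source = node.get("source", "")
        if source.startswith(ML_PREFIX):
            # Assumption: everything after the prefix is the generator name.
            found[node.get("name", "<unnamed>")] = source[len(ML_PREFIX):]
    return found


if __name__ == "__main__":
    for generate_name, model_name in find_ml_sources(DSL_TEXT).items():
        print(f"{generate_name} -> {model_name}")
```

Nodes whose source is not an ml:// value (for example a file or database source) are skipped, so the result contains only the generators that must exist with a persisted version before the DSL is executed.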

Operational Notes

  • Use stable subset planning inputs for reproducible training behavior.
  • Keep one validated default version per generator for downstream DSL use.
  • Retrain when schema drift occurs or the data distribution changes materially.
  • Review utility/privacy status before promoting a version to default.
  • Treat generated recommendations and models as baseline output that still requires project-specific tuning.
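
As an illustration of the retraining note above, schema drift can be detected by comparing the column metadata a model was trained on with the result of a later scan. The sketch below is an assumption-laden example rather than DATAMIMIC functionality: this guide does not show the metadata scan output format, so plain {column: type} dictionaries stand in for it, and the function name schema_drift and the sample columns are hypothetical.

```python
def schema_drift(trained: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {column: type} snapshots of one table.

    Hypothetical helper: 'trained' is the snapshot taken when the model was
    trained, 'current' is the snapshot from the latest metadata scan.
    """
    shared = set(trained) & set(current)
    return {
        "added": sorted(set(current) - set(trained)),
        "removed": sorted(set(trained) - set(current)),
        "retyped": sorted(col for col in shared if trained[col] != current[col]),
    }


def needs_retraining(drift: dict[str, list[str]]) -> bool:
    """Treat any added, removed, or retyped column as a retraining trigger."""
    return any(drift.values())


if __name__ == "__main__":
    # Sample snapshots invented for illustration only.
    trained_snapshot = {"account_id": "INTEGER", "balance": "NUMERIC", "opened_at": "DATE"}
    current_snapshot = {"account_id": "INTEGER", "balance": "TEXT", "segment": "TEXT"}

    drift = schema_drift(trained_snapshot, current_snapshot)
    print(drift)
    print("retrain:", needs_retraining(drift))
```

This covers only structural drift. Changes in data distribution are a separate question, for which the drift indicators in ML Generator View are the relevant signal.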