
Custom Worker Configuration

DATAMIMIC runs all background work as Celery tasks. Every task type has its own queue, and each worker process consumes a configurable set of queues. This page explains the task/queue model, lists every supported task, and shows how to assign queues to workers so that no task type is left unconsumed.

How task routing works

  • Each task type maps to exactly one queue, named after the task type. For example, the data_source_scan task is routed to the data_source_scan queue.
  • A worker only processes the queues listed in its WORKER_QUEUES environment variable: a comma-separated list of queue names, no spaces.
  • A task whose queue is not consumed by any running worker stays enqueued and never runs. Splitting work across workers is therefore a deliberate routing decision: every queue you intend to use must appear in at least one running worker's WORKER_QUEUES.
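The routing rules above can be modeled in a few lines of Python (3.9+). This is an illustrative sketch, not part of DATAMIMIC: parse_worker_queues and consumers_of are hypothetical helper names, and the worker layout is invented. It parses a WORKER_QUEUES value, rejects the whitespace the format forbids, and reports which workers would pick up a given task type. An empty result means the task would stay enqueued.

```python
def parse_worker_queues(value: str) -> list[str]:
    """Split a WORKER_QUEUES value (comma-separated, no spaces) into queue names."""
    if any(ch.isspace() for ch in value):
        raise ValueError(f"WORKER_QUEUES must not contain whitespace: {value!r}")
    names = value.split(",")
    if any(name == "" for name in names):
        raise ValueError(f"WORKER_QUEUES has an empty entry: {value!r}")
    return names


def consumers_of(task_type: str, workers: dict[str, str]) -> list[str]:
    """Return the workers whose WORKER_QUEUES include the queue for task_type.

    Each task type is routed to the queue of the same name, so the task type
    is used directly as the queue name.
    """
    queue = task_type
    return [name for name, value in workers.items() if queue in parse_worker_queues(value)]


# Hypothetical two-worker layout used only for illustration.
workers = {
    "short-worker": "standard,timed_5min,cron",
    "operation-worker": "healthcheck,db_metadata_scan",
}
print(consumers_of("standard", workers))          # ['short-worker']
print(consumers_of("data_source_scan", workers))  # [] (the task would never run)
```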

Custom topologies must cover every queue you use

The default deployment ships values that already cover all queues, so it needs no action. If you define your own worker layout, make sure every task type your users can trigger is covered; otherwise those tasks never run. Task types added in a release (for example the Workbench scan family data_source_browse, data_source_scan, and scan_result_generate_model in 3.5.0) must be added to an existing worker's WORKER_QUEUES.
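When upgrading a custom layout, a set difference shows which newly added queues no worker consumes yet. This is a sketch, not a DATAMIMIC tool: missing_queues is a hypothetical helper, and the example layout is an invented pre-3.5.0 configuration.

```python
# Queues introduced with the Workbench scan family in 3.5.0 (from the text above).
NEW_IN_3_5_0 = {"data_source_browse", "data_source_scan", "scan_result_generate_model"}


def missing_queues(required: set[str], workers: dict[str, str]) -> set[str]:
    """Return the required queues that no worker lists in its WORKER_QUEUES."""
    consumed: set[str] = set()
    for value in workers.values():
        # WORKER_QUEUES is a comma-separated list with no spaces.
        consumed.update(value.split(","))
    return required - consumed


# Hypothetical layout carried over from an older release.
old_layout = {
    "operation-worker": "healthcheck,db_metadata_scan,database_generate_model",
    "short-worker": "standard,timed_5min",
}
print(sorted(missing_queues(NEW_IN_3_5_0, old_layout)))
# ['data_source_browse', 'data_source_scan', 'scan_result_generate_model']
```

Any name this prints must be added to some worker's WORKER_QUEUES before the corresponding Workbench actions will run.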

Supported tasks

The Queue name column is exactly the value you put in WORKER_QUEUES. The Label is the short form shown in TaskView. Scope is one of: project (user-triggered work tied to a project), system (global maintenance, no project), or backup_infrastructure (backup/restore, runs outside backup scope).

Synthetic data generation

Queue name | Label | Scope | Description
standard | GEN-SDN | project | Standard generation run (soft/hard timeout ~15 min).
infinite | GEN-INF | project | Long-running generation with no timeout.
timed_5min | GEN-5M | project | Generation capped at 5 minutes.
timed_30min | GEN-30M | project | Generation capped at 30 minutes.
timed_1hour | GEN-1H | project | Generation capped at 1 hour.
timed_4hour | GEN-4H | project | Generation capped at 4 hours.
timed_8hour | GEN-8H | project | Generation capped at 8 hours.
timed_24hour | GEN-24H | project | Generation capped at 24 hours.
cron | GEN-SCH | project | Scheduled (cron) generation run, no timeout.

Model and template generation from a source

Queue name | Label | Scope | Description
database_generate_model | DB-GEN-MOD | project | Generate a DATAMIMIC model from a scanned SQL database.
database_generate_weighting | DB-GEN-WGT | project | Generate distribution weighting for a database model.
json_generate_model | JSON-GEN-MOD | project | Generate a model from a JSON sample.
json_generate_template | JSON-GEN-TMP | project | Generate a template from a JSON sample.
xml_generate_model | XML-GEN-MOD | project | Generate a model from an XML sample.
xml_generate_template | XML-GEN-TMP | project | Generate a template from an XML sample.
csv_generate_model | CSV-GEN-MOD | project | Generate a model from a CSV sample.

Data Workbench scan family

Queue name | Label | Scope | Description
db_metadata_scan | DB-META-SCAN | project | Scan SQL database metadata (Workbench Database mode).
data_source_browse | DS-BROWSE | project | Browse a data-source location (databases, collections, directories, files).
data_source_scan | DS-SCAN | project | Scan a selected table, collection, or object into a snapshot.
scan_result_generate_model | SCAN-GEN-MOD | project | Generate a model from a persisted scan snapshot.

Connections, migration, and Git

Queue name | Label | Scope | Description
healthcheck | CHK-ENV | project | Test an environment connection (SQL, MongoDB, Kafka, object storage).
gwa_migration | GWA-MIG | project | Migrate a GWA project into DATAMIMIC.
ama_migration | AMA-MIG | project | Migrate an AMA/EDI definition into DATAMIMIC.
project_git_push | GIT-PUSH | project | Push project changes to the configured Git remote.
project_git_update | GIT-UPDATE | project | Update the project from its Git remote.

System, maintenance, and backup

Queue name | Label | Scope | Description
runtime_capabilities_fetch | RUNTIME-CAPS | system | Fetch runtime capability metadata from the engine.
housekeeping | HOUSEKEEPING | system | Periodic internal maintenance.
clean_up_project_storage | CLEAN UP PROJECT STORAGE | project | Remove orphaned project storage.
system_backup | SYSTEM_BACKUP | backup_infrastructure | Back up the DATAMIMIC data instance.
system_restore | SYSTEM_RESTORE | backup_infrastructure | Restore the DATAMIMIC data instance from a backup.
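TaskView shows a task's Label, but WORKER_QUEUES needs the queue name. The dictionary below transcribes the tables above into a lookup you could use when a task sits in TaskView without progressing; queue_for_label is a hypothetical helper, not a DATAMIMIC API.

```python
# Label (as shown in TaskView) -> queue name, transcribed from the tables above.
LABEL_TO_QUEUE = {
    "GEN-SDN": "standard",
    "GEN-INF": "infinite",
    "GEN-5M": "timed_5min",
    "GEN-30M": "timed_30min",
    "GEN-1H": "timed_1hour",
    "GEN-4H": "timed_4hour",
    "GEN-8H": "timed_8hour",
    "GEN-24H": "timed_24hour",
    "GEN-SCH": "cron",
    "DB-GEN-MOD": "database_generate_model",
    "DB-GEN-WGT": "database_generate_weighting",
    "JSON-GEN-MOD": "json_generate_model",
    "JSON-GEN-TMP": "json_generate_template",
    "XML-GEN-MOD": "xml_generate_model",
    "XML-GEN-TMP": "xml_generate_template",
    "CSV-GEN-MOD": "csv_generate_model",
    "DB-META-SCAN": "db_metadata_scan",
    "DS-BROWSE": "data_source_browse",
    "DS-SCAN": "data_source_scan",
    "SCAN-GEN-MOD": "scan_result_generate_model",
    "CHK-ENV": "healthcheck",
    "GWA-MIG": "gwa_migration",
    "AMA-MIG": "ama_migration",
    "GIT-PUSH": "project_git_push",
    "GIT-UPDATE": "project_git_update",
    "RUNTIME-CAPS": "runtime_capabilities_fetch",
    "HOUSEKEEPING": "housekeeping",
    "CLEAN UP PROJECT STORAGE": "clean_up_project_storage",
    "SYSTEM_BACKUP": "system_backup",
    "SYSTEM_RESTORE": "system_restore",
}


def queue_for_label(label: str) -> str:
    """Return the WORKER_QUEUES entry for a TaskView label (raises KeyError if unknown)."""
    return LABEL_TO_QUEUE[label]


print(queue_for_label("DB-GEN-WGT"))  # database_generate_weighting
```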

A practical layout separates short interactive generation, long-running generation, and operational/source tasks across three worker groups. This mirrors the values shipped with the Helm chart:

celeryworker:
  install: true

  # Each worker listens to a specific set of queues via WORKER_QUEUES.
  # The queue names are the TaskType values defined by the platform.
  workers:
    - name: operation-worker
      replicaCount: 1
      extraEnvs:
        WORKER_QUEUES: "healthcheck,housekeeping,clean_up_project_storage,runtime_capabilities_fetch,gwa_migration,ama_migration,project_git_push,project_git_update,db_metadata_scan,database_generate_model,database_generate_weighting,json_generate_model,json_generate_template,xml_generate_model,xml_generate_template,csv_generate_model,data_source_browse,data_source_scan,scan_result_generate_model"
      resources:
        limits:
          cpu: 300m
          memory: 500Mi
        requests:
          cpu: 100m
          memory: 256Mi
    - name: short-worker
      replicaCount: 1
      extraEnvs:
        WORKER_QUEUES: "standard,timed_5min,timed_30min,timed_1hour,cron"
      resources:
        limits:
          cpu: 1500m
          memory: 1.5Gi
        requests:
          cpu: 500m
          memory: 1Gi
    - name: long-worker
      replicaCount: 1
      extraEnvs:
        WORKER_QUEUES: "infinite,timed_4hour,timed_8hour,timed_24hour,system_backup,system_restore"
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 200m
          memory: 512Mi
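To confirm that a layout like this leaves nothing unconsumed, you can compare the three WORKER_QUEUES strings against the queue names from the Supported tasks tables. The sketch below copies both by hand (it does not read values.yaml), and audit is a hypothetical helper. It also lists queues claimed by more than one worker, which the "at least one worker" rule does not forbid but which is worth knowing about.

```python
from collections import Counter

# WORKER_QUEUES values copied from the Helm values above.
SHIPPED = {
    "operation-worker": (
        "healthcheck,housekeeping,clean_up_project_storage,runtime_capabilities_fetch,"
        "gwa_migration,ama_migration,project_git_push,project_git_update,"
        "db_metadata_scan,database_generate_model,database_generate_weighting,"
        "json_generate_model,json_generate_template,xml_generate_model,"
        "xml_generate_template,csv_generate_model,data_source_browse,"
        "data_source_scan,scan_result_generate_model"
    ),
    "short-worker": "standard,timed_5min,timed_30min,timed_1hour,cron",
    "long-worker": "infinite,timed_4hour,timed_8hour,timed_24hour,system_backup,system_restore",
}

# Every queue name from the Supported tasks tables.
ALL_QUEUES = set("""
standard infinite timed_5min timed_30min timed_1hour timed_4hour timed_8hour
timed_24hour cron database_generate_model database_generate_weighting
json_generate_model json_generate_template xml_generate_model
xml_generate_template csv_generate_model db_metadata_scan data_source_browse
data_source_scan scan_result_generate_model healthcheck gwa_migration
ama_migration project_git_push project_git_update runtime_capabilities_fetch
housekeeping clean_up_project_storage system_backup system_restore
""".split())


def audit(workers: dict[str, str], required: set[str]) -> tuple[list[str], list[str]]:
    """Return (unconsumed, shared): required queues that no worker lists, and
    queues listed by more than one worker."""
    counts = Counter(q for value in workers.values() for q in value.split(","))
    unconsumed = sorted(required - set(counts))
    shared = sorted(q for q, n in counts.items() if n > 1)
    return unconsumed, shared


unconsumed, shared = audit(SHIPPED, ALL_QUEUES)
print(len(ALL_QUEUES), unconsumed, shared)  # 30 [] []
```

Running the same audit against a custom layout before rollout catches a missing queue without waiting for tasks to pile up.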

Include database_generate_weighting

The database_generate_weighting (DB-GEN-WGT) queue must be consumed wherever database model generation is used. If you copied an older reference values.yaml, confirm that this queue is present on a worker; otherwise weighting generation tasks are enqueued but never run.
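A quick way to catch this gap is to check the pairing directly. This sketch assumes that "database model generation is used" can be approximated as "some worker consumes database_generate_model"; weighting_gap is a hypothetical helper and the layout is invented.

```python
def weighting_gap(workers: dict[str, str]) -> bool:
    """Return True if database_generate_model is consumed by some worker but
    database_generate_weighting is consumed by none (assumed pairing rule)."""
    consumed = {q for value in workers.values() for q in value.split(",")}
    return "database_generate_model" in consumed and "database_generate_weighting" not in consumed


# Hypothetical layout copied from an older reference values.yaml.
older = {"operation-worker": "healthcheck,db_metadata_scan,database_generate_model"}
print(weighting_gap(older))  # True
older["operation-worker"] += ",database_generate_weighting"
print(weighting_gap(older))  # False
```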

Adding a queue to a worker

  1. Pick the worker group that should run the task (short interactive, long-running, or operational).
  2. Add the queue name to that worker's WORKER_QUEUES (comma-separated, no spaces).
  3. Roll out the worker so the new configuration takes effect.
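Step 2 is easy to get subtly wrong by hand (a stray space or a duplicate entry). The sketch below shows one way to produce a clean value; add_queue is a hypothetical helper, and it normalizes whitespace rather than rejecting it, which is a design choice of this example only.

```python
def add_queue(worker_queues: str, queue: str) -> str:
    """Return a WORKER_QUEUES value with `queue` appended (comma-separated, no spaces).

    Existing entries keep their order; the value is returned normalized but
    otherwise unchanged if the queue is already present.
    """
    # Strip stray whitespace around entries and drop empty entries.
    entries = [q.strip() for q in worker_queues.split(",") if q.strip()]
    if queue not in entries:
        entries.append(queue)
    return ",".join(entries)


print(add_queue("standard,timed_5min", "timed_30min"))  # standard,timed_5min,timed_30min
print(add_queue("standard, cron", "cron"))              # standard,cron
```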

Queue names always equal the task type value (the Queue name column above). Do not invent queue names; only the values defined by the platform are routed.
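Because an invented or misspelled name is silently never routed, it can help to validate a WORKER_QUEUES value against the known names before rollout. This sketch uses Python's standard difflib.get_close_matches to suggest the intended name; check_names is a hypothetical helper, and the name list is copied from the tables above.

```python
import difflib

# Queue names from the Supported tasks tables.
KNOWN_QUEUES = """
standard infinite timed_5min timed_30min timed_1hour timed_4hour timed_8hour
timed_24hour cron database_generate_model database_generate_weighting
json_generate_model json_generate_template xml_generate_model
xml_generate_template csv_generate_model db_metadata_scan data_source_browse
data_source_scan scan_result_generate_model healthcheck gwa_migration
ama_migration project_git_push project_git_update runtime_capabilities_fetch
housekeeping clean_up_project_storage system_backup system_restore
""".split()


def check_names(worker_queues: str) -> dict[str, list[str]]:
    """Map each unknown entry in a WORKER_QUEUES value to its closest known name."""
    problems: dict[str, list[str]] = {}
    for name in worker_queues.split(","):
        if name not in KNOWN_QUEUES:
            problems[name] = difflib.get_close_matches(name, KNOWN_QUEUES, n=1)
    return problems


print(check_names("standard,data_source_scans,timed_1hr"))
# {'data_source_scans': ['data_source_scan'], 'timed_1hr': ['timed_1hour']}
```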

For non-Helm deployments (for example Docker Compose), set the same WORKER_QUEUES environment variable on the worker container or in its .env file.