Custom Worker Configuration

DATAMIMIC runs all background work as Celery tasks. Every task type has its own queue, and each worker process consumes a configurable set of queues. This page explains the task/queue model, lists every supported task, and shows how to assign queues to workers so that no task type is left unconsumed.

How task routing works

Each task type maps to exactly one queue, named after the task type — for example, the data_source_scan task is routed to the data_source_scan queue.
A worker only processes the queues listed in its WORKER_QUEUES environment variable: a comma-separated list of queue names, no spaces.
A task whose queue is not consumed by any running worker stays enqueued and never runs. Splitting work across workers is therefore a deliberate routing decision — every queue you intend to use must appear in at least one running worker's WORKER_QUEUES.
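The routing rule above can be modelled in a few lines of Python. This is a minimal sketch, not DATAMIMIC code: the function names and the two-worker layout are hypothetical. Rejecting spaces is an assumption made here to match the documented format; how the platform itself treats a value containing spaces is not stated on this page.

```python
def parse_worker_queues(value: str) -> list[str]:
    """Split a WORKER_QUEUES value into queue names.

    Assumption: spaces and empty entries are treated as errors, because the
    documented format is comma-separated with no spaces.
    """
    if " " in value:
        raise ValueError("WORKER_QUEUES must not contain spaces")
    names = value.split(",")
    if any(name == "" for name in names):
        raise ValueError("WORKER_QUEUES contains an empty queue name")
    return names


def consumers_of(queue: str, workers: dict[str, str]) -> list[str]:
    """Return the names of the workers whose WORKER_QUEUES include `queue`."""
    return [
        worker
        for worker, value in workers.items()
        if queue in parse_worker_queues(value)
    ]


# Hypothetical two-worker layout, for illustration only.
workers = {
    "worker-a": "standard,timed_5min",
    "worker-b": "healthcheck,data_source_scan",
}

print(consumers_of("data_source_scan", workers))    # ['worker-b']
print(consumers_of("data_source_browse", workers))  # [] (no consumer: the task stays enqueued)
```

An empty result for a queue you intend to use is the situation to avoid: tasks of that type would be accepted but never picked up.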

Custom topologies must cover every queue you use

The default deployment ships values that already cover all queues, so it needs no action. If you define your own worker layout, make sure every task type your users can trigger is covered; otherwise those tasks never run. When a release introduces new task types (for example, the Workbench scan family data_source_browse, data_source_scan, and scan_result_generate_model in 3.5.0), add their queues to a running worker's WORKER_QUEUES as part of the upgrade.

Supported tasks

The Queue name column is exactly the value you put in WORKER_QUEUES. The Label is the short form shown in TaskView. Scope is one of: project (user-triggered work tied to a project), system (global maintenance, no project), or backup_infrastructure (backup/restore, runs outside backup scope).

Synthetic data generation

Queue name	Label	Scope	Description
`standard`	GEN-SDN	project	Standard generation run (soft/hard timeout ~15 min).
`infinite`	GEN-INF	project	Long-running generation with no timeout.
`timed_5min`	GEN-5M	project	Generation capped at 5 minutes.
`timed_30min`	GEN-30M	project	Generation capped at 30 minutes.
`timed_1hour`	GEN-1H	project	Generation capped at 1 hour.
`timed_4hour`	GEN-4H	project	Generation capped at 4 hours.
`timed_8hour`	GEN-8H	project	Generation capped at 8 hours.
`timed_24hour`	GEN-24H	project	Generation capped at 24 hours.
`cron`	GEN-SCH	project	Scheduled (cron) generation run, no timeout.

Model and template generation from a source

Queue name	Label	Scope	Description
`database_generate_model`	DB-GEN-MOD	project	Generate a DATAMIMIC model from a scanned SQL database.
`database_generate_weighting`	DB-GEN-WGT	project	Generate distribution weighting for a database model.
`json_generate_model`	JSON-GEN-MOD	project	Generate a model from a JSON sample.
`json_generate_template`	JSON-GEN-TMP	project	Generate a template from a JSON sample.
`xml_generate_model`	XML-GEN-MOD	project	Generate a model from an XML sample.
`xml_generate_template`	XML-GEN-TMP	project	Generate a template from an XML sample.
`csv_generate_model`	CSV-GEN-MOD	project	Generate a model from a CSV sample.

Data Workbench scan family

Queue name	Label	Scope	Description
`db_metadata_scan`	DB-META-SCAN	project	Scan SQL database metadata (Workbench Database mode).
`data_source_browse`	DS-BROWSE	project	Browse a data-source location (databases, collections, directories, files).
`data_source_scan`	DS-SCAN	project	Scan a selected table, collection, or object into a snapshot.
`scan_result_generate_model`	SCAN-GEN-MOD	project	Generate a model from a persisted scan snapshot.

Connections, migration, and Git

Queue name	Label	Scope	Description
`healthcheck`	CHK-ENV	project	Test an environment connection (SQL, MongoDB, Kafka, object storage).
`gwa_migration`	GWA-MIG	project	Migrate a GWA project into DATAMIMIC.
`ama_migration`	AMA-MIG	project	Migrate an AMA/EDI definition into DATAMIMIC.
`project_git_push`	GIT-PUSH	project	Push project changes to the configured Git remote.
`project_git_update`	GIT-UPDATE	project	Update the project from its Git remote.

System, maintenance, and backup

Queue name	Label	Scope	Description
`runtime_capabilities_fetch`	RUNTIME-CAPS	system	Fetch runtime capability metadata from the engine.
`housekeeping`	HOUSEKEEPING	system	Periodic internal maintenance.
`clean_up_project_storage`	CLEAN UP PROJECT STORAGE	project	Remove orphaned project storage.
`system_backup`	SYSTEM_BACKUP	backup_infrastructure	Back up the DATAMIMIC data instance.
`system_restore`	SYSTEM_RESTORE	backup_infrastructure	Restore the DATAMIMIC data instance from a backup.

Recommended worker layout

A practical layout separates short interactive generation, long-running generation, and operational/source tasks across three worker groups. The long-running group also consumes the system_backup and system_restore queues. This mirrors the values shipped with the Helm chart:

celeryworker:
  install: true

  # Each worker listens to a specific set of queues via WORKER_QUEUES.
  # The queue names are the TaskType values defined by the platform.
  workers:
    - name: operation-worker
      replicaCount: 1
      extraEnvs:
        WORKER_QUEUES: "healthcheck,housekeeping,clean_up_project_storage,runtime_capabilities_fetch,gwa_migration,ama_migration,project_git_push,project_git_update,db_metadata_scan,database_generate_model,database_generate_weighting,json_generate_model,json_generate_template,xml_generate_model,xml_generate_template,csv_generate_model,data_source_browse,data_source_scan,scan_result_generate_model"
      resources:
        limits:
          cpu: 300m
          memory: 500Mi
        requests:
          cpu: 100m
          memory: 256Mi
    - name: short-worker
      replicaCount: 1
      extraEnvs:
        WORKER_QUEUES: "standard,timed_5min,timed_30min,timed_1hour,cron"
      resources:
        limits:
          cpu: 1500m
          memory: 1.5Gi
        requests:
          cpu: 500m
          memory: 1Gi
    - name: long-worker
      replicaCount: 1
      extraEnvs:
        WORKER_QUEUES: "infinite,timed_4hour,timed_8hour,timed_24hour,system_backup,system_restore"
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 200m
          memory: 512Mi

Include database_generate_weighting

The database_generate_weighting (DB-GEN-WGT) queue must be consumed wherever database model generation is used. If you copied an older reference values.yaml, confirm this queue is present on a worker — otherwise weighting generation tasks are enqueued but never run.
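One way to confirm coverage is to compare the queue names from the Supported tasks tables with the union of all WORKER_QUEUES values. The sketch below does that for the recommended layout; the queue names and WORKER_QUEUES strings are copied from this page, while the function name and the "older" layout are hypothetical and exist only to show what a missing queue looks like.

```python
# Queue names copied from the "Supported tasks" tables on this page.
ALL_QUEUES = {
    # Synthetic data generation
    "standard", "infinite", "timed_5min", "timed_30min", "timed_1hour",
    "timed_4hour", "timed_8hour", "timed_24hour", "cron",
    # Model and template generation from a source
    "database_generate_model", "database_generate_weighting",
    "json_generate_model", "json_generate_template",
    "xml_generate_model", "xml_generate_template", "csv_generate_model",
    # Data Workbench scan family
    "db_metadata_scan", "data_source_browse", "data_source_scan",
    "scan_result_generate_model",
    # Connections, migration, and Git
    "healthcheck", "gwa_migration", "ama_migration",
    "project_git_push", "project_git_update",
    # System, maintenance, and backup
    "runtime_capabilities_fetch", "housekeeping", "clean_up_project_storage",
    "system_backup", "system_restore",
}

# WORKER_QUEUES values from the recommended layout above.
LAYOUT = {
    "operation-worker": "healthcheck,housekeeping,clean_up_project_storage,runtime_capabilities_fetch,gwa_migration,ama_migration,project_git_push,project_git_update,db_metadata_scan,database_generate_model,database_generate_weighting,json_generate_model,json_generate_template,xml_generate_model,xml_generate_template,csv_generate_model,data_source_browse,data_source_scan,scan_result_generate_model",
    "short-worker": "standard,timed_5min,timed_30min,timed_1hour,cron",
    "long-worker": "infinite,timed_4hour,timed_8hour,timed_24hour,system_backup,system_restore",
}


def uncovered_queues(layout: dict[str, str], required: set[str]) -> set[str]:
    """Return the required queues that no worker in `layout` consumes."""
    consumed: set[str] = set()
    for value in layout.values():
        consumed.update(value.split(","))
    return required - consumed


print(sorted(uncovered_queues(LAYOUT, ALL_QUEUES)))  # []

# Hypothetical older layout: the same values without the weighting queue.
older = dict(LAYOUT)
older["operation-worker"] = ",".join(
    q for q in LAYOUT["operation-worker"].split(",")
    if q != "database_generate_weighting"
)
print(sorted(uncovered_queues(older, ALL_QUEUES)))  # ['database_generate_weighting']
```

If you only use a subset of task types, pass that subset as `required` instead of the full set.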

Adding a queue to a worker

1. Pick the worker group that should run the task (short interactive, long-running, or operational).
2. Add the queue name to that worker's WORKER_QUEUES, comma-separated with no spaces.
3. Roll out the worker so the new configuration takes effect.

Queue names always equal the task type value (the Queue name column above). Do not invent queue names; only the values defined by the platform are routed.
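If you generate WORKER_QUEUES values with a script, a small helper can enforce both rules: only known queue names, and a comma-separated result with no spaces. The sketch below is illustrative only; `add_queue` and the `KNOWN` subset are hypothetical, and in practice the known set would hold every name from the Queue name column.

```python
def add_queue(worker_queues: str, queue: str, known_queues: set[str]) -> str:
    """Return `worker_queues` with `queue` appended, if not already present.

    Raises ValueError for a name that is not a platform-defined queue, since
    invented queue names are never routed.
    """
    if queue not in known_queues:
        raise ValueError(f"unknown queue name: {queue!r}")
    names = worker_queues.split(",") if worker_queues else []
    if queue not in names:
        names.append(queue)
    return ",".join(names)  # comma-separated, no spaces


# Subset of the platform's queue names, kept short for the example.
KNOWN = {"healthcheck", "housekeeping", "db_metadata_scan", "data_source_browse"}

current = "healthcheck,housekeeping,db_metadata_scan"
print(add_queue(current, "data_source_browse", KNOWN))
# healthcheck,housekeeping,db_metadata_scan,data_source_browse
```

Adding a queue that is already listed leaves the value unchanged, so the helper is safe to run repeatedly.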

For non-Helm deployments (for example Docker Compose), set the same WORKER_QUEUES environment variable on the worker container or in its .env file.