Custom Worker Configuration
DATAMIMIC runs all background work as Celery tasks. Every task type has its own queue, and each worker process consumes a configurable set of queues. This page explains the task/queue model, lists every supported task, and shows how to assign queues to workers so that no task type is left unconsumed.
How task routing works
- Each task type maps to exactly one queue, named after the task type. For example, the `data_source_scan` task is routed to the `data_source_scan` queue.
- A worker only processes the queues listed in its `WORKER_QUEUES` environment variable: a comma-separated list of queue names, no spaces.
- A task whose queue is not consumed by any running worker stays enqueued and never runs. Splitting work across workers is therefore a deliberate routing decision: every queue you intend to use must appear in at least one running worker's `WORKER_QUEUES`.
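The routing rule above can be checked mechanically. The following is a minimal sketch, not part of DATAMIMIC: the function names and the worker names are hypothetical, and it assumes you have each worker's `WORKER_QUEUES` value available as a string.

```python
def parse_worker_queues(value: str) -> set[str]:
    """Split a WORKER_QUEUES value (comma-separated, no spaces) into queue names."""
    return {name for name in value.split(",") if name}


def unconsumed_queues(used_queues: set[str], workers: dict[str, str]) -> set[str]:
    """Return the queues in use that no worker's WORKER_QUEUES value lists."""
    consumed: set[str] = set()
    for value in workers.values():
        consumed |= parse_worker_queues(value)
    return used_queues - consumed


# Hypothetical two-worker topology; the worker names are illustrative only.
workers = {
    "worker-generation": "standard,timed_5min",
    "worker-workbench": "data_source_browse,data_source_scan",
}
used = {"standard", "data_source_scan", "scan_result_generate_model"}

# scan_result_generate_model is used but listed by no worker,
# so tasks of that type would stay enqueued.
missing = unconsumed_queues(used, workers)
print(sorted(missing))  # → ['scan_result_generate_model']
```

Running a check like this before rolling out a custom topology surfaces uncovered queues early, rather than through tasks that never start.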
Custom topologies must cover every queue you use
The default deployment ships values that already cover all queues, so it needs no action. If you define your own worker layout, make sure every task type your users can trigger is covered; otherwise those tasks never run. Task types added in a release (for example the Workbench scan family `data_source_browse`, `data_source_scan`, and `scan_result_generate_model` in 3.5.0) must be added to an existing worker's `WORKER_QUEUES`.
Supported tasks
The Queue name column is exactly the value you put in WORKER_QUEUES. The Label is the short form shown in TaskView. Scope is one of: project (user-triggered work tied to a project), system (global maintenance, no project), or backup_infrastructure (backup/restore, runs outside backup scope).
Synthetic data generation
| Queue name | Label | Scope | Description |
|---|---|---|---|
| `standard` | GEN-SDN | project | Standard generation run (soft/hard timeout ~15 min). |
| `infinite` | GEN-INF | project | Long-running generation with no timeout. |
| `timed_5min` | GEN-5M | project | Generation capped at 5 minutes. |
| `timed_30min` | GEN-30M | project | Generation capped at 30 minutes. |
| `timed_1hour` | GEN-1H | project | Generation capped at 1 hour. |
| `timed_4hour` | GEN-4H | project | Generation capped at 4 hours. |
| `timed_8hour` | GEN-8H | project | Generation capped at 8 hours. |
| `timed_24hour` | GEN-24H | project | Generation capped at 24 hours. |
| `cron` | GEN-SCH | project | Scheduled (cron) generation run, no timeout. |
Model and template generation from a source
| Queue name | Label | Scope | Description |
|---|---|---|---|
| `database_generate_model` | DB-GEN-MOD | project | Generate a DATAMIMIC model from a scanned SQL database. |
| `database_generate_weighting` | DB-GEN-WGT | project | Generate distribution weighting for a database model. |
| `json_generate_model` | JSON-GEN-MOD | project | Generate a model from a JSON sample. |
| `json_generate_template` | JSON-GEN-TMP | project | Generate a template from a JSON sample. |
| `xml_generate_model` | XML-GEN-MOD | project | Generate a model from an XML sample. |
| `xml_generate_template` | XML-GEN-TMP | project | Generate a template from an XML sample. |
| `csv_generate_model` | CSV-GEN-MOD | project | Generate a model from a CSV sample. |
Data Workbench scan family
| Queue name | Label | Scope | Description |
|---|---|---|---|
| `db_metadata_scan` | DB-META-SCAN | project | Scan SQL database metadata (Workbench Database mode). |
| `data_source_browse` | DS-BROWSE | project | Browse a data-source location (databases, collections, directories, files). |
| `data_source_scan` | DS-SCAN | project | Scan a selected table, collection, or object into a snapshot. |
| `scan_result_generate_model` | SCAN-GEN-MOD | project | Generate a model from a persisted scan snapshot. |
Connections, migration, and Git
| Queue name | Label | Scope | Description |
|---|---|---|---|
| `healthcheck` | CHK-ENV | project | Test an environment connection (SQL, MongoDB, Kafka, object storage). |
| `gwa_migration` | GWA-MIG | project | Migrate a GWA project into DATAMIMIC. |
| `ama_migration` | AMA-MIG | project | Migrate an AMA/EDI definition into DATAMIMIC. |
| `project_git_push` | GIT-PUSH | project | Push project changes to the configured Git remote. |
| `project_git_update` | GIT-UPDATE | project | Update the project from its Git remote. |
System, maintenance, and backup
| Queue name | Label | Scope | Description |
|---|---|---|---|
| `runtime_capabilities_fetch` | RUNTIME-CAPS | system | Fetch runtime capability metadata from the engine. |
| `housekeeping` | HOUSEKEEPING | system | Periodic internal maintenance. |
| `clean_up_project_storage` | CLEAN UP PROJECT STORAGE | project | Remove orphaned project storage. |
| `system_backup` | SYSTEM_BACKUP | backup_infrastructure | Back up the DATAMIMIC data instance. |
| `system_restore` | SYSTEM_RESTORE | backup_infrastructure | Restore the DATAMIMIC data instance from a backup. |
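Because only the queue names listed in these tables are routed, it can help to lint a `WORKER_QUEUES` value before deploying it. The sketch below is an assumption-laden helper, not a DATAMIMIC API: `SUPPORTED_QUEUES` is copied by hand from the tables on this page and `validate_worker_queues` is a hypothetical name.

```python
# Queue names copied from the tables on this page (30 in total).
SUPPORTED_QUEUES = frozenset({
    # Synthetic data generation
    "standard", "infinite", "timed_5min", "timed_30min", "timed_1hour",
    "timed_4hour", "timed_8hour", "timed_24hour", "cron",
    # Model and template generation from a source
    "database_generate_model", "database_generate_weighting",
    "json_generate_model", "json_generate_template",
    "xml_generate_model", "xml_generate_template", "csv_generate_model",
    # Data Workbench scan family
    "db_metadata_scan", "data_source_browse", "data_source_scan",
    "scan_result_generate_model",
    # Connections, migration, and Git
    "healthcheck", "gwa_migration", "ama_migration",
    "project_git_push", "project_git_update",
    # System, maintenance, and backup
    "runtime_capabilities_fetch", "housekeeping", "clean_up_project_storage",
    "system_backup", "system_restore",
})


def validate_worker_queues(value: str) -> list[str]:
    """Return the problems found in a WORKER_QUEUES value (empty list if none)."""
    problems = []
    if any(ch.isspace() for ch in value):
        problems.append("value contains whitespace")
    for name in value.split(","):
        stripped = name.strip()
        if not stripped:
            problems.append("empty entry (stray comma)")
        elif stripped not in SUPPORTED_QUEUES:
            problems.append(f"unknown queue name: {stripped}")
    return problems


print(validate_worker_queues("standard,timed_5min"))  # → []
print(validate_worker_queues("standard, data_scan"))
# → ['value contains whitespace', 'unknown queue name: data_scan']
```

The hand-copied set needs updating whenever a release adds task types, so treat it as a convenience check rather than a source of truth.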
Recommended worker layout
A practical layout separates short interactive generation, long-running generation, and operational/source tasks across three worker groups. This mirrors the values shipped with the Helm chart.
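As a rough illustration of such a three-group split, the sketch below renders one `WORKER_QUEUES` value per group and confirms that every queue from the tables above is assigned somewhere. The group names and the assignment of individual queues to groups are assumptions made for this example and are not taken from the Helm chart; consult the chart's own `values.yaml` for the shipped layout.

```python
# Hypothetical three-group split. Group names and queue assignments are
# illustrative assumptions, NOT the values shipped with the Helm chart.
LAYOUT = {
    "short-generation": ["standard", "timed_5min", "timed_30min"],
    "long-generation": [
        "infinite", "timed_1hour", "timed_4hour",
        "timed_8hour", "timed_24hour", "cron",
    ],
    "operations": [
        "database_generate_model", "database_generate_weighting",
        "json_generate_model", "json_generate_template",
        "xml_generate_model", "xml_generate_template", "csv_generate_model",
        "db_metadata_scan", "data_source_browse", "data_source_scan",
        "scan_result_generate_model",
        "healthcheck", "gwa_migration", "ama_migration",
        "project_git_push", "project_git_update",
        "runtime_capabilities_fetch", "housekeeping",
        "clean_up_project_storage", "system_backup", "system_restore",
    ],
}


def render_worker_queues(queues: list[str]) -> str:
    """Join queue names into the comma-separated, space-free WORKER_QUEUES form."""
    return ",".join(queues)


# Every queue listed on this page (30 names) should land in at least one group.
assigned = {queue for queues in LAYOUT.values() for queue in queues}
assert len(assigned) == 30

for group, queues in LAYOUT.items():
    print(f"{group}: WORKER_QUEUES={render_worker_queues(queues)}")
```

Keeping the layout in one structure like this makes it easy to move a queue between groups and re-render the values without retyping the lists.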
Include database_generate_weighting
The `database_generate_weighting` (DB-GEN-WGT) queue must be consumed wherever database model generation is used. If you copied an older reference `values.yaml`, confirm this queue is present on a worker; otherwise weighting generation tasks are enqueued but never run.
Adding a queue to a worker
- Pick the worker group that should run the task (short interactive, long-running, or operational).
- Add the queue name to that worker's `WORKER_QUEUES`, keeping the list comma-separated with no spaces.
- Roll out the worker so the new configuration takes effect.
Queue names always equal the task type value (the Queue name column above). Do not invent queue names; only the values defined by the platform are routed.
For non-Helm deployments (for example Docker Compose), set the same WORKER_QUEUES environment variable on the worker container or in its .env file.
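The edit to `WORKER_QUEUES` described above is simple string handling. The sketch below shows one way to append a queue without introducing spaces or duplicates and to format the result as a `.env` line; `add_queue` and `env_line` are hypothetical helper names, and the starting value is made up for the example.

```python
def add_queue(current: str, queue: str) -> str:
    """Return the WORKER_QUEUES value with `queue` appended if not already listed."""
    names = [name for name in current.split(",") if name]
    if queue not in names:
        names.append(queue)
    return ",".join(names)


def env_line(value: str) -> str:
    """Format a WORKER_QUEUES value as a line for a .env file."""
    return f"WORKER_QUEUES={value}"


# Example starting value; a real worker's list will differ.
updated = add_queue("healthcheck,housekeeping", "database_generate_weighting")
print(env_line(updated))
# → WORKER_QUEUES=healthcheck,housekeeping,database_generate_weighting
```

Calling `add_queue` again with the same queue returns the value unchanged, so the helper is safe to run repeatedly. The worker still has to be rolled out or restarted for the new value to take effect.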