
Domain Generators

DATAMIMIC domains are a vehicle for defining, bundling and reusing domain-specific data generation, e.g. for personal data, addresses, internet, banking and telecom. They may be localized to specific languages and grouped into hierarchical datasets, e.g. for continents, countries and regions.

DATAMIMIC ships with several domains that provide simple implementations of domain-specific data generation. If you need further domains, we highly appreciate your feedback and contributions.

The following domains are included:

  • person: Data related to a person

  • address: Data related to contacting a person by post

  • organization: Organization data

  • finance: Finance data

  • net: Internet and network related data

  • product: Product-related data

  • br and us: Country-specific data for Brazil and the USA

Additionally, DATAMIMIC provides an easy way to use the Faker library for additional datasets, as documented below.

The person domain provides the following components:

  • Person: Generates Person entities

  • AcademicTitleGenerator: Generates academic titles. The generator can be configured with academic_title_quota.

  • NobilityTitleGenerator: Generates nobility titles. The generator can be configured with noble_quota.

  • GivenNameGenerator: Generates given names

  • FamilyNameGenerator: Generates family names

  • BirthDateGenerator: Generates birth dates

  • GenderGenerator: Generates gender values. The generated gender is one of MALE, FEMALE or OTHER. The generator is configured with the properties female_quota and other_gender_quota; female_quota has the highest priority, followed by other_gender_quota.

  • EmailAddressGenerator: Generates email addresses
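Besides the Person entity, the individual generators can presumably be used on their own in a key's generator attribute, the same way CompanyNameGenerator is used in the Organization section. The sketch below assumes that; it also assumes that the quota properties are passed as keyword arguments, following the Person(...) syntax used on this page. Neither assumption is confirmed here.

```xml
<generate name="person_parts" count="5" target="CSV">
  <!-- Assumption: generators are referenced by class name, like CompanyNameGenerator -->
  <key name="given_name" generator="GivenNameGenerator"/>
  <key name="family_name" generator="FamilyNameGenerator"/>
  <key name="birthdate" generator="BirthDateGenerator"/>
  <!-- Assumption: quotas are keyword arguments, as in Person(female_quota=...) -->
  <key name="gender" generator="GenderGenerator(female_quota=0.6, other_gender_quota=0.05)"/>
  <key name="academic_title" generator="AcademicTitleGenerator(academic_title_quota=0.3)"/>
  <key name="nobility_title" generator="NobilityTitleGenerator(noble_quota=0.01)"/>
  <key name="email" generator="EmailAddressGenerator"/>
</generate>
```

Because each key uses its own generator, the values are presumably not coupled: a generated given name need not match the generated gender. Use the Person entity when that consistency matters.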

Person Entity

Creates Person entities to be used for prototype-based data generation. It can be configured with the dataset and locale properties. A generated Person entity has the properties salutation, title, given_name and family_name (these four depend on the dataset), as well as gender, birthdate, age and email. If the chosen dataset definition provides name weights, DATAMIMIC generates person names according to their statistical probability. Gender, salutation and given_name are always consistent with each other.

You can use the Person entity like this:

<generate name="user" count="5" target="CSV">
  <variable name="person" entity="Person(min_age=20, max_age=45, female_quota=0.5)" dataset="FR"/>
  <key name="salutation" script="person.salutation"/>
  <key name="name" script="f'{person.given_name} {person.family_name}'"/>
</generate>

to get output similar to this:

salutation|name
Mme|Claude Bernard
Mme|Jeannine Lefebvre
M.|Robert Bernard
M.|Roger Morel
Mme|Dominique Dubois

The Person entity has the following data fields:

Property name | Type | Description
salutation | String | Salutation (e.g. Mr/Mrs)
academic_title | String | Academic title (e.g. Dr)
given_name | String | Given name ('first name' in western countries)
family_name | String | Family name ('surname' in western countries)
gender | Gender | Gender (male, female or other)
birthdate | Date | Birth date
age | Integer | Current age
email | String | Email address
nobility_title | String | Nobility title (e.g. Baron/Baroness)

Person Entity Properties

The Person Entity can be configured with several properties:

Property | Description | Default value
dataset | Either a region name or the two-letter ISO code of a country, e.g. US for the USA | The user's default country
min_age | The minimum age of generated persons | 15
max_age | The maximum age of generated persons | 105
female_quota | The quota of generated women (1 → 100%) | 0.49
other_gender_quota | The quota of generated persons of other gender (1 → 100%) | 0.02
noble_quota | The quota of generated nobility titles (1 → 100%) | 0.001
academic_title_quota | The quota of generated academic titles (1 → 100%) | 0.5
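Several properties can be combined in the entity expression. The following sketch assumes that every property in the table is accepted in the same way as min_age, max_age and female_quota in the example above; the specific values are arbitrary and only for illustration.

```xml
<generate name="customer" count="10" target="CSV">
  <!-- Assumption: all table properties are constructor keywords; dataset stays an attribute as above -->
  <variable name="person"
            entity="Person(min_age=18, max_age=30, female_quota=0.45, other_gender_quota=0.1, noble_quota=0.05, academic_title_quota=0.2)"
            dataset="DE"/>
  <key name="gender" script="person.gender"/>
  <key name="academic_title" script="person.academic_title"/>
  <key name="nobility_title" script="person.nobility_title"/>
  <key name="name" script="f'{person.given_name} {person.family_name}'"/>
  <key name="age" script="person.age"/>
  <key name="email" script="person.email"/>
</generate>
```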

Supported countries

Country | Code | Remarks
Austria | AT | Most common 120 given names with absolute weight, most common 40 family names with absolute weight
Australia | AU | Most common 40 given names (unweighted), most common 20 family names with absolute weight
Belgium | BE | Most common 38 given names (unweighted), most common 15 family names with absolute weight
Brazil | BR | Most common 100 given names (unweighted), most common 29 family names (unweighted)
Canada | CA | Most common 80 given names (unweighted), most common 20 family names (unweighted). No coupling between given name locale and family name locale
Switzerland | CH | Most common 30 given names with absolute weight, most common 20 family names with absolute weight
China | CN | Chinese characters. Most common 46 given names (unweighted), most common 106 family names with absolute weight
Czech Republic | CZ | Most common 20 given names with absolute weight, most common 20 family names with absolute weight. Female surnames are supported.
Germany | DE | Most common 1998 given names with absolute weight, most common 3421 family names with absolute weight
Spain | ES | Most common 40 given names (unweighted), most common 40 family names with absolute weight
Finland | FI | Most common 785 given names (unweighted), most common 448 family names (unweighted)
France | FR | Most common 100 given names (unweighted), most common 30 family names with relative weight
Ireland | IE | Most common 41 given names (unweighted), most common 26 family names (unweighted)
Israel | IL | 264 given names (unweighted), most common 30 family names with relative weight
India | IN | Most common 155 given names (unweighted), most common 50 family names (unweighted)
Italy | IT | Most common 60 given names (unweighted), most common 20 family names (unweighted)
Japan | JP | Kanji characters. Most common 109 given names (unweighted), most common 50 family names with absolute weight
Republic of Korea | KR | Hangul characters. Most common 91 given names (unweighted), most common 182 family names with absolute weight
Netherlands | NL | 3228 given names (unweighted), most common 10 family names with absolute weight
Norway | NO | Most common 300 given names (unweighted), most common 100 family names with absolute weight
New Zealand | NZ | Most common 20 given names (unweighted), most common 8 family names (unweighted)
Poland | PL | Most common 67 given names with absolute weight, most common 20,000 family names with absolute weight. Female surnames are supported.
Russia | RU | Cyrillic letters. Most common 33 given names with relative weight, most common 20 family names with relative weight. Female surnames are supported.
Sweden | SE | 779 given names (unweighted), most common 22 family names with relative weight
Slovenia | SI | Most common 400 given names with relative weight, most common 200 family names with relative weight
Slovakia | SK | Most common 20 given names with relative weight, most common 22 family names with relative weight
Turkey | TR | 1077 given names (unweighted), 37 family names (unweighted)
Ukraine | UA | Most common 48 given names (unweighted), most common 20 family names (unweighted)
United Kingdom | GB | Most common 20 given names (unweighted), most common 25 family names (unweighted)
USA | US | Most common 600 given names and most common 1000 family names, both with absolute weight

Address Generators

  • Address Entity: Generates addresses that pass simple validity checks: the city exists, the ZIP code matches the city and the phone number area codes are correct. The street names are random, so most addresses will not pass a check for real existence.

  • Country Entity: Generates countries

  • City Entity: Generates cities for a given country

  • PhoneNumberGenerator: Generates landline telephone numbers for a country

  • StreetNameGenerator: Generates street names for a given country

Address Entity

You can use the Address entity like this:

<generate name="data" count="5" target="CSV">
  <variable name="address" entity="Address" dataset="FR"/>
  <key name="street" script="address.street"/>
  <key name="home" script="f'{address.house_number} {address.street}, {address.city}'"/>
</generate>

to get output similar to this:

street|home
place de l'Eglise|7463 place de l'Eglise, Le Mans
rue du moulin|2695 rue du moulin, Champdieu
rue des écoles|1524 rue des écoles, Nibelle
rue du stade|149 rue du stade, Ruan
rue de la gare|4704 rue de la gare, Châteauroux

The generated Address entities have the following data fields:

Property name | Type | Description
street | String | The street name (without house number)
house_number | String | The house number for the street address
postal_code (zip_code) | String | The postal or ZIP code
city | String | The name of the city
state | String | The state or region
country | String | The country
country_code | String | Two-letter country code, equal to the dataset
office_phone | String | Office phone number
private_phone | String | Private phone number
mobile_phone | String | Mobile phone number
fax | String | Fax number
organization | String | The associated organization
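The following sketch reads more of the Address fields listed above. It assumes they are accessed the same way as street, house_number and city in the example above, and uses DE because the supported-countries list below marks it as having valid ZIP and area codes.

```xml
<generate name="contact" count="5" target="CSV">
  <variable name="address" entity="Address" dataset="DE"/>
  <key name="postal_code" script="address.postal_code"/>
  <key name="city" script="address.city"/>
  <key name="state" script="address.state"/>
  <key name="country_code" script="address.country_code"/>
  <!-- Landline area codes are meant to match the generated city -->
  <key name="office_phone" script="address.office_phone"/>
  <key name="mobile_phone" script="address.mobile_phone"/>
</generate>
```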

City Entity

You can use the City entity like this:

<generate name="data" count="5" target="CSV">
  <variable name="city" entity="City" dataset="FR"/>
  <key name="name" script="city.name"/>
  <key name="state" script="f'{city.state}, {city.country}'"/>
</generate>

to get output similar to this:

name|state
Sauvigny-le-Beuréal|Bourgogne-Franche-Comté, France
Pau|Nouvelle-Aquitaine, France
Beaumont-de-Lomagne|Occitanie, France
Marat|Auvergne-Rhône-Alpes, France
Laifour|Grand Est, France

The generated City entities have the following data fields:

Property name | Type | Description
name | String | The name of the city
name_extension | String | Additional name or descriptor for the city
state | String | The state or region where the city is located
country | String | The country where the city is located
area_code | String | The telephone area code of the city
language | String | The primary language spoken in the city
population | Integer | The population of the city
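A sketch that accesses the remaining City fields, assuming the same attribute-style access as city.name and city.state in the example above. Whether every field (for example name_extension or population) is filled for every dataset is not stated on this page.

```xml
<generate name="city_stats" count="5" target="CSV">
  <variable name="city" entity="City" dataset="DE"/>
  <key name="name" script="city.name"/>
  <key name="area_code" script="city.area_code"/>
  <key name="language" script="city.language"/>
  <key name="population" script="city.population"/>
</generate>
```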

Country Entity

You can use the Country entity like this:

<generate name="data" count="5" target="CSV">
  <variable name="country" entity="Country"/>
  <key name="name" script="country.name"/>
  <key name="language" script="country.default_language_locale"/>
</generate>

to get output similar to this:

name|language
Ireland|en_IE
Russian Federation|ru_RU
Belgium|fr_BE
New Zealand|en_NZ
Venezuela|es_VE

The generated Country entities have the following data fields:

Property name | Type | Description
iso_code | String | The ISO code of the country
name | String | The official name of the country
default_language_locale | String | The default language locale of the country
phone_code | String | The international phone code of the country
population | Integer | The population of the country
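A sketch for the remaining Country fields, assuming the same attribute access as country.name and country.default_language_locale in the example above:

```xml
<generate name="country_codes" count="5" target="CSV">
  <!-- No dataset attribute, as in the Country example above -->
  <variable name="country" entity="Country"/>
  <key name="iso_code" script="country.iso_code"/>
  <key name="name" script="country.name"/>
  <key name="phone_code" script="country.phone_code"/>
  <key name="population" script="country.population"/>
</generate>
```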

Supported countries

The following countries are supported for this domain:

Country | Code | Remarks
USA | US | Valid ZIP codes and area codes; no assurance that the street exists in the city
United Kingdom | GB | Valid area codes, no postcodes; no assurance that the street exists in the city or that the local phone number has the appropriate length. Contributions are welcome.
Germany | DE | Valid ZIP codes and area codes; no assurance that the street exists in the city or that the local phone number has the appropriate length
Switzerland | CH | Valid ZIP codes and area codes; no assurance that the street exists in the city or that the local phone number has the appropriate length
Brazil | BR | Valid ZIP codes and area codes; no assurance that the street exists in the city or that the local phone number has the appropriate length

Update:

The following additional countries are now supported: AD, AL, AT, AU, BA, BE, BG, CA, CY,
DK, EE, ES, FI, FR, GR, HR, HU, EI, IS, IT,
LI, LT, LU, LV, MC, NL, NO, NZ, PL, PT, RO,
RU, SE, SI, SK, SM, TH, TR, UA, VA, VE, VN

(Note that some countries lack postcodes. There is also no assurance that the street exists in the city or that the local phone number has the appropriate length. Contributions are welcome.)

Net

The net domain provides the following generator:

  • DomainGenerator: Generates Internet domain names
<key name="domain" generator="DomainGenerator"/>
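Embedded in a complete generate element, in the same shape as the other examples on this page, the DomainGenerator key could look like the sketch below. The company column is only there for illustration; since the two keys use independent generators, the domain is presumably not derived from the company name.

```xml
<generate name="websites" count="5" target="CSV">
  <key name="company" generator="CompanyNameGenerator"/>
  <!-- Generated independently of the company name -->
  <key name="domain" generator="DomainGenerator"/>
</generate>
```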

Organization

Provides the Company Entity along with the following generators:

  • CompanyNameGenerator, a generator for company names.

  • DepartmentNameGenerator, a generator for department names.

If you use the CompanyNameGenerator and DepartmentNameGenerator like this:

<generate name="company" count="5" target="CSV">
    <key name="name" generator="CompanyNameGenerator" />
    <key name="department" generator="DepartmentNameGenerator" />
</generate>
you get output like this:

name|department
Hotology|Legal
ClickBot|Legal
WireForge|Logistics
TradeSoft|Logistics
GigaSpace|Sales

Company names can be generated for the following countries:

Country | Code | Remarks
Germany | DE | none
USA | US | none

Company Entity

The generated Company entities have the following data fields:

Property name | Type | Description
city | String | The city where the company is located
country | String | The country where the company is located
country_code | String | The ISO country code
email | String | The company's email address
fax | String | The company's fax number
full_name | String | The full legal name of the company
house_number | String | The house number of the company's address
id | String | A unique identifier for the company
sector | String | The sector or industry in which the company operates
short_name | String | The short or common name of the company
office_phone | String | The company's office phone number
zip_code | String | The postal or ZIP code
state | String | The state or region where the company is located
street | String | The street address of the company
url | String | The company's website URL
<generate name="company_list" count="10" target="CSV">
  <variable name="company" entity="Company" />
  <key name="city" script="company.city" />
  <key name="country" script="company.country" />
  <key name="country_code" script="company.country_code" />
  <key name="email" script="company.email" />
  <key name="fax" script="company.fax" />
  <key name="full_name" script="company.full_name" />
  <key name="house_number" script="company.house_number" />
  <key name="id" script="company.id" />
  <key name="sector" script="company.sector" />
  <key name="short_name" script="company.short_name" />
  <key name="office_phone" script="company.office_phone" />
  <key name="zip_code" script="company.zip_code" />
  <key name="state" script="company.state" />
  <key name="street" script="company.street" />
  <key name="url" script="company.url" />
</generate>

Finance

Generates and validates finance-related data.

The following entities are provided:

  • BankAccount Entity: Entity for data around a bank account.

  • CreditCard Entity: Entity for data around a credit card.

BankAccount Entity

The generated BankAccount entities have the following data fields:

Property name | Type | Description
account_number | String | The bank account number
bank_code | String | The code identifying the bank
bank_name | String | The name of the bank
bic | String | The Bank Identifier Code (BIC)
iban | String | The International Bank Account Number (IBAN)
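This page does not show how to use the BankAccount entity. A minimal sketch, assuming it is referenced via entity="BankAccount" and accessed like the Person and Address entities. Whether a dataset attribute selects the country of the account is not documented here, so it is left out.

```xml
<generate name="bank_account" count="5" target="CSV">
  <!-- Assumption: entity name and attribute access follow the Person/Address pattern -->
  <variable name="account" entity="BankAccount"/>
  <key name="bank_name" script="account.bank_name"/>
  <key name="bank_code" script="account.bank_code"/>
  <key name="account_number" script="account.account_number"/>
  <key name="bic" script="account.bic"/>
  <key name="iban" script="account.iban"/>
</generate>
```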

CreditCard Entity

The generated CreditCard entities have the following data fields:

Property name | Type | Description
credit_card_number | String | The credit card number
card_holder | String | The name of the cardholder
cvc_number | String | The card verification code (CVC)
expiration_date | String | The expiration date of the credit card
credit_card_provider | String | The provider of the credit card (e.g. Visa, MasterCard)
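Likewise, a minimal sketch for the CreditCard entity, assuming it is referenced via entity="CreditCard" and its fields are accessed like those of the other entities on this page:

```xml
<generate name="credit_card" count="5" target="CSV">
  <!-- Assumption: entity name and attribute access follow the Person/Address pattern -->
  <variable name="card" entity="CreditCard"/>
  <key name="provider" script="card.credit_card_provider"/>
  <key name="number" script="card.credit_card_number"/>
  <key name="holder" script="card.card_holder"/>
  <key name="expiration_date" script="card.expiration_date"/>
  <key name="cvc" script="card.cvc_number"/>
</generate>
```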

Product

The product package provides a generator for EAN codes:

  • EANGenerator: Generates both 8-digit and 13-digit EAN codes
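A minimal sketch, assuming EANGenerator is referenced by name in a key's generator attribute like the other generators on this page. How to choose between 8-digit and 13-digit codes is not documented here, so the sketch passes no arguments.

```xml
<generate name="product" count="5" target="CSV">
  <!-- No arguments: the 8/13-digit selection is not documented on this page -->
  <key name="ean" generator="EANGenerator"/>
</generate>
```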

BR

Provides objects specific to Brazil:

  • CNPJGenerator: Generates CNPJs (Cadastro Nacional da Pessoa Jurídica)

  • CPFGenerator: Generates CPFs (Cadastro de Pessoas Físicas)
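A minimal sketch for the Brazilian identifiers, assuming both generators are referenced by name without arguments, like CompanyNameGenerator:

```xml
<generate name="br_ids" count="5" target="CSV">
  <!-- Individual taxpayer number -->
  <key name="cpf" generator="CPFGenerator"/>
  <!-- Company registration number -->
  <key name="cnpj" generator="CNPJGenerator"/>
</generate>
```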

US

Provides objects specific to the United States of America:

  • SSNGenerator: Generates Social Security Numbers
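A sketch that pairs an SSN with a US person. It assumes SSNGenerator is referenced by name, and that the Person entity can be used without constructor arguments (as Address is). The SSN is generated independently of the person.

```xml
<generate name="us_person" count="5" target="CSV">
  <variable name="person" entity="Person" dataset="US"/>
  <key name="name" script="f'{person.given_name} {person.family_name}'"/>
  <!-- Independent of the person above -->
  <key name="ssn" generator="SSNGenerator"/>
</generate>
```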

Faker

The faker package provides a generator backed by the Python Faker library.

  • DataFakerGenerator: Generates data for many topics, such as bank, color, currency, file, geo and more

Because this generator covers many topics, each with many properties, you have to pass the provider name as a parameter in the generator attribute, e.g. generator="DataFakerGenerator('faker_provider_name')".

Optionally, you can also set the locale, e.g. generator="DataFakerGenerator('faker_provider_name', locale='de_AT')".

You can use the DataFakerGenerator like this:

<generate name="faker_sample" count="5" target="CSV">
  <key name="job" generator="DataFakerGenerator('job', locale='en_US')" />
  <key name="user_agent" generator="DataFakerGenerator('user_agent', locale='en_US')" />
  <key name="hostname" generator="DataFakerGenerator('hostname', locale='en_US')" />
</generate>

to get output similar to this:

job|user_agent|hostname
Legal secretary|Mozilla/5.0 (compatible; MSIE 5.0; Windows NT 6.1; Trident/3.1)|lt-82.wilson-robbins.biz
Engineer, production|Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_6 rv:3.0; ka-GE) AppleWebKit/532.36.2 (KHTML, like Gecko) Version/4.0 Safari/532.36.2|db-21.gonzalez.com
Art therapist|Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.0 (KHTML, like Gecko) Chrome/50.0.825.0 Safari/534.0|db-99.weeks-diaz.net
Fast food restaurant manager|Mozilla/5.0 (Macintosh; PPC Mac OS X 10_12_8 rv:5.0; raj-IN) AppleWebKit/532.38.5 (KHTML, like Gecko) Version/4.1 Safari/532.38.5|db-12.martin.com
Publishing rights manager|Mozilla/5.0 (Linux; Android 4.3.1) AppleWebKit/531.0 (KHTML, like Gecko) Chrome/58.0.897.0 Safari/531.0|email-63.price.com
Publishing rights manager|Mozilla/5.0 (Linux; Android 4.3.1) AppleWebKit/531.0 (KHTML, like Gecko) Chrome/58.0.897.0 Safari/531.0|email-63.price.com

Supported topics:

DataFakerGenerator can generate data for many topics. Learn more about the available providers in the Faker documentation.

Supported Locales

The available locales vary depending on the data you generate; not every locale is available for every dataset.

Language | Code
Bulgarian | bg
Catalan | ca, ca_CAT
Danish | da_DK
German | de, de_AT, de_CH
English | en, en_AU, en_au_ocker, en_BORK, en_CA, en_GB, en_IND, en_MS, en_NEP, en_NG, en_NZ, en_PAK, en_SG, en_UG, en_US, en_ZA
Spanish | es, es_MX
Finnish | fi_FI
French | fr
Hungarian | hu
Indonesian | in_ID
Italian | it
Japanese | ja
Korean | ko
Norwegian Bokmål | nb_NO
Dutch | nl
Polish | pl
Portuguese | pt, pt_BR
Russian | ru
Slovak | sk
Swedish | sv, sv_SE
Turkish | tr
Ukrainian | uk
Vietnamese | vi
Chinese | zh_CN, zh_TW
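A sketch that uses the same Faker provider with different locales. It assumes 'city' is a valid provider name (it is a method name in the Python Faker library) and that both locales provide it; de_AT is the locale used in the locale example above.

```xml
<generate name="faker_locales" count="5" target="CSV">
  <!-- Same provider, two locales; availability per locale is an assumption -->
  <key name="city_at" generator="DataFakerGenerator('city', locale='de_AT')"/>
  <key name="city_us" generator="DataFakerGenerator('city', locale='en_US')"/>
  <key name="job_us" generator="DataFakerGenerator('job', locale='en_US')"/>
</generate>
```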