Domain Generators¶
DATAMIMIC domains are a vehicle for defining, bundling and reusing domain-specific data generation, e.g. for personal data, addresses, internet, banking and telecom. They may be localized to specific languages and grouped into hierarchical datasets, e.g. for continents, countries and regions.
DATAMIMIC includes several domains that provide simple implementations of specific data generation. If you need further domains, we highly appreciate your feedback and contributions.
The following domains are included:
- person: Data related to a person
- address: Data related to contacting a person by post
- organization: Organization data
- finance: Finance data
- net: Internet and network related data
- product: Product-related data
- br and us: Country-specific data
Additionally, DATAMIMIC includes an easy way to use the Faker library, documented below, for additional datasets.
Person Related generators¶
The person domain provides the following components:
- Person: Generates Person entities
- AcademicTitleGenerator: Generates academic titles. The generator can be configured with academic_title_quota.
- NobilityTitleGenerator: Generates nobility titles. The generator can be configured with noble_quota.
- GivenNameGenerator: Generates given names
- FamilyNameGenerator: Generates family names
- BirthDateGenerator: Generates birth dates
- GenderGenerator: Generates gender values. The generated gender is one of MALE, FEMALE or OTHER. The generator is configured with the properties female_quota and other_gender_quota; female_quota has the highest priority, followed by other_gender_quota.
- EmailAddressGenerator: Generates email addresses
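The quota priority described for GenderGenerator can be illustrated with a minimal Python sketch. It is not DATAMIMIC's implementation: the function name pick_gender is hypothetical, and the reading of "priority" (female_quota is applied first, other_gender_quota is capped to whatever share remains) is an assumption.

```python
import random


def pick_gender(female_quota=0.49, other_gender_quota=0.02, rng=random):
    """Return 'FEMALE', 'OTHER' or 'MALE' according to the two quotas."""
    # Assumption: female_quota is honoured first; other_gender_quota is then
    # capped so that both quotas together never exceed 1.
    female = min(max(female_quota, 0.0), 1.0)
    other = min(max(other_gender_quota, 0.0), 1.0 - female)
    roll = rng.random()  # uniform in [0.0, 1.0)
    if roll < female:
        return "FEMALE"
    if roll < female + other:
        return "OTHER"
    return "MALE"
```

With the default quotas, roughly 49% of the results would be FEMALE, 2% OTHER and the rest MALE.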
Person Entity¶
Creates Person entities to be used for prototype-based data generation. It can be configured with the dataset and locale properties. A generated Person entity has the properties salutation, title, given_name and family_name (these four are dataset-dependent) as well as gender, birthdate, age and email. If the chosen dataset definition provides name weights, DATAMIMIC generates person names according to their statistical probability. The gender, salutation and given_name of a generated person are always consistent with each other.
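As a rough illustration of weighted, gender-consistent name selection, the following Python sketch draws a given name in proportion to its weight and derives the salutation from the same gender. The tiny GIVEN_NAMES table, the SALUTATIONS mapping and the function pick_person are hypothetical and only cover two genders; DATAMIMIC's datasets and internal logic are not shown in this document.

```python
import random

# Hypothetical mini dataset: (given name, gender, weight).
GIVEN_NAMES = [
    ("Anna", "FEMALE", 120),
    ("Maria", "FEMALE", 80),
    ("Peter", "MALE", 100),
    ("Jan", "MALE", 60),
]
# Hypothetical salutations, keyed by gender.
SALUTATIONS = {"FEMALE": "Mrs", "MALE": "Mr"}


def pick_person(gender, rng=random):
    """Pick a given name for the gender, weighted by frequency."""
    candidates = [(name, weight) for name, g, weight in GIVEN_NAMES if g == gender]
    names = [name for name, _ in candidates]
    weights = [weight for _, weight in candidates]
    # random.choices draws proportionally to the supplied weights.
    given_name = rng.choices(names, weights=weights, k=1)[0]
    return {
        "gender": gender,
        "salutation": SALUTATIONS[gender],
        "given_name": given_name,
    }
```

Because the name list is filtered by gender before sampling, a FEMALE person can only receive a name and salutation registered for that gender.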
The Person entity has the following data fields:
property name | type | property description |
---|---|---|
salutation | String | Salutation (e.g. Mr/Mrs) |
academic_title | String | Academic title (e.g. Dr) |
given_name | String | Given name ('first name' in western countries) |
family_name | String | Family name ('surname' in western countries) |
gender | Gender | Gender (male, female or other) |
birthdate | Date | Birth date |
age | Integer | Current age |
email | String | Email address |
nobility_title | String | Noble title (e.g. Baron/Baroness) |
Person Entity Properties¶
The Person Entity can be configured with several properties:
Property | Description | Default Value |
---|---|---|
dataset | Either a region name or the two-letter ISO code of a country, e.g. US for the USA. | The user's default country |
min_age | The minimum age of generated persons | 15 |
max_age | The maximum age of generated persons | 105 |
female_quota | The quota of generated women (1 → 100%) | 0.49 |
other_gender_quota | The quota of generated persons of other gender (1 → 100%) | 0.02 |
noble_quota | The rate of generated nobility titles (1 → 100%) | 0.001 |
academic_title_quota | The rate of generated academic titles (1 → 100%) | 0.5 |
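To make the effect of min_age and max_age concrete, here is a small Python sketch that draws a birth date whose resulting age lies inside the configured range. The helper names age_on and random_birthdate are hypothetical, and the rejection-sampling approach is an assumption chosen for simplicity, not a description of how BirthDateGenerator works.

```python
import random
from datetime import date, timedelta


def age_on(birthdate, today):
    """Full years completed between birthdate and today."""
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    return today.year - birthdate.year - (0 if had_birthday else 1)


def random_birthdate(min_age=15, max_age=105, today=None, rng=random):
    """Draw a birth date so that min_age <= age <= max_age on `today`."""
    today = today or date.today()
    # Generous day window around the target range; candidates that fall
    # just outside the age limits are rejected and redrawn.
    earliest = today - timedelta(days=(max_age + 1) * 366)
    latest = today - timedelta(days=min_age * 365)
    span = (latest - earliest).days
    while True:
        candidate = earliest + timedelta(days=rng.randint(0, span))
        if min_age <= age_on(candidate, today) <= max_age:
            return candidate
```

Passing a fixed today makes the result reproducible in tests; the age field of a person would then simply be age_on(birthdate, today).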
Supported countries¶
country | code | remarks |
---|---|---|
Austria | AT | most common 120 given names with absolute weight, most common 40 family names with absolute weight |
Australia | AU | most common 40 given names (unweighted), most common 20 family names with absolute weight |
Belgium | BE | most common 38 given names (unweighted), most common 15 family names with absolute weight |
Brazil | BR | most common 100 given names (unweighted), most common 29 family names (unweighted) |
Canada | CA | most common 80 given names (unweighted), most common 20 family names (unweighted). No coupling between given name locale and family name locale |
Switzerland | CH | most common 30 given names with absolute weight, most common 20 family names with absolute weight |
China | CN | Chinese letters. Most common 46 given names (unweighted), most common 106 family names with absolute weight |
Czech Republic | CZ | most common 20 given names with absolute weight, most common 20 family names with absolute weight. Female surnames are supported. |
Germany | DE | most common 1998 given names with absolute weight, most common 3421 family names with absolute weight |
Spain | ES | most common 40 given names (unweighted), most common 40 family names with absolute weight |
Finland | FI | most common 785 given names (unweighted), most common 448 family names (unweighted) |
France | FR | most common 100 given names (unweighted), most common 30 family names with relative weight |
Ireland | IE | most common 41 given names (unweighted), most common 26 family names (unweighted) |
Israel | IL | 264 given names (unweighted), most common 30 family names with relative weight |
India | IN | most common 155 given names (unweighted), most common 50 family names (unweighted) |
Italy | IT | most common 60 given names (unweighted), most common 20 family names (unweighted) |
Japan | JP | Kanji letters. Most common 109 given names (unweighted), most common 50 family names with absolute weight |
Republic of Korea | KR | Hangul letters. Most common 91 given names (unweighted), most common 182 family names with absolute weight |
Netherlands | NL | 3228 given names (unweighted), most common 10 family names with absolute weight |
Norway | NO | most common 300 given names (unweighted), most common 100 family names with absolute weight |
New Zealand | NZ | most common 20 given names (unweighted), most common 8 family names (unweighted) |
Poland | PL | most common 67 given names with absolute weight, most common 20,000 family names with absolute weight. Female surnames are supported. |
Russia | RU | Cyrillic letters. Most common 33 given names with relative weight, most common 20 family names with relative weight. Female surnames are supported. |
Sweden | SE | 779 given names (unweighted), most common 22 family names with relative weight |
Slovenia | SI | most common 400 given names with relative weight, most common 200 family names with relative weight |
Slovakia | SK | most common 20 given names with relative weight, most common 22 family names with relative weight |
Turkey | TR | 1077 given names (unweighted), 37 family names (unweighted) |
Ukraine | UA | most common 48 given names (unweighted), most common 20 family names (unweighted) |
United Kingdom | GB | most common 20 given names (unweighted), most common 25 family names (unweighted) |
USA | US | most common 600 given names and most common 1000 family names both with absolute weight |
Address Generators¶
- Address Entity: Generates addresses that pass simple validity checks: the city exists, the ZIP code matches and the phone number area codes are correct. The street names are random, so most addresses will not pass a check for real-world existence.
- Country Entity: Generates countries
- City Entity: Generates cities for a given country
- PhoneNumberGenerator: Generates landline telephone numbers for a country
- StreetNameGenerator: Generates street names for a given country
Address Entity¶
The generated Address entities have the following data fields:
Property Name | Type | Property Description |
---|---|---|
street | String | The regular street address |
house_number | String | The house number associated with the street address |
postal_code(zip_code) | String | The postal or ZIP code |
city | String | The name of the city |
state | String | The state or region |
country | String | The country |
country_code | String | Two-letter country code, equal to the dataset |
office_phone | String | Office phone number |
private_phone | String | Private phone number |
mobile_phone | String | Mobile phone number |
fax | String | Fax number |
organization | String | The associated organization |
City Entity¶
The generated City entities have the following data fields:
Property Name | Type | Property Description |
---|---|---|
name | String | The name of the city |
name_extension | String | Additional name or descriptor for the city |
state | String | The state or region where the city is located |
country | String | The country where the city is located |
area_code | String | The telephone area code for the city |
language | String | The primary language spoken in the city |
population | Integer | The population count of the city |
Country Entity¶
The generated Country entities have the following data fields:
Property Name | Type | Property Description |
---|---|---|
iso_code | String | The ISO code representing the country |
name | String | The official name of the country |
default_language_locale | String | The default language locale used in the country |
phone_code | String | The international phone code for the country |
population | Integer | The population count of the country |
Supported countries¶
The following countries are supported for this domain:
country | code | remarks |
---|---|---|
USA | US | Valid ZIP codes and area codes, no assurance that the street exists in this city. |
United Kingdom | GB | Valid area codes, no postcodes, no assurance that the street exists in this city or the local phone number has the appropriate length. Contributions are welcome |
Germany | DE | Valid ZIP codes and area codes, no assurance that the street exists in this city or the local phone number has the appropriate length |
Switzerland | CH | Valid ZIP codes and area codes, no assurance that the street exists in this city or the local phone number has the appropriate length |
Brazil | BR | Valid ZIP codes and area codes, no assurance that the street exists in this city or the local phone number has the appropriate length |
Update: The following additional countries are now supported: AD, AL, AT, AU, BA, BE, BG, CA, CY, CA, DK, EE, ES, FI, FR, GR, HR, HU, EI, IS, IT, LI, LT, LU, LV, MC, NL, NO, NZ, PL, PT, RO, RU, SE, SI, SK, SM, TH, TR, UA, VA, VE, VN.
(Note that some of these countries are missing postcodes, and there is no assurance that the street exists in the city or that the local phone number has the appropriate length. Contributions are welcome.)
Net¶
The net domain provides the DomainGenerator, which generates Internet domain names.
Organization¶
Provides the Company Entity along with the following generators:
- CompanyNameGenerator, a generator for company names
- DepartmentNameGenerator, a generator for department names
Company names can be generated for the following countries:
country | code | remarks |
---|---|---|
Germany | DE | none |
USA | US | none |
Company Entity¶
The generated Company entities have the following data fields:
Property Name | Type | Property Description |
---|---|---|
city | String | The city where the company is located |
country | String | The country where the company is located |
country_code | String | The ISO country code |
email | String | The company's email address |
fax | String | The company's fax number |
full_name | String | The full legal name of the company |
house_number | String | The house number associated with the company's address |
id | String | A unique identifier for the company |
sector | String | The sector or industry in which the company operates |
short_name | String | The short or common name of the company |
office_phone | String | The company's office phone number |
zip_code | String | The postal or ZIP code |
state | String | The state or region where the company is located |
street | String | The street address of the company |
url | String | The company's website URL |
Finance¶
Generates and validates finance-related data. The following entities are provided:
- BankAccount Entity: Entity for data around a bank account.
- CreditCard Entity: Entity for data around a credit card.
BankAccount Entity¶
The generated BankAccount entities have the following data fields:
Property Name | Type | Property Description |
---|---|---|
account_number | String | The bank account number |
bank_code | String | The code identifying the bank |
bank_name | String | The name of the bank |
bic | String | The Bank Identifier Code (BIC) |
iban | String | The International Bank Account Number (IBAN) |
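Since the finance domain both generates and validates data, a sketch of the standard IBAN checksum (ISO 13616, mod-97) may help when checking generated iban values. The function name iban_is_valid is hypothetical; whether DATAMIMIC applies exactly this check is an assumption, and the sketch does not verify country-specific lengths.

```python
def iban_is_valid(iban):
    """Check the mod-97 checksum of an IBAN (no country-specific length check)."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

For example, the widely published sample IBAN "DE89 3704 0044 0532 0130 00" passes this check, while changing its last digit makes it fail.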
CreditCard Entity¶
The generated CreditCard entities have the following data fields:
Property Name | Type | Property Description |
---|---|---|
credit_card_number | String | The credit card number |
card_holder | String | The name of the cardholder |
cvc_number | String | The card verification code (CVC) number |
expiration_date | String | The expiration date of the credit card |
credit_card_provider | String | The provider of the credit card (e.g., Visa, MasterCard) |
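Credit card numbers conventionally end in a Luhn check digit. The following Python sketch validates a number against that checksum; the function name luhn_is_valid is hypothetical, and it is an assumption that values in credit_card_number are meant to satisfy the Luhn rule.

```python
def luhn_is_valid(number):
    """Return True if the digits in `number` satisfy the Luhn checksum."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

Spaces and other separators are ignored, so "4111 1111 1111 1111" (a common test number) is accepted as written.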
Product¶
The product package provides you with Generator classes for EAN codes:
- EANGenerator: Generates both 8-digit and 13-digit EAN codes
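Both EAN-8 and EAN-13 codes end in a check digit computed with alternating weights of 3 and 1. The sketch below shows that calculation and a simple random generator built on it; the names ean_check_digit and random_ean are hypothetical and are not part of the EANGenerator API.

```python
import random


def ean_check_digit(payload):
    """Check digit for a 7-digit (EAN-8) or 12-digit (EAN-13) payload."""
    if len(payload) not in (7, 12) or any(ch not in "0123456789" for ch in payload):
        raise ValueError("payload must be 7 or 12 decimal digits")
    total = 0
    # Weights alternate 3, 1, 3, ... starting from the rightmost payload digit.
    for index, ch in enumerate(reversed(payload)):
        total += int(ch) * (3 if index % 2 == 0 else 1)
    return (10 - total % 10) % 10


def random_ean(length=13, rng=random):
    """Build a random EAN-8 or EAN-13 code with a valid check digit."""
    if length not in (8, 13):
        raise ValueError("length must be 8 or 13")
    payload = "".join(str(rng.randint(0, 9)) for _ in range(length - 1))
    return payload + str(ean_check_digit(payload))
```

A code is valid when recomputing the check digit from its first 7 or 12 digits reproduces the final digit.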
BR¶
Provides objects specific to Brazil:
- CNPJGenerator: Generates CNPJs (Cadastro Nacional da Pessoa Jurídica)
- CPFGenerator: Generates CPFs (Cadastro de Pessoas Físicas)
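A CPF consists of nine base digits followed by two check digits derived with a mod-11 rule. The following Python sketch computes them; the function names cpf_check_digits and format_cpf are hypothetical and do not describe the CPFGenerator interface.

```python
def cpf_check_digits(base):
    """Return the two check digits for a 9-digit CPF base as a string."""
    if len(base) != 9 or any(ch not in "0123456789" for ch in base):
        raise ValueError("base must be exactly 9 decimal digits")
    digits = [int(ch) for ch in base]
    for _ in range(2):
        # Weights run from len(digits) + 1 down to 2 (10..2, then 11..2).
        weights = range(len(digits) + 1, 1, -1)
        remainder = sum(d * w for d, w in zip(digits, weights)) * 10 % 11
        digits.append(0 if remainder == 10 else remainder)
    return f"{digits[9]}{digits[10]}"


def format_cpf(base):
    """Format a 9-digit base as a full CPF, e.g. 'XXX.XXX.XXX-DD'."""
    check = cpf_check_digits(base)
    return f"{base[0:3]}.{base[3:6]}.{base[6:9]}-{check}"
```

The second check digit depends on the first, which is why the loop appends each digit before computing the next.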
US¶
Provides objects specific to the United States of America:
- SSNGenerator: Generates Social Security Numbers
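For orientation, a Social Security Number has the form AAA-GG-SSSS, where the area is never 000, 666 or in the range 900 to 999, the group is never 00 and the serial is never 0000. The Python sketch below generates numbers that respect these structural rules; the function name random_ssn is hypothetical, and it is an assumption that SSNGenerator enforces the same constraints.

```python
import random

# Area numbers 001-899, excluding 666.
VALID_AREAS = [n for n in range(1, 900) if n != 666]


def random_ssn(rng=random):
    """Return a structurally plausible SSN in the form AAA-GG-SSSS."""
    area = rng.choice(VALID_AREAS)
    group = rng.randint(1, 99)      # 01-99
    serial = rng.randint(1, 9999)   # 0001-9999
    return f"{area:03d}-{group:02d}-{serial:04d}"
```

Numbers produced this way are only structurally plausible and are not tied to any real person.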
Faker¶
The faker package provides a generator backed by the Python Faker library.
- DataFakerGenerator: Generates data for many topics such as bank, color, currency, file, geo...
Because this generator covers many topics, each with many properties, you have to choose a provider name and pass it as a parameter in the generator attribute, like this: generator="DataFakerGenerator('faker_provider_name')".
Optionally, you can define the locale like this: generator="DataFakerGenerator('faker_provider_name', locale='de_AT')"
Supported topics:¶
DataFakerGenerator can generate data for multiple topics. Learn more about available providers in the Faker Docs.
Supported Locales¶
Locale support varies with the data you generate, and not every locale is available for all datasets.
Language | code |
---|---|
Bulgarian | bg |
Catalan | ca, ca_CAT |
Danish | da_DK |
German | de, de_AT, de_CH |
English | en, en_AU, en_au_ocker, en_BORK, en_CA, en_GB, en_IND, en_MS, en_NEP, en_NG, en_NZ, en_PAK, en_SG, en_UG, en_US, en_ZA |
Spanish | es, es_MX |
Finnish | fi_FI |
French | fr |
Hungarian | hu |
Indonesian | in_ID |
Italian | it |
Japanese | ja |
Korean | ko |
Norwegian Bokmål | nb_NO |
Dutch | nl |
Polish | pl |
Portuguese | pt, pt_BR |
Russian | ru |
Slovak | sk |
Swedish | sv, sv_SE |
Turkish | tr |
Ukrainian | uk |
Vietnamese | vi |
Chinese | zh_CN, zh_TW |