Data Obfuscation

Data obfuscation protects sensitive information in datasets. This guide walks through data obfuscation with DATAMIMIC for two use cases:

  • updating specific columns in a database and
  • obfuscating records from a CSV file.
Demo Project Anonymization

Steps

  1. Clone the Demo Anonymization from the Demo Store

    • This project includes predefined models and configurations to help you get started quickly.
  2. Set up Environments

    • The DATAMIMIC base model references two predefined environments:
      • sourceDB: The database from which the data is read.
      • targetDB: The database to which the obfuscated records are written.
    • For more details on setting up and managing environments, refer to the Environments documentation.
  3. Switch to the file 2_generate

    • This file demonstrates three variants of data obfuscation:

Variant 1: Database Record Obfuscation

This variant retrieves records from the CUSTOMER table of the sourceDB and updates the name column. All other columns remain unchanged. The name column is obfuscated by appending the string _mask. The obfuscated data will be written into targetDB.

<generate name="CUSTOMER" source="sourceDB" type="CUSTOMER" target="targetDB">
    <key name="name" script="name+'_mask'" />
</generate>
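
To make the effect of the script expression concrete, the following is a minimal Python sketch of the same transformation applied to one record. It is an illustration only, not DATAMIMIC's implementation: the function name obfuscate_name and the sample record are hypothetical, and the record is modeled as a plain dictionary.

```python
def obfuscate_name(record: dict) -> dict:
    """Return a copy of the record with '_mask' appended to the name value.

    Hypothetical helper: mirrors the expression name+'_mask' from the model;
    every other column is copied through unchanged.
    """
    masked = dict(record)  # copy so the source record is left untouched
    masked["name"] = f"{record['name']}_mask"
    return masked


# Sample data is invented for illustration.
source_rows = [{"id": 1, "name": "Alice", "city": "Berlin"}]
target_rows = [obfuscate_name(row) for row in source_rows]
print(target_rows)
```

Appending a fixed suffix leaves the original value readable, so this variant demonstrates the mechanics of updating a single column rather than a strong anonymization technique.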

Variant 2: Database Record Obfuscation with Converter

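Variant 2 anonymizes the existing values of the keys full_name, email, and tc_creation_src, i.e., the cell values of those columns. The full_name value gets the _mask suffix as in Variant 1, while email and tc_creation_src are passed through the built-in Mask converter.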

<generate source="sourceDB" name="USER" target="targetDB">
    <key name="full_name" script="full_name + '_mask'" />
    <!-- Use the default mask converter to mask into '*' (default char) -->
    <key name="email" script="email" converter="Mask" />
    <!-- Optionally pass a character you prefer to the Mask converter -->
    <key name="tc_creation_src" converter="Mask('!')" script="tc_creation_src" />
</generate>
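
As a rough illustration of what masking does to a value, here is a Python sketch. It assumes that masking replaces every character with the mask character and preserves the length of the value; the comments in the model only state that '*' is the default character, so the length-preserving behavior and the function name mask are assumptions, not the documented converter.

```python
def mask(value: str, char: str = "*") -> str:
    """Replace every character of value with char.

    Assumption: the result has the same length as the input.
    """
    return char * len(value)


print(mask("jane@example.com"))  # default mask character
print(mask("web", "!"))          # custom mask character, as in Mask('!')
```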

Variant 3: CSV File Obfuscation with Multiple Approaches

This variant shows how to obfuscate person records from a CSV file. It combines several options that update or overwrite the original values from the file, and writes the result to a new file named ObfuscateCSV.csv.

<generate name="ObfuscateCSV" type="ObfuscateCSV" source="data/persons.ent.csv" target="CSV">
    <variable name="addr" entity="Address" />
    <!-- Cut the length of string from the start -->
    <key name="familyName" script="familyName" converter="CutLength(3)" />
    <key name="givenName" script="givenName" converter="Mask" />
    <key name="alias" script="alias" converter="Append('_demo')" />
    <key name="street" script="addr.street" />
    <key name="city" script="addr.city" />
    <key name="country" constant="US" />
    <key name="accountNo" script="accountNo" converter="Hash('sha256', 'hex')" />
    <key name="ssn" script="ssn" converter="MiddleMask(2,3)" />
    <key name="creditCardNo" script="creditCardNo" converter="Hash('sha1', 'base64')" />
    <key name="secret1" script="secret1" converter="Hash('md5', 'hex')" />
    <key name="secret2" script="secret2" converter="Hash('md5', 'base64')" />
    <key name="secret3" script="secret3" converter="Hash('sha1','hex')" />
    <key name="secret4" script="secret4" converter="Hash('sha1', 'base64')" />
</generate>
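
The converters in this variant can be approximated in plain Python to see what kind of output to expect. The sketch below is not DATAMIMIC's implementation, and all function names are hypothetical. It assumes that Hash(algorithm, encoding) hashes the UTF-8 bytes of the value, that CutLength(n) keeps the first n characters, and that MiddleMask(a, b) keeps the first a and last b characters while masking everything in between. Check the converter documentation before relying on any of these semantics.

```python
import base64
import hashlib


def hash_value(value: str, algorithm: str = "sha256", encoding: str = "hex") -> str:
    """Hash the UTF-8 bytes of value and encode the digest as hex or base64."""
    digest = hashlib.new(algorithm, value.encode("utf-8")).digest()
    if encoding == "hex":
        return digest.hex()
    if encoding == "base64":
        return base64.b64encode(digest).decode("ascii")
    raise ValueError(f"unsupported encoding: {encoding}")


def cut_length(value: str, max_length: int) -> str:
    """Assumed CutLength semantics: keep only the first max_length characters."""
    return value[:max_length]


def middle_mask(value: str, keep_start: int, keep_end: int, char: str = "*") -> str:
    """Assumed MiddleMask semantics: mask everything between the kept ends."""
    if keep_start + keep_end >= len(value):
        return value  # nothing left in the middle to mask
    middle = len(value) - keep_start - keep_end
    return value[:keep_start] + char * middle + value[len(value) - keep_end:]


print(hash_value("abc", "md5", "hex"))
print(hash_value("abc", "sha1", "base64"))
print(cut_length("Johnson", 3))
print(middle_mask("123-45-6789", 2, 3))
```

Unsalted hashes of low-entropy values such as account numbers can be recovered by hashing all candidate values, and MD5 and SHA-1 are no longer considered collision resistant, so sha256 is the safer choice among the algorithms shown.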

Recap

  1. Review the demo Anonymization from the Demo Store.
  2. In the 2_generate file, review the data obfuscation variants to update specific database columns or to obfuscate records from a CSV file.
  3. Review the DATAMIMIC models 3-1-anon-person-constant and 3-2-anon-person-hash.xml for additional scenarios and obfuscation options.