Dictionaries

Dictionaries allow us to reuse values from previous transformations, both across different jobs and within a single execution. This keeps the resulting dataset consistent: a given input will always produce the same output value.

info

Dictionaries are common to all taps within the same project. Therefore, we will be able to transform data consistently across multiple databases.

When running or editing a rule, the dictionary usage options appear in the configuration panel on the right.

How to use

The results we obtain depend on the dictionary configuration we choose. These are the available options:

Do not use dictionary

Value matches are ignored during the transformation job. Each occurrence is transformed independently, so identical input values can produce completely different output values.

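Real data

| id | Name | Last name |
| --- | --- | --- |
| 1 | Edith | Upton |
| 2 | Keith | Smith |
| 3 | Edith | Smith |

Anonymized data

| id | Name | Last name |
| --- | --- | --- |
| 1 | Glenda | Leannon |
| 2 | Felix | Reynolds |
| 3 | Valerie | Block |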

As we can see, although Edith was first anonymized as Glenda, the second time it appears it becomes a different value (Valerie). The same happens with the last name Smith, which becomes Reynolds in one row and Block in another.

Reuse in the same entity and field

Matches in the same entity and field will be transformed in the same way. Even if there are matches in other entities or fields, they will be ignored.

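Real data

Entity: Customers

| id | Name | Last name |
| --- | --- | --- |
| 1 | Danielle | Upton |
| 2 | Jay | Smith |
| 3 | Danielle | Herman |
| 4 | Dwayne | Smith |

Anonymized data

Entity: Customers

| id | Name | Last name |
| --- | --- | --- |
| 1 | Melanie | Spencer |
| 2 | Ted | Huxley |
| 3 | Melanie | Armstrong |
| 4 | Leonard | Huxley |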

In this case, the name Danielle becomes Melanie, both the first and the second time it appears.

This happens because after the first occurrence, the value is stored in the dictionary, so when the same value appears again in the same entity and field, it is transformed in the same way. The same happens with the surname Smith, which in both occurrences is found in the entity Customers and in the Last name field, and becomes Huxley both times.
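The lookup described above can be pictured as a map keyed by entity, field, and input value. The following is a minimal Python sketch under that assumption; `FieldScopedDictionary`, its `transform` method, and the fixed list of replacement names are hypothetical and only illustrate the behavior, not how the product is implemented.

```python
class FieldScopedDictionary:
    """Toy dictionary: a replacement is reused only for the same
    (entity, field, value) combination. Hypothetical, for illustration."""

    def __init__(self, fake_values):
        # Replacement values are drawn in order from a fixed list (assumption).
        self._fakes = iter(fake_values)
        self._store = {}

    def transform(self, entity, field, value):
        key = (entity, field, value)
        if key not in self._store:
            # First occurrence: pick a new replacement and remember it.
            self._store[key] = next(self._fakes)
        return self._store[key]


dictionary = FieldScopedDictionary(["Melanie", "Ted", "Leonard", "Ava"])
names = ["Danielle", "Jay", "Danielle", "Dwayne"]
anonymized = [dictionary.transform("Customers", "Name", n) for n in names]
print(anonymized)  # ['Melanie', 'Ted', 'Melanie', 'Leonard']

# The same value in another entity is a different key, so it is not reused.
other_entity = dictionary.transform("Employees", "Name", "Danielle")
print(other_entity)  # 'Ava'
```

Because the entity and field are part of the key, a match in any other entity or field never hits the stored entry.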

Reuse by label or in the same entity and fields

Matches on fields that carry the same label will be transformed in the same way. Fields that do not share a label may still match on the combination of entity and field (as in the previous case), in which case the result is also identical.

Matches in other entities or fields are ignored if the fields do not have the same label.

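Real data

Entity: Customers

| id | Name (person/name) |
| --- | --- |
| 1 | Randal |
| 2 | Alma |

Entity: Employees

| id | Employee (person/name) |
| --- | --- |
| 1 | Randal |
| 2 | Ronnie |

Anonymized data

Entity: Customers

| id | Name (person/name) |
| --- | --- |
| 1 | Mark |
| 2 | Katherine |

Entity: Employees

| id | Employee (person/name) |
| --- | --- |
| 1 | Mark |
| 2 | Jeremy |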

In this case, Randal is transformed into Mark in both tables because, even though the occurrences are in different entities and fields, both fields share the same label, "person/name".
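One way to picture the label rule is as a choice of lookup key: the label when the field has one, otherwise the entity and field. The sketch below assumes exactly that; `scope_key`, `transform`, the in-memory `store`, and the fixed list of replacements are hypothetical names used for illustration only.

```python
store = {}
fakes = iter(["Mark", "Katherine", "Jeremy"])  # illustrative replacements


def scope_key(entity, field, value, label=None):
    """Build the lookup key: by label if present, else by entity and field."""
    if label is not None:
        return ("label", label, value)
    return ("field", entity, field, value)


def transform(entity, field, value, label=None):
    key = scope_key(entity, field, value, label)
    if key not in store:
        # First occurrence under this key: assign and remember a replacement.
        store[key] = next(fakes)
    return store[key]


customers = [transform("Customers", "Name", v, label="person/name")
             for v in ["Randal", "Alma"]]
employees = [transform("Employees", "Employee", v, label="person/name")
             for v in ["Randal", "Ronnie"]]
print(customers)  # ['Mark', 'Katherine']
print(employees)  # ['Mark', 'Jeremy']
```

Both fields resolve "Randal" to the same key because the key is built from the shared label rather than from the entity and field names.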

Reuse in all fields

Values that have already been stored in the dictionary will be reused regardless of the entity and field in which they are found and regardless of any label assigned to those fields.

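Real data

Entity: Customers

| id | Name | Last name |
| --- | --- | --- |
| 1 | Susan | Heaney |
| 2 | Bertha | Susan |
| 3 | Susan | Keeling |

Entity: Employees

| id | Employer_name | Employer_last_name |
| --- | --- | --- |
| 1 | Janet | Rogahn |
| 2 | Marianne | McGlynn |
| 3 | Susan | Bauch |

Anonymized data

Entity: Customers

| id | Name | Last name |
| --- | --- | --- |
| 1 | Percy | Rodriguez |
| 2 | Jesse | Percy |
| 3 | Percy | Leffler |

Entity: Employees

| id | Employer_name | Employer_last_name |
| --- | --- | --- |
| 1 | Whitney | Kautzer |
| 2 | Garry | Dare |
| 3 | Percy | Mills |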

Although the occurrences do not share the same entity, field, or label, every occurrence of Susan becomes Percy, no matter where it is found in the data source.
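In this mode the lookup key reduces to the input value alone. A minimal Python sketch under that assumption follows; `GlobalDictionary` and the ordered list of replacement values are hypothetical and chosen only to reproduce the tables above.

```python
class GlobalDictionary:
    """Toy dictionary keyed by value only, ignoring entity, field, and label.
    Hypothetical, for illustration."""

    def __init__(self, fake_values):
        self._fakes = iter(fake_values)
        self._store = {}

    def transform(self, value):
        if value not in self._store:
            self._store[value] = next(self._fakes)
        return self._store[value]


d = GlobalDictionary(["Percy", "Rodriguez", "Jesse", "Leffler",
                      "Whitney", "Kautzer", "Garry", "Dare", "Mills"])

customers = [("Susan", "Heaney"), ("Bertha", "Susan"), ("Susan", "Keeling")]
employees = [("Janet", "Rogahn"), ("Marianne", "McGlynn"), ("Susan", "Bauch")]

# Every cell goes through the same dictionary, whatever its entity or field.
anon_customers = [tuple(d.transform(v) for v in row) for row in customers]
anon_employees = [tuple(d.transform(v) for v in row) for row in employees]

print(anon_customers)
# [('Percy', 'Rodriguez'), ('Jesse', 'Percy'), ('Percy', 'Leffler')]
print(anon_employees)
# [('Whitney', 'Kautzer'), ('Garry', 'Dare'), ('Percy', 'Mills')]
```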

Save new transformations in the dictionary

If this option is enabled, transformations are stored in the dictionary and can be reused in subsequent jobs.

If this option is not enabled, the transformations carried out during the job are discarded when it finishes, so they only take effect during the job itself.

Overwrite current dictionary

If this option is active, the current project dictionary will be cleared before the rule is run, so no previously stored values will be reused.
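The interaction of these two options can be sketched as a job that works on a temporary copy of the project dictionary. This is a minimal Python sketch, not the product's implementation: `run_job`, its `save_new` and `overwrite` parameters, and the plain `dict` used as the project dictionary are all hypothetical stand-ins for the options described above.

```python
def run_job(project_dictionary, values, fakes, save_new=True, overwrite=False):
    """Transform `values`, optionally clearing and/or updating the dictionary."""
    if overwrite:
        # Clear stored entries before the run, so nothing old is reused.
        project_dictionary.clear()
    working = dict(project_dictionary)  # job-local copy
    result = []
    for value in values:
        if value not in working:
            working[value] = next(fakes)
        result.append(working[value])
    if save_new:
        # Persist this job's transformations for later jobs.
        project_dictionary.update(working)
    return result


project = {}
fakes = iter(["Glenda", "Felix", "Valerie"])

first = run_job(project, ["Edith", "Edith"], fakes, save_new=False)
print(first, project)   # ['Glenda', 'Glenda'] {}

second = run_job(project, ["Edith"], fakes, save_new=True)
print(second, project)  # ['Felix'] {'Edith': 'Felix'}

third = run_job(project, ["Edith"], fakes, overwrite=True)
print(third, project)   # ['Valerie'] {'Edith': 'Valerie'}
```

In the first run the value is consistent within the job but nothing is kept; in the second it is saved for later jobs; in the third the stored entry is cleared first, so a new replacement is produced.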

info

Gigantic does not store any source data in its database. Dictionary entries are hashed with a cryptographic function, so the process cannot be reversed to recover the original data.
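As an illustration of storing a hash instead of the source value, the sketch below keys a dictionary by a SHA-256 digest. The source only states that a cryptographic function is used; the choice of SHA-256, the `entry_key` helper, and the storage layout are assumptions made for this example.

```python
import hashlib


def entry_key(value: str) -> str:
    """Return a hex digest used as the dictionary key (SHA-256 is an assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


store = {}
store[entry_key("Edith")] = "Glenda"  # only the digest of "Edith" is kept

# A later lookup hashes the incoming value and finds the same entry.
lookup = store.get(entry_key("Edith"))
print(lookup)            # Glenda
print("Edith" in store)  # False
```

The same input always yields the same digest, so lookups still work even though the original value never appears in the store.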