
2 posts tagged with "subsetting-data"


· 4 min read
Juan Rodriguez


Data masking is a strategy for protecting sensitive data in a dataset by transforming it into different data that preserves the coherence and consistency of the original set. Good data masking not only has to maintain data consistency and the relations between tables, but also needs to replicate the statistical distribution of the original source.

Also known as data "anonymization", "obfuscation" or "tokenization", data masking seeks to generate realistic, anonymized datasets based on real production data. These datasets can be used for other purposes such as analytics, test generation or AI training, all without compromising the security of the real data. To keep the real data secure, the data masking process is irreversible: the user cannot recover the real data from the masked version.
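As a rough illustration of the two properties described above (consistency across tables and irreversibility), here is a minimal Python sketch. It is not taken from any particular masking tool: the `mask_value` helper, the `SECRET_KEY` constant and the sample tables are all hypothetical. It assumes a keyed one-way hash (HMAC-SHA256) is an acceptable masking function, which keeps join keys aligned but does not by itself reproduce realistic-looking values or distributions.

```python
import hashlib
import hmac

# Hypothetical secret for illustration only; a real setup would load it
# from a secrets store rather than hard-coding it.
SECRET_KEY = b"example-masking-key"


def mask_value(value: str, prefix: str = "user") -> str:
    """Map a sensitive value to a stable pseudonym with a keyed one-way hash."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncating the digest keeps the pseudonym short; the original value
    # cannot be read back from it.
    return f"{prefix}_{digest[:12]}"


# Two hypothetical tables that share the customer email as a join key.
customers = [{"email": "ana@example.com", "plan": "pro"}]
orders = [{"email": "ana@example.com", "total": 42.0}]

# The same input always yields the same pseudonym, so the relation
# between the two tables survives masking.
masked_customers = [{**row, "email": mask_value(row["email"])} for row in customers]
masked_orders = [{**row, "email": mask_value(row["email"])} for row in orders]
```

Because the function is deterministic, a join on the masked `email` column still matches the same rows as before, while the non-sensitive columns are left untouched.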


· 3 min read
Juan Rodriguez


How many times have we found that the data we are working with, either in testing or development, belongs to real customers or users? Although it may not seem like it, this is a common practice in many companies. Working with real data is one of the factors that increase the risk of a company suffering a security breach.

According to IBM's Cost of a Data Breach Report 2021, these data breaches have increased by 10% in just one year and 135% in the last 6 years. These figures suggest that attackers are increasingly searching for this type of vulnerability in companies.
