Guaranteeing software quality is a challenge that demands maximum security of sensitive information. When handling test data in databases like MySQL, this process falls under the discipline of Test Data Management, where it is essential that the data does not contain Personally Identifiable Information (PII), strictly complying with current data protection regulations.
This article is a technical and practical guide focused on using MySQL scripts to generate large volumes of realistic, synthetic test data. We'll show you the exact, step-by-step method to simulate testing scenarios in QA, maintaining the confidentiality and security of that sensitive information throughout the process.
Why is it important to generate realistic test data in MySQL?
QA and testing environments require databases that simulate production conditions, always protecting user privacy. Managing this risk is a fundamental pillar of Data Security. Real data, if not managed properly, jeopardizes system security. Therefore, regulations like the GDPR require data processing subject to strict risk management protocols.
Generating realistic test data in MySQL is the solution. This allows QA and DBA teams to execute tests, while facilitating regulatory compliance for CIOs and executives without compromising privacy or violating the law, ensuring that the results are as reliable as possible.
Generating realistic data with MySQL
Step 1: Creating the Database and Test Table in MySQL
First, we'll create the test_users table schema in MySQL. This structure defines the fields that contain PII attributes, and in this tutorial, synthetic data will be inserted for safe use in QA.
-- Defines the schema of the 'test_users' table which will contain PII attributes (like name and email).
CREATE TABLE test_users (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(150),
phone VARCHAR(20),
address VARCHAR(255),
city VARCHAR(100),
birthdate DATE
);Step 2: Generate realistic test data
Next, we'll use a MySQL stored procedure to generate random data that simulates reality. This script generates synthetic data using built-in MySQL functions like RAND(), CONCAT(), and FLOOR().
DELIMITER $$
CREATE PROCEDURE generate_test_data(IN num_rows INT)
BEGIN
DECLARE i INT DEFAULT 0;
WHILE i < num_rows DO
INSERT INTO test_users (name, email, phone, address, city, birthdate)
VALUES (
CONCAT('User_', FLOOR(1 + (RAND() * 99999))), -- Random name
CONCAT('user', FLOOR(1 + (RAND() * 99999)), '@example.com'), -- Synthetic email
CONCAT('+1-', FLOOR(1000000000 + (RAND() * 8999999999))), -- Random phone number
CONCAT(FLOOR(1 + (RAND() * 999)), ' Test Street'), -- Synthetic address
CONCAT('City_', FLOOR(1 + (RAND() * 500))), -- Random city
DATE_SUB(CURDATE(), INTERVAL FLOOR(18 + (RAND() * 50)) YEAR) -- Random birthdate
);
SET i = i + 1;
END WHILE;
END$$
DELIMITER ;Step 3: Execute the procedure to generate data
Once the procedure is created, we can generate the amount of data needed. For example, to generate 1,000 rows of test data, simply execute the following command:
CALL generate_test_data(1000);This will insert 1,000 records of fake test data into the
test_userstable, allowing QA teams to conduct tests without compromising the security of real data.
Step 4: Verify the generated data
To verify that the data has been inserted correctly, we can perform a simple SQL query:
SELECT * FROM test_users LIMIT 10;This will display the first 10 records, which will have realistic names, emails, phone numbers, and other details but will not be sensitive.
Data Masking and Anonymization in MySQL
While generating synthetic test data is an effective strategy, many organizations also work with subsets of real production data for testing purposes. In these cases, applying data masking and anonymization techniques is essential to ensure privacy, reduce risk, and comply with regulations such as GDPR and NIS2.
Data masking refers to transforming sensitive data elements (such as names, emails, phone numbers) into fictional but realistic values. The goal is to preserve data structure and integrity while preventing identification of individuals.
Anonymization aims to irreversibly remove any linkage between the data and the individual, ensuring that the information cannot be traced back to a real person.
Here are some examples of simple techniques you can apply in MySQL:
Generalization
You can reduce the precision of data to lower its sensitivity. For example, instead of storing the full birthdate, you can store only the year of birth:
UPDATE test_users
SET birthdate = DATE_FORMAT(birthdate, '%Y-01-01');Suppression
Sometimes the best approach is to suppress certain fields entirely, especially if they are not required for the test case:
UPDATE test_users
SET address = NULL;Pseudonymization
You can replace identifiers (such as names) with random pseudonyms:
UPDATE test_users
SET name = CONCAT('User_', FLOOR(1 + (RAND() * 9999)));- Pseudonymization: Replacing direct identifiers with random pseudonyms. For instance, we can replaceUser_ID
with a random or pseudonymous value.
UPDATE test_users
SET name = CONCAT('User_', FLOOR(1 + (RAND() * 9999)));From Manual Script to Automation
Looking for real automation? Generating test data with native scripts works, but managing complex scenarios (such as maintaining referential integrity between tables or continuous automation) requires a solution designed for production environments.

