In software development, test data is crucial for performing quality checks in controlled environments without risking user privacy or compromising data security. Especially when handling sensitive data in database testing environments, it is essential to ensure that the data used does not reveal personally identifiable information (PII) or confidential details.



This article provides a technical approach to generating realistic and secure test data in MySQL, without the need for external tools. Through SQL and stored procedures, we can generate large volumes of realistic data to simulate test scenarios in QA databases. We also explain how to integrate anonymization techniques and ensure data confidentiality throughout the process.




Why is it important to generate realistic test data in MySQL?



QA and testing environments require databases that simulate production conditions, but without compromising user privacy. Real data, if not managed properly, can pose a risk to the security of the systems and individuals involved. Moreover, regulations like the GDPR (General Data Protection Regulation) mandate that personal data be handled with utmost care, especially in testing processes.



Generating realistic test data in MySQL allows QA teams, DBAs, and CIOs to conduct tests without compromising sensitive information privacy, ensuring that tests are as close to real-world conditions as possible while remaining compliant with legal standards.




Generating realistic data with MySQL



The following is a technical approach with MySQL scripts to create realistic test data for a QA testing environment. We will use MySQL’s native functions and anonymization techniques to ensure that the data is useful for testing but does not contain sensitive information.




Step 1: Create the database and table in MySQL



First, we need to create a test database in MySQL and the table that will contain the test data. In this case, we will simulate a user database for a registration system.



-- Create test_users table for generating anonymized test data
CREATE TABLE test_users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(150),
    phone VARCHAR(20),
    address VARCHAR(255),
    city VARCHAR(100),
    birthdate DATE
);

This table structure includes typical personal data fields such as name, email, phone, address, city, and birthdate. These fields provide a realistic test model for evaluating anonymization, masking, and compliance with data protection requirements.




Step 2: Generate realistic test data



Next, we can use a stored procedure to automate the generation of random but realistic-looking test data. This example uses built-in MySQL functions such as RAND(), CONCAT(), and DATE_SUB() to populate the table with non-sensitive data.



DELIMITER $$

CREATE PROCEDURE generate_test_data(IN num_rows INT)
BEGIN
  DECLARE i INT DEFAULT 0;

  WHILE i < num_rows DO
      
      INSERT INTO test_users (name, email, phone, address, city, birthdate)
      VALUES (
          CONCAT('User_', FLOOR(1 + (RAND() * 99999))), -- Random name
          CONCAT('user', FLOOR(1 + (RAND() * 99999)), '@example.com'), -- Synthetic email
          CONCAT('+1-', FLOOR(1000000000 + (RAND() * 8999999999))), -- Random phone number
          CONCAT(FLOOR(1 + (RAND() * 999)), ' Test Street'), -- Synthetic address
          CONCAT('City_', FLOOR(1 + (RAND() * 500))), -- Random city
          DATE_SUB(CURDATE(), INTERVAL FLOOR(18 + (RAND() * 50)) YEAR) -- Random birthdate
      );
      
      SET i = i + 1;
  END WHILE;

END$$

DELIMITER ;


Step 3: Execute the procedure to generate data



Once the procedure is created, we can generate the amount of data needed. For example, to generate 1,000 rows of test data, simply execute the following command:



CALL generate_test_data(1000);


This will insert 1,000 records of fake test data into the


test_users

table, allowing QA teams to conduct tests without compromising the security of real data.



Step 4: Verify the generated data



To verify that the data has been inserted correctly, we can perform a simple SQL query:


SELECT * FROM test_users LIMIT 10;

This will display the first 10 records, which will have realistic names, emails, phone numbers, and other details but will not be sensitive.




Data Masking and Anonymization in MySQL



While generating synthetic test data is an effective strategy, many organizations also work with subsets of real production data for testing purposes. In these cases, applying data masking and anonymization techniques is essential to ensure privacy, reduce risk, and comply with regulations such as GDPR and NIS2.



Data masking refers to transforming sensitive data elements (such as names, emails, phone numbers) into fictional but realistic values. The goal is to preserve data structure and integrity while preventing identification of individuals.



Anonymization aims to irreversibly remove any linkage between the data and the individual, ensuring that the information cannot be traced back to a real person.


Here are some examples of simple techniques you can apply in MySQL:



Generalization



You can reduce the precision of data to lower its sensitivity. For example, instead of storing the full birthdate, you can store only the year of birth:


UPDATE test_users
SET birthdate = DATE_FORMAT(birthdate, '%Y-01-01');


Suppression



Sometimes the best approach is to suppress certain fields entirely, especially if they are not required for the test case:


UPDATE test_users
SET address = NULL;


Pseudonymization



You can replace identifiers (such as names) with random pseudonyms:


UPDATE test_users
SET name = CONCAT('User_', FLOOR(1 + (RAND() * 9999)));


  • Pseudonymization: Replacing direct identifiers with random pseudonyms. For instance, we can replaceUser_ID
    with a random or pseudonymous value.


UPDATE test_users
SET name = CONCAT('User_', FLOOR(1 + (RAND() * 9999)));


Next Steps: Advanced Data Masking Techniques



For more advanced use cases—such as masking email domains, preserving referential integrity across tables, or automating masking pipelines—dedicated data masking techniques are recommended.


If you’d like to explore more advanced methods, see our detailed guide: How to Mask Data in MySQL.