Masking sensitive data is essential when handling structured files like CSV and JSON in development, QA, or analytics environments. These files often include personal or confidential information that shouldn't be exposed outside production.
In this article, you'll learn how to mask values such as names, emails, and card numbers with Python scripts while keeping the files usable.
Masking Sensitive Data in CSV Files
CSV files are often used to move or prepare tabular data outside production. Masking sensitive information in CSVs requires modifying the content while keeping column headers, delimiters, and structure intact.
Common targets
- name, email, phone_number
- ssn, credit_card, iban
- address, zip_code, internal_id
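Before masking, it can help to check which of these columns a given file actually contains. The following is a minimal sketch using only the standard library; the SENSITIVE_COLUMNS set and the find_sensitive_columns helper are hypothetical names introduced for illustration, and the matching is a plain case-insensitive comparison on header names, so it will miss columns with unconventional names.

```python
import csv
import io

# Hypothetical set of header names treated as sensitive; extend it to match your schema.
SENSITIVE_COLUMNS = {
    "name", "email", "phone_number",
    "ssn", "credit_card", "iban",
    "address", "zip_code", "internal_id",
}

def find_sensitive_columns(csv_text):
    """Return the header names that appear in SENSITIVE_COLUMNS, in file order."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    return [col for col in header if col.strip().lower() in SENSITIVE_COLUMNS]

# Made-up sample data for demonstration only
sample = "id,Name,email,signup_date,zip_code\n1,Ann,ann@example.com,2024-01-01,02139\n"
print(find_sensitive_columns(sample))
```

A header-based check like this is only a first pass; sensitive values can also sit in free-text columns that no name list will catch.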
Techniques
The following examples are written in Python, using pandas and the standard random module. They assume a DataFrame named df that was loaded with pd.read_csv, and they can be adapted to other data structures and formats.
Substitution
df['name'] = ['User_' + str(i) for i in df.index]
Truncation
df['credit_card'] = df['credit_card'].apply(lambda x: 'XXXX-XXXX-XXXX-' + str(x)[-4:])
Randomization
df['zip_code'] = [random.randint(10000, 99999) for _ in df.index]
Full CSV masking script
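import pandas as pd
import random

# Read every column as text so values such as IDs keep leading zeros
df = pd.read_csv("input.csv", dtype=str)
# Substitution: replace names with sequential placeholders
df['name'] = ['User_' + str(i) for i in df.index]
# Truncation: keep only the last four characters of the SSN
df['ssn'] = df['ssn'].apply(lambda x: "***-**-" + str(x)[-4:])
# Randomization: replace ZIP codes with random five-digit numbers
df['zip_code'] = [random.randint(10000, 99999) for _ in df.index]
# index=False keeps the original column layout in the output
df.to_csv("masked_output.csv", index=False)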
Always verify that:
- Headers and delimiters are preserved
- Masked file remains readable by your tools
- Output doesn’t break any downstream processing logic
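The first two checks in the list above can be partly automated. The sketch below is one possible approach, not a complete validator: it uses the standard csv module, and check_masked_csv is a hypothetical helper that compares the header row, the row count, and the number of fields per row. It assumes both files use the same delimiter.

```python
import csv
import io

def check_masked_csv(original_text, masked_text, delimiter=","):
    """Return a list of problems found; an empty list means the basic checks passed."""
    orig_rows = list(csv.reader(io.StringIO(original_text), delimiter=delimiter))
    mask_rows = list(csv.reader(io.StringIO(masked_text), delimiter=delimiter))
    if not orig_rows or not mask_rows:
        return ["one of the files is empty"]
    problems = []
    if orig_rows[0] != mask_rows[0]:
        problems.append("header row changed")
    if len(orig_rows) != len(mask_rows):
        problems.append("row count changed")
    width = len(orig_rows[0])
    # Row 1 is the header, so data rows are numbered from 2
    for row_no, row in enumerate(mask_rows[1:], start=2):
        if len(row) != width:
            problems.append(f"row {row_no} has {len(row)} fields, expected {width}")
    return problems

# Made-up sample data for demonstration only
original = "name,ssn,zip_code\nAnn Lee,123-45-6789,02139\n"
masked = "name,ssn,zip_code\nUser_0,***-**-6789,48213\n"
print(check_masked_csv(original, masked))
```

Checks like these confirm the shape of the file only. Whether downstream logic still works has to be tested against the tools that consume the output.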
Masking Sensitive Data in JSON Files
JSON is commonly used to store or transmit structured records, including nested data. Masking sensitive fields in JSON requires traversing keys and applying transformations without breaking the structure.
Typical keys to mask:
- email, ssn, card.number
- user_id, auth_token, address.zip
- Any nested field containing sensitive or financial data
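Nested keys such as card.number and address.zip need a small traversal step before a mask can be applied. A minimal sketch follows, assuming the dotted notation above describes a path through nested objects; mask_path is a hypothetical helper, and it does not descend into lists.

```python
def mask_path(record, dotted_path, mask_fn):
    """Apply mask_fn to the value at dotted_path (e.g. "card.number"), if present."""
    keys = dotted_path.split(".")
    node = record
    # Walk down to the parent object of the final key
    for key in keys[:-1]:
        node = node.get(key) if isinstance(node, dict) else None
        if node is None:
            return  # path is missing; leave the record unchanged
    if isinstance(node, dict) and keys[-1] in node:
        node[keys[-1]] = mask_fn(node[keys[-1]])

# Made-up sample record for demonstration only
record = {
    "email": "ann@example.com",
    "card": {"number": "4111-1111-1111-1111"},
    "address": {"zip": "02139"},
}
mask_path(record, "card.number", lambda v: "XXXX-XXXX-XXXX-" + str(v)[-4:])
mask_path(record, "address.zip", lambda v: "00000")
mask_path(record, "profile.phone", lambda v: None)  # no such path, so nothing happens
print(record)
```

Because missing paths are skipped rather than created, the record keeps its original structure and no new keys appear in the output.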
Techniques
Email (randomized)
def mask_email(email):
    return "user_" + str(random.randint(1000,9999)) + "@demo.com"
SSN (truncated)
record['ssn'] = "***-**-" + record['ssn'][-4:]
Token (nulling)
record['accessToken'] = None
Full JSON masking script
import json
import random

def mask_email(email):
    # Randomized replacement; the original address is discarded
    return "user_" + str(random.randint(1000,9999)) + "@demo.com"

# Assumes input.json holds a list of flat record objects
with open("input.json", "r") as f:
    data = json.load(f)

for record in data:
    if 'email' in record:
        record['email'] = mask_email(record['email'])
    if 'ssn' in record:
        record['ssn'] = "***-**-" + record['ssn'][-4:]
    if 'accessToken' in record:
        record['accessToken'] = None  # written out as JSON null

with open("masked_output.json", "w") as f:
    json.dump(data, f, indent=2)
Ensure:
- The JSON structure remains valid
- No required fields are removed
- Output passes schema validation or test parsing
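The first two checks can be approximated without a schema by parsing both files and comparing their key paths. This is a rough sketch rather than a substitute for schema validation; key_paths and missing_keys are hypothetical helpers, and the "[]" segment used for list items is an arbitrary convention chosen here.

```python
import json

def key_paths(value, prefix=""):
    """Collect dotted key paths from a parsed JSON value; list items share a "[]" segment."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(child, path)
    elif isinstance(value, list):
        for item in value:
            paths |= key_paths(item, prefix + "[]")
    return paths

def missing_keys(original_text, masked_text):
    """Return key paths present in the original but absent after masking."""
    original = json.loads(original_text)
    masked = json.loads(masked_text)  # raises json.JSONDecodeError if the output is invalid
    return sorted(key_paths(original) - key_paths(masked))

# Made-up sample data for demonstration only
original = '[{"email": "a@b.com", "ssn": "123-45-6789", "accessToken": "abc"}]'
masked = '[{"email": "user_1234@demo.com", "ssn": "***-**-6789", "accessToken": null}]'
print(missing_keys(original, masked))
```

An empty result means every key in the original still exists after masking. It says nothing about value types, so a field nulled out during masking may still fail a stricter schema.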