How to Mask Sensitive Data in Files: CSV and JSON (Guide)

Masking sensitive data is essential when handling structured files like CSV and JSON in development, QA, or analytics environments. These files often include personal or confidential information that shouldn't be exposed outside production.

In this article, you'll learn how to apply masking techniques using Python scripts—targeting values like names, emails, and card numbers—without compromising usability.

Masking Sensitive Data in CSV Files

CSV files are often used to move or prepare tabular data outside production. Masking sensitive information in CSVs requires modifying the content while keeping column headers, delimiters, and structure intact.
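Keeping the delimiter intact is easy to get wrong: pandas `to_csv` writes commas by default, so a semicolon- or tab-delimited input comes back with a different delimiter unless `sep` is passed explicitly. One way to avoid that is to detect the dialect and reuse it for writing. The sketch below uses only the standard-library `csv` module. `mask_csv_text` and its `mask_row` callback are hypothetical names for illustration, and `csv.Sniffer` is a heuristic that raises `csv.Error` when it cannot decide.

```python
import csv
import io

def mask_csv_text(text, mask_row):
    """Mask CSV text while keeping its header order and delimiter.

    mask_row is a caller-supplied function: one row as a dict in, masked dict out.
    """
    # Sniffer guesses the delimiter and quoting from a sample of the input
    dialect = csv.Sniffer().sniff(text[:4096])
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    out = io.StringIO()
    # Reuse the detected dialect and the original column order for output.
    # Sniffer does not detect line endings, so "\n" is chosen explicitly here.
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            dialect=dialect, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(mask_row(row))
    return out.getvalue()

sample = "name;email\nAlice;a.smith@example.com\nBob;bob@example.com\n"
masked = mask_csv_text(sample, lambda row: {**row, "name": "User"})
print(masked)
```

For files on disk, the same function can be fed the result of reading the file opened with `newline=""`, as the `csv` module documentation recommends.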

Common targets

name, email, phone_number

ssn, credit_card, iban

address, zip_code, internal_id
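To apply these targets consistently across files, one option is a rules table that maps column names to masking functions and silently skips columns a given file does not contain. The `RULES` mapping, the helper names, and the replacement formats below are illustrative assumptions, not a fixed standard.

```python
import pandas as pd

def keep_last4(value, prefix):
    # Keep only the last four characters of the original value
    return prefix + str(value)[-4:]

# Hypothetical rules: column name -> function applied to each non-null value
RULES = {
    "ssn": lambda v: keep_last4(v, "***-**-"),
    "credit_card": lambda v: keep_last4(v, "XXXX-XXXX-XXXX-"),
    "phone_number": lambda v: keep_last4(v, "***-***-"),
    "internal_id": lambda v: None,  # nulling
}

def apply_rules(df, rules):
    out = df.copy()
    for column, fn in rules.items():
        if column in out.columns:  # ignore targets this file does not have
            # na_action="ignore" leaves missing values missing
            out[column] = out[column].map(fn, na_action="ignore")
    return out

df = pd.DataFrame({
    "ssn": ["123-45-6789", None],
    "credit_card": ["4111-1111-1111-4321", "5500-0000-0000-0004"],
    "city": ["Springfield", "Shelbyville"],
})
masked = apply_rules(df, RULES)
print(masked)
```

Columns without a rule, such as `city` here, pass through unchanged.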

Techniques

The following examples use Python with pandas and the standard-library random module. Each snippet assumes the CSV has already been loaded into a pandas DataFrame named df; the snippets can be adapted to other data structures and formats.

Substitution

df['name'] = ['User_' + str(i) for i in df.index]
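Index-based placeholders change whenever rows are reordered, and the same person gets a different placeholder in each file. When referential consistency across files matters, a keyed hash (HMAC) is one common alternative: equal inputs always map to the same placeholder, and reversing it requires the key. The `pseudonym` helper and `SECRET_KEY` below are hypothetical; in practice the key would come from a secrets store, not the source code.

```python
import hashlib
import hmac

import pandas as pd

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; load from a secrets store

def pseudonym(value, prefix="User_", length=8):
    # HMAC-SHA256 of the value: stable for equal inputs, not reversible without the key
    digest = hmac.new(SECRET_KEY, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return prefix + digest[:length]

df = pd.DataFrame({"name": ["Alice", "Bob", "Alice"]})
df["name"] = df["name"].map(pseudonym)
print(df)
```

Truncating the digest keeps placeholders short but raises the chance of two inputs colliding; a larger `length` reduces that risk on big datasets.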

Truncation

# Assumes df was read with dtype=str so card numbers stay text; missing values stay missing
df['credit_card'] = 'XXXX-XXXX-XXXX-' + df['credit_card'].str[-4:]
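The snippet above hard-codes a dashed 16-digit layout, so a number stored without separators or a 15-digit card comes out with a different shape than it went in. A format-preserving variant of the same idea masks every digit except the last four and leaves separators where they were. `mask_all_but_last4` is a hypothetical helper; values with four or fewer digits pass through unchanged.

```python
import pandas as pd

def mask_all_but_last4(value, mask_char="X"):
    """Replace every digit except the last four, leaving separators in place."""
    text = str(value)
    total_digits = sum(ch.isdigit() for ch in text)
    seen = 0
    out = []
    for ch in text:
        if ch.isdigit():
            seen += 1
            # Digits before the final four are replaced with mask_char
            out.append(ch if seen > total_digits - 4 else mask_char)
        else:
            out.append(ch)
    return "".join(out)

cards = pd.Series(["4111-1111-1111-4321", "378282246310005", None])
print(cards.map(mask_all_but_last4, na_action="ignore").tolist())
```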

Randomization

df['zip_code'] = [random.randint(10000, 99999) for _ in df.index]
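The randomization snippet above draws from the module-level generator, so the output differs on every run, and it writes integers into a column that may have held text ZIP codes such as `02134`. If test runs need to be reproducible, a dedicated `random.Random` instance with a fixed seed is one option. The function name and seed below are assumptions, and the generated values are not checked against real ZIP codes.

```python
import random

import pandas as pd

def random_zip_codes(n, seed=None):
    """Return n random five-character ZIP-like strings."""
    # A private generator: the same seed gives the same sequence on every run
    rng = random.Random(seed)
    # Zero-padded strings keep a fixed width and a text column type
    return [f"{rng.randint(0, 99999):05d}" for _ in range(n)]

df = pd.DataFrame({"zip_code": ["90210", "02134", "10001"]})
df["zip_code"] = random_zip_codes(len(df), seed=42)
print(df)
```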

Full CSV masking script

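import pandas as pd
import random

# Read every column as text so SSNs and ZIP codes keep their formatting and leading zeros
df = pd.read_csv("input.csv", dtype=str)

# Substitution: replace names with sequential placeholders
df['name'] = ['User_' + str(i) for i in range(len(df))]
# Truncation: keep only the last four digits of the SSN
df['ssn'] = "***-**-" + df['ssn'].str[-4:]
# Randomization: replace ZIP codes with random five-digit values
df['zip_code'] = [str(random.randint(10000, 99999)) for _ in range(len(df))]

df.to_csv("masked_output.csv", index=False)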

Always verify that:

Headers and delimiters are preserved

Masked file remains readable by your tools

Output doesn’t break any downstream processing logic
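Part of this checklist can be automated by comparing the original and masked files. The sketch below re-reads both with pandas and reports a header mismatch (which also catches a changed delimiter, since the header would no longer split the same way), a row-count mismatch, and values left unchanged in columns that should have been masked. `check_masked_csv` and the column list are hypothetical; a randomized column can occasionally match its original by chance, so treat that result as a flag to inspect, not proof of a leak.

```python
import os
import tempfile

import pandas as pd

def check_masked_csv(original_path, masked_path, masked_columns):
    """Return a list of problems found; an empty list means every check passed."""
    original = pd.read_csv(original_path, dtype=str)
    masked = pd.read_csv(masked_path, dtype=str)
    if list(original.columns) != list(masked.columns):
        return ["header mismatch"]
    if len(original) != len(masked):
        return ["row count mismatch"]
    problems = []
    for column in masked_columns:
        # Row-by-row comparison: an unchanged value suggests masking was skipped
        unchanged = int((original[column] == masked[column]).sum())
        if unchanged:
            problems.append(f"{column}: {unchanged} value(s) unchanged")
    return problems

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.csv")
    dst = os.path.join(tmp, "masked_output.csv")
    original = pd.DataFrame({"name": ["Alice", "Bob"], "ssn": ["123-45-6789", "987-65-4321"]})
    original.to_csv(src, index=False)
    # The ssn column is deliberately left unmasked to show a failing check
    original.assign(name=["User_0", "User_1"]).to_csv(dst, index=False)
    problems = check_masked_csv(src, dst, ["name", "ssn"])
print(problems)
```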

Masking Sensitive Data in JSON Files

JSON is commonly used to store or transmit structured records, including nested data. Masking sensitive fields in JSON requires traversing keys and applying transformations without breaking the structure.

Typical keys to mask:

email, ssn, card.number

user_id, auth_token, address.zip

Any nested field containing sensitive or financial data
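The full script later in this section only inspects top-level keys. For nested fields written in dotted form, such as `card.number` or `address.zip`, one option is to walk the path one segment at a time and apply a masking function where the path resolves. `mask_path` and the dotted-path convention are assumptions for illustration; lists encountered along the path are traversed element by element, and missing keys are skipped.

```python
import json

def mask_path(node, path, fn):
    """Apply fn in place to the value at a dotted path such as "card.number"."""
    if isinstance(node, list):
        # Apply the same path to every element of a list
        for item in node:
            mask_path(item, path, fn)
        return
    if not isinstance(node, dict):
        return
    head, _, rest = path.partition(".")
    if head not in node:
        return  # key absent in this record: nothing to mask
    if rest:
        mask_path(node[head], rest, fn)
    else:
        node[head] = fn(node[head])

record = {
    "email": "a.smith@example.com",
    "card": {"number": "4111-1111-1111-4321"},
    "address": {"zip": "90210", "city": "Springfield"},
}
mask_path(record, "card.number", lambda v: "XXXX-XXXX-XXXX-" + str(v)[-4:])
mask_path(record, "address.zip", lambda v: None)
print(json.dumps(record, indent=2))
```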

Techniques

Email (randomized)

import random

def mask_email(email):
    # The original address is discarded; only a random four-digit suffix varies
    return "user_" + str(random.randint(1000, 9999)) + "@demo.com"
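A four-digit suffix allows only 9,000 distinct addresses, so by about a hundred records a duplicate is roughly as likely as not, which can break uniqueness constraints in a test database. One alternative, sketched below under the assumption that unique and stable replacements are wanted, remembers what it has issued: each distinct input email always maps to the same address, and no address is handed out twice. `make_email_masker` is a hypothetical name, and the six-digit suffix is a design choice to keep redraws rare.

```python
import random

def make_email_masker(seed=None, domain="demo.com"):
    """Return a mask_email function that never repeats a replacement address."""
    rng = random.Random(seed)
    issued = {}   # original address -> replacement
    used = set()  # replacements already handed out

    def mask_email(email):
        if email not in issued:
            # Redraw until the address is unused; a 10**6 space keeps retries rare
            while True:
                candidate = f"user_{rng.randrange(10**6):06d}@{domain}"
                if candidate not in used:
                    break
            used.add(candidate)
            issued[email] = candidate
        return issued[email]

    return mask_email

mask_email = make_email_masker(seed=7)
emails = ["a.smith@example.com", "b.jones@example.com", "a.smith@example.com"]
print([mask_email(e) for e in emails])
```

The mapping lives only in memory for the duration of the run; it is never written to the output file.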

SSN (truncated)

record['ssn'] = "***-**-" + record['ssn'][-4:]

Token (nulling)

record['accessToken'] = None

Full JSON masking script

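import json
import random

def mask_email(email):
    # Randomization: the original address is discarded
    return "user_" + str(random.randint(1000, 9999)) + "@demo.com"

# Assumes input.json holds a top-level array of flat objects
with open("input.json", "r", encoding="utf-8") as f:
    data = json.load(f)

for record in data:
    if 'email' in record:
        record['email'] = mask_email(record['email'])
    if 'ssn' in record:
        # Truncation: str() also handles SSNs stored as numbers
        record['ssn'] = "***-**-" + str(record['ssn'])[-4:]
    if 'accessToken' in record:
        # Nulling: written as null in the output
        record['accessToken'] = None

with open("masked_output.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)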

Ensure:

The JSON structure remains valid

No required fields are removed

Output passes schema validation or test parsing
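A lightweight version of these checks re-parses the output, which fails loudly on malformed JSON, and confirms that every record still carries all the key paths its original had. `key_paths` and `check_masked_json` are hypothetical helpers that assume a top-level array, as in the script above; full schema validation would additionally need a schema and a validator library, which this sketch does not include.

```python
import json

def key_paths(node, prefix=""):
    """Collect dotted key paths, e.g. {"card": {"number": 1}} -> {"card", "card.number"}."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            paths |= key_paths(item, prefix)
    return paths

def check_masked_json(original_text, masked_text):
    """Return a list of problems; json.loads raises ValueError on invalid JSON."""
    original = json.loads(original_text)
    masked = json.loads(masked_text)
    if len(original) != len(masked):
        return ["record count mismatch"]
    problems = []
    for i, (before, after) in enumerate(zip(original, masked)):
        missing = key_paths(before) - key_paths(after)
        if missing:
            problems.append(f"record {i}: missing {sorted(missing)}")
    return problems

before = '[{"email": "a@example.com", "accessToken": "abc", "card": {"number": "4111"}}]'
after = '[{"email": "user_1234@demo.com", "accessToken": null, "card": {"number": "XXXX"}}]'
print(check_masked_json(before, after))
```

A nulled field such as `accessToken` still counts as present, so nulling passes this check while deleting the key would not.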

Masking Techniques by Field

Field Type	Technique	Example
Name	Substitution	`Alice` → `User_281`
Email	Randomization	`a.smith@example.com` → `user_3827@demo.com`
SSN	Truncation	`123-45-6789` → `***-**-6789`
Credit Card	Truncation	`4111-1111-1111-4321` → `XXXX-XXXX-XXXX-4321`
ZIP Code	Randomization	`90210` → `75894`
Token	Nulling	`"accessToken": "<token>"` → `"accessToken": null`

File Masking in Development Pipelines

Scripts like the ones above are typically used in:

Data preparation workflows for test automation

Generating sanitized file dumps for external teams

Creating realistic input payloads for API testing

Delivering sample datasets for validation or demos
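For these workflows, the masking logic is easier to reuse as one function that a CI job or data-preparation script calls with input and output paths, plus an optional seed so the same input yields the same test data on every run. The sketch below combines the CSV and JSON rules from this article behind a file-extension check; `mask_file` and its parameters are assumptions, not an existing tool.

```python
import json
import random

import pandas as pd

def mask_file(src, dst, seed=None):
    """Mask a .csv or .json file; the same seed reproduces the same output."""
    rng = random.Random(seed)  # one seeded generator per run
    if src.endswith(".csv"):
        df = pd.read_csv(src, dtype=str)
        if "name" in df.columns:
            df["name"] = [f"User_{i}" for i in range(len(df))]
        if "ssn" in df.columns:
            df["ssn"] = "***-**-" + df["ssn"].str[-4:]
        if "zip_code" in df.columns:
            df["zip_code"] = [str(rng.randint(10000, 99999)) for _ in range(len(df))]
        df.to_csv(dst, index=False)
    elif src.endswith(".json"):
        with open(src, encoding="utf-8") as f:
            data = json.load(f)
        for record in data:  # assumes a top-level array of objects
            if "email" in record:
                record["email"] = f"user_{rng.randint(1000, 9999)}@demo.com"
            if "ssn" in record:
                record["ssn"] = "***-**-" + str(record["ssn"])[-4:]
            if "accessToken" in record:
                record["accessToken"] = None
        with open(dst, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
    else:
        raise ValueError(f"unsupported file type: {src}")
```

Fixing the seed trades unpredictability for repeatability; for files shared outside the team, leaving `seed=None` avoids a reproducible mapping.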

By applying file masking early, before data leaves production, teams can keep sensitive values out of development and QA environments while preserving the structure and formats their tests rely on.

Automate Sensitive Data Masking Across Your Organization

Gigantics helps security, engineering, QA, and data teams detect and mask sensitive information across environments—applying consistent, automated rules to ensure data privacy and compliance.

Whether your data is used for development, testing, analytics, or collaboration, Gigantics delivers format-preserving masking that adapts to your workflows and protects information from unnecessary exposure.

👉Book a Demo with Gigantics
Learn how to automate sensitive data masking—from source to output—while keeping your pipelines efficient and secure.