Mask Sensitive Data in Files: CSV to JSON

Learn how to mask sensitive data in CSV and JSON files using Python. Protect PII in Dev, QA, and Analytics environments with format-preserving techniques.

Sara Codarlupo

Marketing Specialist @Gigantics

Masking sensitive data is essential whenever you move structured files like CSV and JSON outside of secure production environments. Whether the destination is Development, QA, Analytics, or a third-party partner, these files often contain Personally Identifiable Information (PII) that, if left unprotected, exposes your company to compliance risks (GDPR, SOC 2) and security breaches.



In this guide, you’ll learn how to apply format-preserving masking techniques using Python, so that your data remains functional for testing and analysis while sensitive values like names, emails, and credit card numbers are no longer exposed in full.



Masking Sensitive Data in CSV Files


CSV files are the standard for data portability. However, they are often handled on local machines or shared via Slack or email for troubleshooting, both of which are highly exposed environments. Masking must modify the content without breaking the column structure or delimiters.



Common targets


  • name, email, phone_number

  • ssn, credit_card, iban

  • address, zip_code, internal_id



Techniques


The following examples are Python snippets that use pandas and the standard random module. They assume df is a pandas DataFrame already loaded from the CSV file, and they can be adapted to other data structures and formats.


Substitution

df['name'] = ['User_' + str(i) for i in df.index]


Truncation

# astype(str) lets the slice work even if pandas parsed the column as numbers
df['credit_card'] = 'XXXX-XXXX-XXXX-' + df['credit_card'].astype(str).str[-4:]


Randomization

df['zip_code'] = [random.randint(10000, 99999) for _ in df.index]



Full CSV masking script



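import pandas as pd
import random

# Read every column as text so identifiers keep leading zeros and can be sliced
df = pd.read_csv("input.csv", dtype=str)

# Substitution: replace each name with a generic label based on the row index
df['name'] = ['User_' + str(i) for i in df.index]
# Truncation: keep only the last four characters of the SSN
df['ssn'] = "***-**-" + df['ssn'].str[-4:]
# Randomization: replace each ZIP code with a random five-digit number
df['zip_code'] = [random.randint(10000, 99999) for _ in df.index]

# index=False keeps the original column layout (no extra index column)
df.to_csv("masked_output.csv", index=False)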


Always verify that:


  • Headers and delimiters are preserved

  • Masked file remains readable by your tools

  • Output doesn’t break any downstream processing logic
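The checks above can be partly automated. The following is a minimal sketch that uses only the standard csv module; the helper name check_csv_structure and the file paths are illustrative assumptions, not part of the masking script. It compares the header row and record count of the original and masked files and confirms that every masked record has as many fields as the header.

```python
import csv

def check_csv_structure(original_path, masked_path):
    """Return a list of problems found; an empty list means the structure matches."""
    with open(original_path, newline="") as f_orig, open(masked_path, newline="") as f_mask:
        orig_rows = list(csv.reader(f_orig))
        mask_rows = list(csv.reader(f_mask))
    if not orig_rows or not mask_rows:
        return ["one of the files is empty"]

    problems = []
    if orig_rows[0] != mask_rows[0]:
        problems.append("header row changed")
    if len(orig_rows) != len(mask_rows):
        problems.append("record count changed")

    # Every masked record should have as many fields as the header
    width = len(mask_rows[0])
    for record_no, row in enumerate(mask_rows[1:], start=2):
        if len(row) != width:
            problems.append(f"record {record_no} has {len(row)} fields, expected {width}")
    return problems
```

Calling check_csv_structure("input.csv", "masked_output.csv") after the masking step and failing the job when the returned list is non-empty is one way to catch a broken file before it reaches downstream tools. The sketch loads both files into memory, so very large files would need a streaming variant.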




Masking Sensitive Data in JSON Files



JSON is the backbone of modern APIs and NoSQL databases. Masking nested JSON structures is critical for staging environments and API integration tests, where developers need realistic payloads without seeing real customer tokens or secrets.



Typical keys to mask:


  • email, ssn, card.number

  • user_id, auth_token, address.zip

  • Any nested field containing sensitive or financial data
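Keys such as card.number and address.zip sit one level below the top of each record, so a plain record['key'] lookup will not reach them. The following is a minimal sketch of one way to handle that, assuming the dotted names in the list describe a path through nested objects; the helper mask_path and the sample record are hypothetical.

```python
def mask_path(record, dotted_path, mask_fn):
    """Apply mask_fn to the value at dotted_path (e.g. 'card.number') if it exists."""
    keys = dotted_path.split(".")
    node = record
    # Walk down to the object that holds the final key
    for key in keys[:-1]:
        node = node.get(key) if isinstance(node, dict) else None
        if node is None:
            return  # path not present in this record; nothing to mask
    if isinstance(node, dict) and keys[-1] in node:
        node[keys[-1]] = mask_fn(node[keys[-1]])

# Hypothetical sample record for illustration
record = {
    "email": "a.smith@example.com",
    "card": {"number": "4111-1111-1111-4321"},
    "address": {"zip": "90210"},
}
mask_path(record, "card.number", lambda v: "XXXX-XXXX-XXXX-" + v[-4:])
mask_path(record, "address.zip", lambda v: "00000")
mask_path(record, "auth_token", lambda v: None)  # key absent: record is unchanged
```

Missing paths are skipped rather than created, so the helper never adds fields that were not in the original payload.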


Techniques



Email (randomized)

def mask_email(email):
    return "user_" + str(random.randint(1000,9999)) + "@demo.com"


SSN (truncated)

record['ssn'] = "***-**-" + record['ssn'][-4:]


Token (nulling)

record['accessToken'] = None



Full JSON masking script


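import json
import random

def mask_email(email):
    # Randomization: the original address is discarded, not transformed
    return "user_" + str(random.randint(1000,9999)) + "@demo.com"

# The input is expected to be a JSON array of objects (one object per record)
with open("input.json", "r") as f:
    data = json.load(f)

for record in data:
    # Only top-level keys are handled here; absent keys are skipped
    if 'email' in record:
        record['email'] = mask_email(record['email'])
    if 'ssn' in record:
        # Truncation: keep only the last four characters
        record['ssn'] = "***-**-" + record['ssn'][-4:]
    if 'accessToken' in record:
        # Nulling: keep the key so the structure is unchanged, drop the value
        record['accessToken'] = None

with open("masked_output.json", "w") as f:
    json.dump(data, f, indent=2)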


Ensure:


  • The JSON structure remains valid

  • No required fields are removed

  • Output passes schema validation or test parsing
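A basic version of these checks can be scripted. The sketch below assumes, like the masking script, that each file holds a JSON array of objects; the function name check_masked_json is an illustrative assumption. It confirms that the masked file parses, that no records were dropped, and that no top-level keys disappeared. It does not replace validation against a real schema.

```python
import json

def check_masked_json(original_path, masked_path):
    """Return a list of problems; an empty list means the checks passed."""
    with open(original_path) as f:
        original = json.load(f)
    try:
        with open(masked_path) as f:
            masked = json.load(f)
    except json.JSONDecodeError as exc:
        return [f"masked file is not valid JSON: {exc}"]

    if len(original) != len(masked):
        return ["record count changed"]

    problems = []
    for i, (before, after) in enumerate(zip(original, masked)):
        # Nulled values are fine; keys that vanished are not
        missing = set(before) - set(after)
        if missing:
            problems.append(f"record {i} lost keys: {sorted(missing)}")
    return problems
```

Because nulling keeps the key and only clears the value, a record whose accessToken was set to None still passes the key comparison.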



Masking Techniques by Field


Field Type     Technique        Example (original → masked)
Name           Substitution     Alice → User_281
Email          Randomization    a.smith@example.com → user_3827@demo.com
SSN            Truncation       123-45-6789 → ***-**-6789
Credit Card    Truncation       4111-1111-1111-4321 → XXXX-XXXX-XXXX-4321
ZIP Code       Randomization    90210 → 75894
Token          Nulling          "accessToken" → null
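The table above can also be expressed as a lookup from field name to masking function, which keeps the rules in one place. This is a minimal sketch rather than a definitive design: the names MASKERS and mask_record are hypothetical, and name substitution is left out because it depends on the row position rather than on the value.

```python
import random

# Hypothetical mapping from field name to masking function, mirroring the table
MASKERS = {
    "email": lambda v: "user_" + str(random.randint(1000, 9999)) + "@demo.com",
    "ssn": lambda v: "***-**-" + str(v)[-4:],
    "credit_card": lambda v: "XXXX-XXXX-XXXX-" + str(v)[-4:],
    "zip_code": lambda v: str(random.randint(10000, 99999)),
    "accessToken": lambda v: None,
}

def mask_record(record, maskers=MASKERS):
    """Return a copy of record with every listed field masked; other fields pass through."""
    return {key: maskers[key](value) if key in maskers else value
            for key, value in record.items()}

masked = mask_record({
    "ssn": "123-45-6789",
    "credit_card": "4111-1111-1111-4321",
    "accessToken": "abc123",
    "plan": "pro",
})
```

Here masked["ssn"] is "***-**-6789", masked["credit_card"] is "XXXX-XXXX-XXXX-4321", masked["accessToken"] is None, and the unlisted plan field is passed through unchanged.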



File Masking in Development Pipelines



Scripts like the ones above are typically used in:


  • Data preparation workflows for test automation

  • Generating sanitized file dumps for external teams

  • Creating realistic input payloads for API testing

  • Delivering sample datasets for validation or demos


By applying file masking early in the process, teams can protect sensitive data used in development and QA without losing test fidelity.




Automate Sensitive Data Masking Across Your Organization



Gigantics helps engineering, security, and data teams detect and mask sensitive information across all non-production environments. Instead of maintaining hundreds of custom scripts, Gigantics provides an automated, format-preserving solution that ensures compliance with zero manual effort.


Whether your files are for testing, analytics, or external demos, protect your data from unnecessary exposure today.


👉Book a Demo with Gigantics and move from manual scripts to enterprise-grade data privacy.