How to mask sensitive data in files: From CSV to JSON

Learn how to mask sensitive data in CSV and JSON files using Python scripts. Apply substitution, truncation, and randomization to protect PII and other sensitive fields in development workflows.

Sara Codarlupo

Marketing Specialist @Gigantics

Masking sensitive data is essential when handling structured files like CSV and JSON in development, QA, or analytics environments. These files often include personal or confidential information that shouldn't be exposed outside production.



In this article, you'll learn how to apply masking techniques with Python scripts to values such as names, emails, and card numbers, while keeping the files usable for development and testing.



Masking Sensitive Data in CSV Files


CSV files are often used to move or prepare tabular data outside production. Masking sensitive information in CSVs requires modifying the content while keeping column headers, delimiters, and structure intact.



Common targets


  • name, email, phone_number

  • ssn, credit_card, iban

  • address, zip_code, internal_id



Techniques



The following examples are Python snippets that use pandas and the standard-library random module, and assume the CSV has already been loaded into a DataFrame named df. They can be adapted to other data structures and formats.


Substitution

df['name'] = ['User_' + str(i) for i in df.index]


Truncation

df['credit_card'] = df['credit_card'].apply(lambda x: 'XXXX-XXXX-XXXX-' + x[-4:])


Randomization

df['zip_code'] = [random.randint(10000, 99999) for _ in df.index]



Full CSV masking script



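import random

import pandas as pd

# Read every column as text so leading zeros and digit-only IDs are preserved.
df = pd.read_csv("input.csv", dtype=str)

# Substitution: replace each name with a placeholder built from the row index.
df['name'] = ['User_' + str(i) for i in df.index]
# Truncation: keep only the last four characters of the SSN.
df['ssn'] = "***-**-" + df['ssn'].str[-4:]
# Randomization: replace each ZIP code with a random five-digit number.
df['zip_code'] = [random.randint(10000, 99999) for _ in df.index]

df.to_csv("masked_output.csv", index=False)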


Always verify that:


  • Headers and delimiters are preserved

  • Masked file remains readable by your tools

  • Output doesn’t break any downstream processing logic
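
Some of these checks can be automated. The following is a minimal sketch rather than a complete validation suite. It uses Python's standard csv module instead of pandas, and the helper name check_masked_csv and its masked_columns parameter are assumptions introduced for illustration. It confirms that the header row and row count are unchanged and that columns you did not mask still hold their original values.

```python
import csv
import io


def check_masked_csv(original_text, masked_text, masked_columns):
    """Return a list of problems; an empty list means the checks passed.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    original_reader = csv.DictReader(io.StringIO(original_text))
    masked_reader = csv.DictReader(io.StringIO(masked_text))
    original_rows = list(original_reader)
    masked_rows = list(masked_reader)

    problems = []
    # Headers must match exactly, including column order.
    if original_reader.fieldnames != masked_reader.fieldnames:
        problems.append("header row changed")
    # Masking should never add or drop rows.
    if len(original_rows) != len(masked_rows):
        problems.append("row count changed")
    # Columns that were not masked must keep their original values.
    for number, (before, after) in enumerate(zip(original_rows, masked_rows), start=1):
        for column, value in before.items():
            if column not in masked_columns and after.get(column) != value:
                problems.append(f"row {number}: unmasked column {column!r} changed")
    return problems


original = "name,ssn,plan\nAlice,123-45-6789,pro\n"
masked = "name,ssn,plan\nUser_0,***-**-6789,pro\n"
print(check_masked_csv(original, masked, {"name", "ssn"}))  # []
```

A check like this does not prove the masked file works with every downstream tool, so it is still worth loading the output in the tools that will consume it.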




Masking Sensitive Data in JSON Files



JSON is commonly used to store or transmit structured records, including nested data. Masking sensitive fields in JSON requires traversing keys and applying transformations without breaking the structure.



Typical keys to mask:


  • email, ssn, card.number

  • user_id, auth_token, address.zip

  • Any nested field containing sensitive or financial data
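
Keys such as card.number and address.zip sit inside nested objects, so a loop over top-level keys alone will not reach them. The following is a minimal sketch of one way to handle this, assuming dotted paths identify the nested fields. The helper mask_path and the sample record are hypothetical and exist only for illustration.

```python
import random


def mask_path(record, dotted_path, mask):
    """Apply mask() to the value at dotted_path, if that path exists.

    Hypothetical helper. Modifies record in place; missing paths are skipped.
    """
    *parents, leaf = dotted_path.split(".")
    node = record
    for key in parents:
        node = node.get(key) if isinstance(node, dict) else None
        if node is None:
            return  # path is not present in this record
    if isinstance(node, dict) and leaf in node:
        node[leaf] = mask(node[leaf])


# Sample record, invented for this example.
record = {
    "email": "a.smith@example.com",
    "card": {"number": "4111-1111-1111-4321"},
    "address": {"city": "Springfield", "zip": "90210"},
}

mask_path(record, "card.number", lambda v: "XXXX-XXXX-XXXX-" + v[-4:])
mask_path(record, "address.zip", lambda v: str(random.randint(10000, 99999)))
mask_path(record, "profile.phone", lambda v: "000")  # absent path: nothing happens

print(record["card"]["number"])  # XXXX-XXXX-XXXX-4321
```

This sketch only walks nested objects. Records that keep sensitive values inside arrays would need an extra step to iterate over list elements.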


Techniques



Email (randomized)

def mask_email(email):
    return "user_" + str(random.randint(1000,9999)) + "@demo.com"


SSN (truncated)

record['ssn'] = "***-**-" + record['ssn'][-4:]


Token (nulling)

record['accessToken'] = None



Full JSON masking script



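import json
import random

def mask_email(email):
    # Randomization: the original address is discarded entirely.
    return "user_" + str(random.randint(1000, 9999)) + "@demo.com"

# Assumes input.json holds a top-level array of flat objects.
with open("input.json", "r") as f:
    data = json.load(f)

for record in data:
    if 'email' in record:
        record['email'] = mask_email(record['email'])
    if 'ssn' in record:
        # Truncation: keep only the last four characters.
        record['ssn'] = "***-**-" + record['ssn'][-4:]
    if 'accessToken' in record:
        # Nulling: keep the key so the structure stays intact.
        record['accessToken'] = None

with open("masked_output.json", "w") as f:
    json.dump(data, f, indent=2)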


Ensure:


  • The JSON structure remains valid

  • No required fields are removed

  • Output passes schema validation or test parsing
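
One way to approximate these checks in code is to assume that masking may change values but never the shape of a record. The helper name same_shape is hypothetical, and this is a rough sketch, not a substitute for real schema validation.

```python
import json


def same_shape(original, masked):
    """Return True if both values have the same keys and nesting.

    Hypothetical helper. Leaf values may differ (that is the point of
    masking), so only dict keys and list lengths are compared.
    """
    if isinstance(original, dict):
        return (
            isinstance(masked, dict)
            and original.keys() == masked.keys()
            and all(same_shape(original[k], masked[k]) for k in original)
        )
    if isinstance(original, list):
        return (
            isinstance(masked, list)
            and len(original) == len(masked)
            and all(same_shape(o, m) for o, m in zip(original, masked))
        )
    # Leaf value: it must not have turned into a container.
    return not isinstance(masked, (dict, list))


original = [{"email": "a.smith@example.com", "accessToken": "abc123"}]
masked = [{"email": "user_3827@demo.com", "accessToken": None}]

# Round-trip through json to confirm the masked data still serializes and parses.
assert json.loads(json.dumps(masked)) == masked
print(same_shape(original, masked))  # True
```

Because nulled fields keep their keys, a check like this passes for nulling but fails if a masking step deletes a field outright.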



Masking Techniques by Field


Field Type  | Technique     | Example (original → masked)
Name        | Substitution  | Alice → User_281
Email       | Randomization | a.smith@example.com → user_3827@demo.com
SSN         | Truncation    | 123-45-6789 → ***-**-6789
Credit Card | Truncation    | 4111-1111-1111-4321 → XXXX-XXXX-XXXX-4321
ZIP Code    | Randomization | 90210 → 75894
Token       | Nulling       | "accessToken" → null


File Masking in Development Pipelines



Scripts like the ones above are typically used in:


  • Data preparation workflows for test automation

  • Generating sanitized file dumps for external teams

  • Creating realistic input payloads for API testing

  • Delivering sample datasets for validation or demos


By applying file masking early in the process, teams can protect the sensitive data used in development and QA without losing test fidelity.
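
In a pipeline, a single entry point that picks the masking routine by file type keeps the step easy to call from a build or test job. The following is a rough sketch using only the standard library. The names mask_file, mask_record, and RULES, along with the fields listed in RULES, are assumptions made for illustration and are not part of any tool named in this article.

```python
import csv
import json
from pathlib import Path

# Hypothetical rule set: field name -> masking function.
RULES = {
    "ssn": lambda value: "***-**-" + str(value)[-4:],
    "accessToken": lambda value: None,
}


def mask_record(record):
    """Return a copy of a flat record with RULES applied to matching keys."""
    return {
        key: RULES[key](value) if key in RULES else value
        for key, value in record.items()
    }


def mask_file(source, target):
    """Mask a CSV or JSON file, choosing the parser from the file extension."""
    source, target = Path(source), Path(target)
    suffix = source.suffix.lower()
    if suffix == ".csv":
        with source.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            fieldnames = reader.fieldnames or []
            rows = [mask_record(row) for row in reader]
        with target.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    elif suffix == ".json":
        # Assumes a top-level array of flat objects.
        records = json.loads(source.read_text(encoding="utf-8"))
        masked = [mask_record(record) for record in records]
        target.write_text(json.dumps(masked, indent=2), encoding="utf-8")
    else:
        raise ValueError(f"unsupported file type: {suffix}")
```

A job could then call mask_file("input.csv", "masked_output.csv") before handing the file to tests. Keeping the rules in one dictionary means the CSV and JSON paths mask the same fields the same way.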




Automate Sensitive Data Masking Across Your Organization



Gigantics helps security, engineering, QA, and data teams detect and mask sensitive information across environments—applying consistent, automated rules to ensure data privacy and compliance.


Whether your data is used for development, testing, analytics, or collaboration, Gigantics delivers format-preserving masking that adapts to your workflows and protects information from unnecessary exposure.

