

Data Classification: Levels, Types, Policy & Process

Learn what data classification is, how to define levels and types, build an effective policy, and operationalize controls (masking/tokenization, access, evidence) across environments and CI/CD.


Sara Codarlupo

Marketing Specialist @Gigantics

Data classification is the governance mechanism that decides what a dataset is worth and what must happen to it—access, protection, retention, and audit evidence. It sits under data security and converts policy into enforceable controls across analytics, applications, and CI/CD.




What Is Data Classification?



Data classification is the process of assigning sensitivity labels to data—based on business impact and regulatory exposure—so that consistent controls can be applied across systems and environments. By turning policy into labels (e.g., Public, Internal, Confidential, Restricted), it enables uniform access decisions, appropriate protection (encryption, masking/tokenization), governed retention, and audit-ready evidence across all environments.
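As a minimal sketch of how labels enable uniform access decisions (the `Label` enum, the clearance model, and `can_read` are illustrative assumptions, not a prescribed API):

```python
# Minimal sketch: a label taxonomy ordered by sensitivity, and one uniform
# access check driven by it. Names and the clearance model are illustrative.
from enum import IntEnum

class Label(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

def can_read(user_clearance: Label, data_label: Label) -> bool:
    """A user may read data labeled at or below their clearance."""
    return user_clearance >= data_label

# The same rule applies in every environment: one label, one decision.
assert can_read(Label.CONFIDENTIAL, Label.INTERNAL)
assert not can_read(Label.INTERNAL, Label.RESTRICTED)
```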




Why Data Classification Matters for Security and Compliance



Classification is the prerequisite for reliable controls. Labels drive IAM and key management, determine when to mask or tokenize, and govern what can be shared. For GDPR, HIPAA, and NIS2, classification yields defensible answers to where sensitive data lives, who can use it, how it is protected, and when it is removed—including copies used in testing, analytics, and backups.




Data Classification: Sensitivity Levels and Standard Taxonomy






Most corporate policies adopt a dual model: Levels, based on business impact, and Types, the policy labels applied to data.


Sensitivity levels with definitions, examples, and baseline controls

| Sensitivity | Definition (impact if compromised) | Typical examples | Baseline controls (illustrative) |
|---|---|---|---|
| High | Regulated or mission-critical; severe consequences if compromised. | Bank account numbers, medical records (PHI), full PII sets, authentication secrets. | Need-to-know RBAC/ABAC, strong encryption and key management, deterministic masking/tokenization in lower environments, strict retention, audit evidence. |
| Medium | Internal use; moderate harm if accessed without authorization. | Contract terms, internal reports, customer name/contact details (may be elevated under GDPR depending on context/combination). | AuthN/Z, encryption at rest/in transit, DLP with logged access; protection in lower environments as policy requires. |
| Low | Publicly available; minimal harm if exposed. | Public websites, press releases, public maps. | TLS and integrity checks; open/broad read as appropriate; basic monitoring. |


Taxonomy Types (Policy and Usage Labels)



Policies generally converge on four usage labels that are assigned to data and map directly to the levels above (a code sketch of the mapping follows the list):


  • Restricted: maximum-sensitivity data whose disclosure would violate regulations or contracts, with severe consequences (typically maps to the High level).

  • Confidential: operationally critical information shared only with internal, need-to-know groups (maps to the Medium or High level).

  • Internal Use Only: non-public information shared among authorized employees; a leak would cause moderate damage, so basic safeguards are required (maps to the Medium level).

  • Public: information intended for broad disclosure, carrying minimal risk if exposed (maps to the Low level).
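One way to encode this dual model (a sketch; the dictionary names and the Confidential-to-High default are illustrative assumptions, not a standard):

```python
# Sketch: usage labels mapped to sensitivity levels, plus the baseline
# controls from the table above. All names are illustrative.
TYPE_TO_LEVEL = {
    "Restricted": "High",
    "Confidential": "High",      # or "Medium", depending on policy context
    "Internal Use Only": "Medium",
    "Public": "Low",
}

BASELINE_CONTROLS = {
    "High": ["need-to-know RBAC/ABAC", "strong encryption & key management",
             "deterministic masking/tokenization in lower environments",
             "strict retention", "audit evidence"],
    "Medium": ["AuthN/Z", "encryption at rest/in transit",
               "DLP with logged access"],
    "Low": ["TLS and integrity checks", "basic monitoring"],
}

def controls_for(type_label: str) -> list[str]:
    """Resolve a usage label to its level, then to baseline controls."""
    return BASELINE_CONTROLS[TYPE_TO_LEVEL[type_label]]

print(controls_for("Restricted"))  # High-level controls
```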




Data Classification Policy: Scope, Criteria, and Controls per Level



A usable data classification policy is concise and explicit. It should define scope (systems, domains, and all environments—including ephemeral), the levels/types used (High/Medium/Low and/or Public, Internal, Confidential, Restricted) with clear criteria and examples, and the controls required per level (access, encryption, masking/tokenization, sharing, retention, logging). Specify ownership and RACI, exception/waiver rules (approver, duration, compensating controls), and the audit model—what is recorded, where, and how it is verified. Operationalize the policy with environment-aware enforcement (e.g., masking in lower environments), CI/CD policy checks, and per-release evidence mapped to GDPR/HIPAA/NIS2 obligations. Keep procedures (detector dictionaries, runbooks) in living docs.
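For illustration, here is a compressed sketch of such a policy expressed as structured data; the schema and field names are assumptions, and a real policy would be richer:

```python
# Illustrative policy skeleton: scope, criteria, controls per level,
# ownership, exception rules, and audit model in one reviewable artifact.
POLICY = {
    "scope": {"systems": ["analytics", "applications", "ci-cd"],
              "environments": ["prod", "staging", "dev", "ephemeral"]},
    "levels": {
        "High":   {"criteria": "regulated or mission-critical",
                   "controls": {"access": "need-to-know RBAC/ABAC",
                                "protection": "encrypt + mask/tokenize in lower envs",
                                "retention": "strict",
                                "logging": "full audit evidence"}},
        "Medium": {"criteria": "internal use; moderate harm",
                   "controls": {"access": "authn/z",
                                "protection": "encrypt at rest/in transit",
                                "retention": "standard",
                                "logging": "DLP, logged access"}},
        "Low":    {"criteria": "public; minimal harm",
                   "controls": {"access": "broad read",
                                "protection": "TLS + integrity checks",
                                "retention": "n/a",
                                "logging": "basic monitoring"}},
    },
    "ownership": {"labels": "data stewards", "policy": "security/compliance"},
    "exceptions": {"approver": "CISO", "max_duration_days": 90,
                   "requires": "compensating controls"},
    "audit": {"evidence": "per-release artifacts",
              "mapped_to": ["GDPR", "HIPAA", "NIS2"]},
}
```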




Data Classification Process



Discovery & labeling



Run automated discovery to detect PII/PHI/PCI and tag assets; apply rules to map detections to High/Medium/Low labels; stewards adjudicate edge cases; propagate labels through lineage.
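A toy sketch of the rule step (the regex detectors and level mapping are simplified assumptions; production discovery uses validated detectors, dictionaries, and context scoring):

```python
import re

# Toy detectors: pattern name -> (regex, level it implies).
DETECTORS = {
    "us_ssn": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "High"),
    "email":  (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "Medium"),
}

LEVEL_RANK = {"Low": 0, "Medium": 1, "High": 2}

def classify_column(sample_values: list[str]) -> str:
    """Map detections to the highest implied level; default to Low.
    Edge cases go to a data steward for adjudication."""
    level = "Low"
    for value in sample_values:
        for _name, (pattern, implied) in DETECTORS.items():
            if pattern.search(value) and LEVEL_RANK[implied] > LEVEL_RANK[level]:
                level = implied
    return level

print(classify_column(["alice@example.com", "123-45-6789"]))  # -> High
```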



Enforcement



Use labels to drive RBAC/ABAC, apply deterministic masking/tokenization in lower environments for High/Medium as policy requires, encrypt everywhere, and redact/tokenize in exports, BI views, and APIs.
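Deterministic means the same input always produces the same token, so joins across tables still line up after masking. A minimal sketch using a keyed hash (key handling is simplified for illustration; real systems pull keys from a key manager):

```python
import hmac, hashlib

SECRET_KEY = b"rotate-me-via-a-key-manager"  # illustrative; never hardcode keys

def tokenize(value: str) -> str:
    """Deterministic token: identical inputs -> identical tokens,
    so foreign-key joins survive masking in lower environments."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

# The same customer ID tokenizes identically in every table it appears in.
assert tokenize("cust-1042") == tokenize("cust-1042")
assert tokenize("cust-1042") != tokenize("cust-1043")
```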



Evidence



Record label changes, policy evaluations, and control executions as tamper-evident logs attached to each release—shrinking audit timelines and enabling continuous compliance.
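One common way to make such logs tamper-evident is hash chaining, where each entry commits to the one before it. A minimal sketch (the format and field names are assumptions, not any specific product's schema):

```python
import hashlib, json, time

def append_entry(log: list[dict], event: dict) -> None:
    """Each entry hashes the previous entry, so any later edit breaks the chain."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = {"ts": time.time(), "event": event, "prev": prev_hash}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    log.append(body)

def verify(log: list[dict]) -> bool:
    """Recompute every hash and link; any tampering returns False."""
    prev = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True

log: list[dict] = []
append_entry(log, {"asset": "users.ssn", "action": "masked", "level": "High"})
assert verify(log)
```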




Data Classification Tools and Capabilities to Prioritize



  • High-quality detectors (PII/PHI/PCI) with custom patterns/dictionaries.

  • Deterministic masking/tokenization that preserves referential integrity across joins.

  • Rule engine: detections → High/Medium/Low → control actions.

  • APIs & CI/CD gates for automated checks before promotion (see the sketch after this list).

  • Lineage & propagation across pipelines and sinks.

  • Audit-ready reports per release/domain/regulation.

  • Role-based access and clean segregation by environment.
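To make the gate idea concrete, here is a hypothetical promotion check (not any specific vendor's API); it fails the pipeline when assets are unclassified or when sensitive assets reach a lower environment unprotected:

```python
import sys

def gate(assets: list[dict]) -> None:
    """Hypothetical data gate: block promotion on classification failures."""
    failures = []
    for a in assets:
        if a.get("label") is None:
            failures.append(f"{a['name']}: unclassified")
        elif a["label"] in ("High", "Medium") and not a.get("protected"):
            failures.append(f"{a['name']}: {a['label']} data not masked/tokenized")
    if failures:
        print("\n".join(failures))
        sys.exit(1)  # non-zero exit fails the pipeline stage

gate([
    {"name": "orders.customer_email", "label": "Medium", "protected": True},
    {"name": "users.ssn", "label": "High", "protected": True},
])
print("gate passed: promotion allowed")
```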




How Gigantics Operationalizes Data Classification Across All Environments



Gigantics makes data classification actionable across production, staging, development, analytics, and backups by turning labels into concrete, repeatable actions—without adding process overhead.


  • Discovery & labeling: detect PII/PHI/PCI patterns and apply consistent labels (e.g., Public, Internal, Confidential, Restricted) to columns and datasets.

  • Label-driven protection: enforce deterministic masking/tokenization (with referential integrity preserved) for labeled data in lower environments; keep encryption/monitoring aligned to policy.

  • CI/CD checks: use API-based data gates to verify that assets are classified and required protections ran before promotion.

  • Evidence: generate audit artifacts that link classification results to the protections applied for each run or release.


Result: classification that is enforced the same way in every environment, with clear proof of what was protected and when.


Operationalize data classification with confidence.

Turn labels into consistent controls and evidence across environments—without slowing delivery. See how Gigantics enforces data classification through automated checks and audit-ready artifacts.

Schedule a demo


FAQs about Data Classification



1) What is data classification?



Data classification assigns sensitivity labels to data—based on business impact and regulatory exposure—so controls for access, protection, retention, and evidence are applied consistently across systems and environments.



2) What’s the difference between data classification levels and types?



Levels (High/Medium/Low) express relative sensitivity; types (Public, Internal, Confidential, Restricted) are a policy taxonomy. Many programs use both with a simple mapping (e.g., Restricted → High).



3) What should a data classification policy include?



Scope across all environments, level/type definitions with examples, required controls per level (access, encryption, masking/tokenization, retention, logging), ownership/RACI, exception handling, and an audit model tied to GDPR/HIPAA/NIS2 obligations.



4) How is data classification enforced in CI/CD and lower environments?



Pipelines run policy checks (“data gates”) that verify assets are classified and required protections were applied before promotion. For sensitive labels, deterministic masking/tokenization is enforced in non-production; encryption and monitoring are verified everywhere.



5) How often should data be reclassified, and who owns it?



Review classification on schema changes, data drift, or quarterly at minimum. Business owners and data stewards own labels; security/compliance define policy and verify evidence, with engineering automating checks and propagation.