
5 min read

Data Classification: Levels, Types, Policy & Process

Learn what data classification is, how to define levels and types, build an effective policy, and operationalize controls (masking/tokenization, access, evidence) across environments and CI/CD.


Sara Codarlupo

Marketing Specialist @Gigantics

Data classification is the governance mechanism that determines how sensitive a dataset is and what must happen to it: access, protection, retention, and audit evidence. That makes it the prerequisite for reliable data security controls and the foundation for turning abstract policy into enforceable rules across analytics, applications, and CI/CD.




What Is Data Classification and Why It Matters



Data classification is the process of assigning sensitivity labels to data—based on business impact and regulatory exposure—so that consistent controls can be applied across systems and environments.


By turning policy into labels (e.g., Public, Internal, Confidential, Restricted), the process enables:


  • Precise Control Assignment: Labels drive IAM, determine when to mask or tokenize, and govern what can be shared, allowing organizations to direct security resources toward the most sensitive data.

  • Compliance and Audit Readiness: For GDPR, HIPAA, and NIS2, classification yields defensible answers to where sensitive data lives, how it is protected, and when it is removed, providing audit-ready evidence across all environments.




Data Classification: Sensitivity Levels and Standard Taxonomy



Sensitivity Levels (Based on Impact)


Most corporate policies adopt a dual model that uses Levels (based on impact) and Types (based on policy labels) for enforcement.


Sensitivity levels with definitions, examples, and baseline controls
| Sensitivity | Definition (impact if compromised) | Typical examples | Baseline controls (illustrative) |
| --- | --- | --- | --- |
| High | Regulated or mission-critical; severe consequences if compromised. | Bank account numbers, medical records (PHI), full PII sets, authentication secrets. | Need-to-know RBAC/ABAC, strong encryption & key management, deterministic masking/tokenization in lower environments, strict retention, audit evidence. |
| Medium | Internal use; moderate harm if accessed without authorization. | Contract terms, internal reports, customer name/contact details (may be elevated under GDPR depending on context/combination). | AuthN/Z, encryption at rest/in transit, DLP with logged access; protection in lower environments as policy requires. |
| Low | Publicly available; minimal harm if exposed. | Public websites, press releases, public maps. | TLS and integrity checks; open/broad read access as appropriate; basic monitoring. |


Taxonomy Types (Policy and Usage Labels)


Policies generally converge on four usage labels that are assigned to data and map directly to the levels above (a minimal mapping sketch in code follows the list):


  • Restricted: Maximum-sensitivity data that, if disclosed, would violate regulations or contracts with severe consequences (typically mapped to the High level).

  • Confidential: Operationally critical information shared only with internal, need-to-know groups (mapped to the Medium or High level).

  • Internal Use Only: Non-public information intended for authorized employees; a leak would cause moderate damage, so basic safeguards are required (mapped to the Medium level).

  • Public: Information intended for broad disclosure, carrying no risk if exposed (mapped to the Low level).
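

To make the type-to-level mapping concrete, here is a minimal sketch in Python. The labels and levels follow the taxonomy above; the function name and the fail-closed default are illustrative choices, not part of any standard.

```python
from enum import Enum

class Level(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"

# Policy taxonomy types mapped to sensitivity levels, as described above.
# "Confidential" can map to Medium or High; default to the stricter level
# and let data stewards downgrade explicitly after review.
TYPE_TO_LEVEL = {
    "Restricted": Level.HIGH,
    "Confidential": Level.HIGH,      # or Level.MEDIUM after steward review
    "Internal Use Only": Level.MEDIUM,
    "Public": Level.LOW,
}

def level_for(label: str) -> Level:
    """Resolve a policy label to its sensitivity level; unknown labels fail closed."""
    return TYPE_TO_LEVEL.get(label, Level.HIGH)

print(level_for("Restricted"))  # Level.HIGH
print(level_for("mystery"))     # Level.HIGH (fail closed)
```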



Data Classification Policy: Scope, Criteria, and Controls per Level



A usable data classification policy is concise and explicit. It must define scope (systems, domains, and all environments), the levels/types used with clear criteria and examples, and the controls required per level (access, encryption, masking/tokenization, retention, logging). Specify ownership and RACI, exception/waiver rules, and the audit model (what is recorded and how it is verified).
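

One way to keep those per-level requirements explicit and machine-checkable is to express them as policy-as-code. A minimal sketch follows; the control names and keys are assumptions for illustration, not a standard schema.

```python
# Controls required per sensitivity level, expressed as data so tooling
# can check them. Control names are illustrative; align with your stack.
CONTROLS_PER_LEVEL = {
    "High": {
        "access": "need-to-know RBAC/ABAC",
        "encryption": "strong at rest and in transit, managed keys",
        "lower_envs": "deterministic masking/tokenization required",
        "retention": "strict schedule with documented deletion",
        "logging": "full audit evidence",
    },
    "Medium": {
        "access": "authenticated and authorized users",
        "encryption": "at rest and in transit",
        "lower_envs": "protection as policy requires",
        "retention": "standard schedule",
        "logging": "access logged, DLP monitored",
    },
    "Low": {
        "access": "open/broad read as appropriate",
        "encryption": "TLS in transit",
        "lower_envs": "none required",
        "retention": "standard schedule",
        "logging": "basic monitoring",
    },
}

def required_controls(level: str) -> dict:
    """Look up the controls a dataset must satisfy; unknown levels fail closed."""
    return CONTROLS_PER_LEVEL.get(level, CONTROLS_PER_LEVEL["High"])
```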




Data Classification Process


An effective process is continuous and systematic, and relies on automation across three core stages:



1. Discovery & Labeling



Run automated discovery to detect sensitive data patterns (PII/PHI/PCI) and tag assets. Apply rules to map detections to High/Medium/Low labels, have data stewards adjudicate edge cases, and propagate labels through data lineage so every copy of the data stays consistent.
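

As a rough illustration of rule-based discovery, the sketch below scans sampled column values with regular expressions and maps hits to levels. The patterns are deliberately simplified assumptions; production detectors add validation (checksums, dictionaries, column-name context) on top of raw matching.

```python
import re

# Simplified detectors: pattern -> (detection type, sensitivity level).
# Real tools also validate hits (e.g., Luhn checks for card numbers).
DETECTORS = {
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"): ("email", "Medium"),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"): ("ssn", "High"),
    re.compile(r"\b(?:\d[ -]?){13,16}\b"): ("card_number", "High"),
}

RANK = {"Low": 0, "Medium": 1, "High": 2}

def classify_column(samples: list[str]) -> tuple[str, list[str]]:
    """Return (level, detections) for a column based on sampled values."""
    level, detections = "Low", []
    for value in samples:
        for pattern, (kind, lvl) in DETECTORS.items():
            if pattern.search(value):
                detections.append(kind)
                if RANK[lvl] > RANK[level]:
                    level = lvl
    return level, sorted(set(detections))

print(classify_column(["alice@example.com", "order #42"]))  # ('Medium', ['email'])
print(classify_column(["123-45-6789"]))                     # ('High', ['ssn'])
```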



2. Policy Enforcement



Use the assigned labels to automatically drive and enforce controls: apply role-based access control (RBAC), determine which protection controls (such as encryption or tokenization) must run, and ensure data is handled according to its sensitivity level across all environments.
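

In its simplest form, label-driven enforcement is a function from (sensitivity level, target environment) to required actions. A minimal sketch, with illustrative action names that would map onto your actual tooling:

```python
# Decide which protections must run before data with a given level
# lands in a given environment. Action names are illustrative.
def enforcement_actions(level: str, environment: str) -> list[str]:
    production = environment == "production"
    if level == "High":
        actions = ["rbac:need-to-know", "encrypt:at-rest", "encrypt:in-transit"]
        if not production:
            # High-sensitivity data never reaches lower environments raw.
            actions.append("tokenize:deterministic")
        return actions
    if level == "Medium":
        actions = ["authn-authz", "encrypt:at-rest", "encrypt:in-transit",
                   "dlp:log-access"]
        if not production:
            actions.append("mask:as-policy-requires")
        return actions
    return ["tls", "integrity-checks"]  # Low

print(enforcement_actions("High", "staging"))
# ['rbac:need-to-know', 'encrypt:at-rest', 'encrypt:in-transit', 'tokenize:deterministic']
```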



3. Compliance and Audit Readiness



Record all label changes, policy evaluations, and control executions as tamper-evident logs. This documentation serves as the official audit evidence for every data asset, streamlining compliance checks and enabling a model of continuous, demonstrable protection.
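

Tamper evidence is commonly implemented by hash-chaining log entries, so altering any past entry breaks every hash after it. A minimal sketch using only Python's standard library (a real system would also sign or anchor the chain):

```python
import hashlib, json, time

class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self):
        self.entries = []
        self._last_hash = "0" * 64  # genesis value

    def record(self, event: dict) -> None:
        entry = {"ts": time.time(), "event": event, "prev": self._last_hash}
        digest = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
        entry["hash"] = digest
        self.entries.append(entry)
        self._last_hash = digest

    def verify(self) -> bool:
        """Recompute the chain; any edit to a past entry breaks verification."""
        prev = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev:
                return False
            digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if digest != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

log = AuditLog()
log.record({"action": "label_change", "asset": "orders.email", "label": "Confidential"})
log.record({"action": "control_executed", "control": "tokenize:deterministic"})
print(log.verify())  # True
```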




Data Classification Tools and Capabilities to Prioritize



  • High-quality detectors (PII/PHI/PCI) with custom patterns/dictionaries.

  • Deterministic masking/tokenization that preserves referential integrity across joins (see the sketch after this list).

  • Rule engine: detections → High/Medium/Low → control actions.

  • APIs & CI/CD gates for automated checks before promotion.

  • Lineage & propagation across pipelines and sinks.

  • Audit-ready reports per release/domain/regulation.

  • Role-based access and clean segregation by environment.
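

For the masking/tokenization capability, determinism is what preserves referential integrity: the same input always yields the same token, so joins across tables still line up. A minimal sketch using keyed HMAC; the key handling is a placeholder, and real deployments would pull keys from a KMS.

```python
import hashlib, hmac

SECRET_KEY = b"replace-with-a-managed-key"  # fetch from a KMS in practice

def tokenize(value: str) -> str:
    """Deterministically tokenize a value: equal inputs yield equal tokens."""
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return f"tok_{digest[:16]}"

# The same customer ID tokenizes identically in both tables,
# so a join on the tokenized column still matches.
orders = [{"customer_id": tokenize("C-1001"), "total": 42}]
invoices = [{"customer_id": tokenize("C-1001"), "amount": 42}]
assert orders[0]["customer_id"] == invoices[0]["customer_id"]
print(orders[0]["customer_id"])
```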




How Gigantics Operationalizes Data Classification Across All Environments



Gigantics turns classification labels into executable, verifiable controls:


  • Discovery & labeling: detect PII/PHI/PCI patterns and apply consistent labels (e.g., Public, Internal, Confidential, Restricted) to columns and datasets.

  • Label-driven protection: enforce deterministic masking/tokenization (with referential integrity preserved) for labeled data in lower environments; keep encryption/monitoring aligned to policy.

  • CI/CD checks: use API-based data gates to verify that assets are classified and required protections ran before promotion.

  • Evidence: generate audit artifacts that link classification results to the protections applied for each run or release.


Result: classification that is enforced the same way in every environment, with clear proof of what was protected and when.
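

To illustrate the gate pattern (not Gigantics' actual API), the sketch below shows a CI step that fails the pipeline unless classification and protection checks pass. The endpoint, payload, and response fields are hypothetical placeholders.

```python
# Hypothetical CI data gate: block promotion unless every asset is
# classified and its required protections ran. Endpoint, token, and
# response shape are placeholders, not a real API.
import json, os, sys, urllib.request

GATE_URL = os.environ.get("DATA_GATE_URL", "https://example.invalid/api/gates/check")
GATE_TOKEN = os.environ.get("DATA_GATE_TOKEN", "placeholder-token")

def run_gate(release: str) -> None:
    req = urllib.request.Request(
        GATE_URL,
        data=json.dumps({"release": release}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {GATE_TOKEN}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    if result.get("unclassified_assets") or result.get("missing_protections"):
        print("Data gate failed:", json.dumps(result, indent=2))
        sys.exit(1)  # non-zero exit fails the pipeline step
    print("Data gate passed for", release)

if __name__ == "__main__":
    run_gate(sys.argv[1] if len(sys.argv) > 1 else "HEAD")
```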


Operationalize data classification with confidence.

Turn labels into consistent controls and evidence across environments—without slowing delivery. See how Gigantics enforces data classification through automated checks and audit-ready artifacts.

Schedule a demo


FAQs about Data Classification



1) What is data classification?



Data classification assigns sensitivity labels to data—based on business impact and regulatory exposure—so controls for access, protection, retention, and evidence are applied consistently across systems and environments.



2) What’s the difference between data classification levels and types?



Levels (High/Medium/Low) express relative sensitivity; types (Public, Internal, Confidential, Restricted) are a policy taxonomy. Many programs use both with a simple mapping (e.g., Restricted → High).



3) What should a data classification policy include?



Scope across all environments, level/type definitions with examples, required controls per level (access, encryption, masking/tokenization, retention, logging), ownership/RACI, exception handling, and an audit model tied to GDPR/HIPAA/NIS2 obligations.



4) How is data classification enforced in CI/CD and lower environments?



Pipelines run policy checks (“data gates”) that verify assets are classified and required protections were applied before promotion. For sensitive labels, deterministic masking/tokenization is enforced in non-production; encryption and monitoring are verified everywhere.



5) How often should data be reclassified, and who owns it?



Review classification on schema changes, data drift, or quarterly at minimum. Business owners and data stewards own labels; security/compliance define policy and verify evidence, with engineering automating checks and propagation.