On April 28th, we participated in the seventh edition of the Cybersecurity & Data Innovation Summit held in Madrid. The event brought together over three hundred security directors, architects, and data protection officers, leading to a conclusion that no longer allows for nuance: regulatory pressure on non-production environments has shifted from an internal conversation between security teams to a matter involving contractual deadlines and personal liability.
Our CTO, Ricardo Martínez, presented a technical use case titled "Data Anonymization in Multi-Technology Environments." The stance we defended on stage and in the exhibition area can be summarized in a single sentence: if the data is real, the risk is real.
The Structural Problem of Non-Production Environments
Medium and large organizations maintain an average of nearly thirty full copies of production databases across their development, QA, staging, and model training environments. Every one of those copies contains real personal data. Every one inherits the regulatory burden of the system it originates from. None inherit equivalent security controls.
For years, the technical argument justifying this practice was operational. Traditional masking tools broke relational schemas. Tests failed to pass with transformed data. Development teams resorted to direct production copies to maintain delivery speed, leaving anonymization as a policy declared in documents rather than a system behavior.
What the sector now acknowledges, both publicly and privately, is that this trade-off is no longer defensible, for two reasons: regulatory pressure has shifted ultimate responsibility to boards of directors, and technical progress has shown that architectures capable of anonymizing data without destroying its functional utility now exist.
NIS2, DORA, and the AI Act: The Triangle That Dominated the Event
The session preceding Gigantics' technical presentation was the debate "Executive Responsibility and the Regulatory Tsunami." A panel of legal experts and consultants analyzed the overlap of NIS2, DORA, the AI Act, and the Cyber Resilience Act on essential Spanish entities and financial institutions. The operational implications discussed are direct and have specific deadlines.
NIS2 demands technical and organizational measures throughout the entire data chain, including non-production environments. While the Spanish transposition is still pending, European partners are already requesting evidence of compliance as a contractual condition. Article 20 of the Directive shifts final responsibility to management bodies, including the possibility of temporary bans for proven negligence.
DORA introduces a nuance that many financial entities are still operationalizing. The operational resilience tests required by the regulation are, in practice, executed on environments containing real customer data. Each of these environments extends the entity's regulatory perimeter to every ICT provider with access to them.
The AI Act introduces documentation obligations regarding the datasets used to train models classified as high-risk. When an auditor requests the exact composition of a training dataset for a production model, most organizations will not be in a position to respond accurately.
These three regulations converge on the same technical point: sensitive data must not exist without control outside of production environments. This is a concept already established by the GDPR’s minimization principle (Article 5.1.c), now reinforced through the lens of supply chain security and operational resilience.
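Much of the auditability gap described above can be narrowed by capturing a manifest of the training data before the pipeline runs, so there is verifiable evidence of what the model actually saw. A minimal sketch in Python; the function name, field names, and PII-column labels are illustrative assumptions, not any specific tool's API:

```python
import hashlib
import json

def dataset_manifest(rows: list[dict], pii_columns: list[str]) -> dict:
    """Record what went into a training set *before* training starts.

    A content hash plus a column inventory is the minimum evidence an
    auditor can later check against the model's documentation: if the
    shipped model was trained on different data, the hash will not match.
    """
    payload = json.dumps(rows, sort_keys=True).encode()
    return {
        "row_count": len(rows),
        "columns": sorted(rows[0].keys()) if rows else [],
        "declared_pii_columns": sorted(pii_columns),
        "content_sha256": hashlib.sha256(payload).hexdigest(),
    }

# Capture the manifest at the boundary between data preparation and training,
# then store it alongside the model artifact.
manifest = dataset_manifest(
    [{"customer_id": "C-10042", "email": "ana@example.com", "amount": 120.5}],
    pii_columns=["email"],
)
```

The key design point is that the manifest is produced by the pipeline itself, not written after the fact, so it describes the dataset as it actually existed at training time.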
Why the Adjective "Multi-Technology" Matters
A decade ago, the debate over anonymization assumed a single relational database engine. That assumption no longer holds. Organizations now operate combinations of PostgreSQL, MongoDB, Oracle, SQL Server, Snowflake, Elasticsearch and, increasingly, vector databases for AI use cases. Every engine has its own data model, identifier scheme, and way of representing relationships.
Traditional masking tools were designed before this complexity. They apply field-by-field transformations without knowledge of the full schema or cross-system relationships. The result is predictable: identifiers that no longer match between SQL and NoSQL databases, orphaned foreign keys, and application validations that reject transformed data.
The operational consequence is equally predictable: when tests fail with masked data, development teams revert to direct production copies. A compliance tool that nobody uses becomes, in terms of risk, equivalent to having no tool at all.
Cross-database referential integrity is not an "add-on" feature for an anonymization engine; it is the architectural condition that determines whether anonymization is sustained over time or abandoned by the second quarter.
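The mechanism that makes cross-database referential integrity possible is deterministic transformation: the same real identifier must always map to the same replacement, regardless of which engine it is read from. A hedged sketch of the idea, keyed pseudonymization with HMAC; the key handling and token format here are assumptions for illustration, not a description of any particular product:

```python
import hashlib
import hmac

# Assumption: one secret key per target environment, rotated and stored
# outside the codebase in any real deployment.
SECRET_KEY = b"rotate-me-per-environment"

def pseudonymize(value: str, domain: str = "customer_id") -> str:
    """Deterministically map a real identifier to a stable token.

    The same input always yields the same output, so a customer ID
    masked in a PostgreSQL table still joins against the same ID
    masked in a MongoDB document or an Elasticsearch index:
    referential integrity survives the transformation.
    """
    digest = hmac.new(SECRET_KEY, f"{domain}:{value}".encode(), hashlib.sha256)
    return digest.hexdigest()[:16]

# The token is stable across systems and across runs...
pg_row = {"customer_id": pseudonymize("C-10042")}
mongo_doc = {"customerId": pseudonymize("C-10042")}
assert pg_row["customer_id"] == mongo_doc["customerId"]

# ...while distinct identifiers do not collide at realistic scales.
assert pseudonymize("C-10042") != pseudonymize("C-10043")
```

The `domain` argument prevents tokens from leaking relationships across unrelated field types: a customer ID and an order ID that happen to share a raw value still receive different tokens.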
If the Data is Real, the Risk is Real
Real data outside of production is no less sensitive simply because it is located in an environment labeled "non-critical." It holds the exact same value for an attacker. It carries the exact same regulatory burden under GDPR. And, in many cases, it is subject to security controls inferior to those of the production system it came from.
Security breaches over the last twelve months show a consistent pattern. Attackers no longer need to reach production; they only need to find a development environment containing an operational copy of production data with weaker access protections. The Vercel incident last April, originating from a third-party AI tool that accessed environment variables with real credentials, was one of the examples discussed during the event.
The technical conclusion is straightforward: if non-production environments contain data that is functionally equivalent to production data but holds no real personal information, an attacker will find nothing worth extracting. The regulatory burden on those environments shrinks to match the actual nature of the data they hold, and the board's due diligence gains a concrete, demonstrable operational translation.
What the Industry Confirms
Conversations held during the summit confirmed trends we have been detecting with clients across Spain and Europe:
- Banking and insurance are operationalizing DORA in test environments. CISOs consulted during the event share a specific concern: the operational resilience tests required by DORA are mostly executed on environments containing real customer data. Reducing this perimeter through anonymization with referential integrity is no longer an operational optimization—it is a compliance requirement.
- AI pipelines are entering the audit zone. Platform leads from various tech organizations independently mentioned the same problem: the pressure to iterate quickly on Generative AI use cases clashes with the impossibility of proving to an AI Act auditor what personal data entered the training dataset. The discovery and anonymization layer applied before the training pipeline is moving from optional to mandatory.
- Data Architects demand consistency, not more tools. Data architects raised the most operational concern: traditional anonymization breaks their schemas, while pure synthetic data loses statistical distribution and limits the ability to detect real bugs. An approach that combines deterministic anonymization with referential preservation is, for most, the only technically sustainable path.
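The combination the architects describe, deterministic anonymization with referential preservation, can be sketched for a concrete field type such as e-mail addresses. A minimal illustration, assuming a per-environment secret and an invented token format; nothing here is a specific vendor API:

```python
import hashlib
import hmac

# Assumption: secret injected per environment, never committed to source.
KEY = b"per-environment-secret"

def anon_email(real_email: str) -> str:
    """Replace an e-mail with a stable, syntactically valid substitute.

    Determinism means every table or collection that stored the same
    address now stores the same substitute, so joins, uniqueness
    constraints, and bug reproductions keep working. The valid shape
    means application-level validators still accept the value, and the
    reserved .invalid TLD guarantees no mail can ever be delivered.
    """
    token = hmac.new(KEY, real_email.lower().encode(), hashlib.sha256).hexdigest()[:10]
    return f"user_{token}@example.invalid"

# The same customer in the billing database and in the CRM export
# resolves to the same substitute, even with inconsistent casing:
assert anon_email("Ana.Lopez@bank.example") == anon_email("ana.lopez@BANK.example")
```

Normalizing the input before hashing (here, lowercasing) is what keeps the mapping consistent when different systems stored the same value with cosmetic differences.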
Anonymization as a System Property
Data anonymization in non-production environments has been discussed in regulatory reports and academic papers for years. What we advocated for at the Cybersecurity & Data Innovation Summit is that translating this theoretical requirement into executable, auditable technical behavior is now an architectural decision within reach of any organization with a heterogeneous stack.
Cross-database referential integrity, policy-as-code versioning, and local-first execution are no longer just technical capabilities—they are conditions for compliance. Without these three, regulatory control over non-production environments relies on documentation; with them, it relies on the behavior of the system itself.
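"Policy-as-code" simply means that the masking rules live in a versioned artifact an auditor can diff, rather than in a GUI. A minimal sketch of the shape such a policy might take; the rule vocabulary and field names are illustrative assumptions:

```python
import hashlib

# Committed to version control: every change to what gets masked, and
# when, is visible in the repository history.
POLICY = {
    "version": "2024.04",
    "rules": [
        {"field": "customers.email", "action": "pseudonymize"},
        {"field": "customers.iban", "action": "redact"},
        {"field": "orders.amount", "action": "keep"},  # non-personal, kept for test realism
    ],
}

def apply_policy(record: dict, table: str, policy: dict = POLICY) -> dict:
    """Apply the versioned rules to one record of `table`."""
    out = dict(record)
    for rule in policy["rules"]:
        tbl, _, field = rule["field"].partition(".")
        if tbl != table or field not in out:
            continue
        if rule["action"] == "redact":
            out[field] = None
        elif rule["action"] == "pseudonymize":
            out[field] = "anon_" + hashlib.sha256(str(out[field]).encode()).hexdigest()[:8]
    return out

row = apply_policy(
    {"email": "ana@example.com", "iban": "ES7620770024003102575766"},
    "customers",
)
```

Because the policy is data, "what was masked in environment X on date Y" becomes a question answered by the repository history, not by institutional memory.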

