EU AI Act Article 10: Evidence Architecture for Training Datasets

The EU AI Act's high-risk AI systems framework starts applying in August 2026. For regulated sectors, the most operationally demanding requirement is straightforward but hard to meet: being able to reconstruct, precisely and on demand, the exact composition of the dataset that trained each model in production.

Article 10 sets specific documentation obligations for datasets used to train, validate, and test high-risk systems. The gap between what organizations typically document and what the regulation will require them to demonstrate comes down to architecture, not policy.

This analysis covers three things: what Article 10 requires to be reproducible, what a technically sufficient audit response looks like, and what architectural conditions get you there.

What EU AI Act Article 10 Actually Requires

Regulation (EU) 2024/1689 requires providers of high-risk AI systems to apply appropriate data governance and management practices to training, validation, and test datasets. At minimum, those practices must cover:

The choice of dataset design

Data collection processes

Preparation operations: labelling, cleaning, enrichment, aggregation

Assumptions about what the dataset represents

Prior assessment of availability, quantity, and suitability

Examination for possible biases that could affect health and safety or lead to discrimination

Datasets must also be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete for the intended purpose, accounting for the specific geographical, contextual, behavioural, or functional environment where the system will be used.

The practical consequence: compliance with the Article 10 data governance requirements will have to be demonstrated with reproducible technical evidence. A well-written policy document, on its own, is not enough.

The Operational Challenge: Reconstructing Dataset Composition

In practice, training datasets are usually built through:

Extractions from production databases into non-production environments
Manual or semi-automated transformations (anonymization, masking, aggregation)
Consolidation into training, validation, and test pipelines
Successive dataset versions accumulated as the model evolves

Article 10 lists the origin of data and the preparation operations applied to it among the practices a provider must govern. In practice, demonstrating that requires field-level provenance traceability: knowing the origin of each field, what transformations were applied, and when. Most current pipelines produce datasets that work for training but whose exact composition isn't recorded in a reproducible way.

What an Audit Query Actually Looks Like

The clearest way to understand what Article 10 demands is to look at a concrete question a supervisory authority might ask:

"For the fraud detection model deployed in production since March 12th, state how many records containing personal data from non-EU citizens entered the training dataset. Detail what anonymization transformations were applied to each field and under which version of the data governance policy in effect at the time."

Two architectures produce very different responses.

Architecture A — Parallel Documentation

Written policies, spreadsheet records, pipeline descriptions in design documents. Responding means convening the data team, recovering the pipeline version, reconstructing the transformations. Estimated time: 3 to 6 weeks. Result: partial, with uncertainty about accuracy.

Architecture B — Reproducible Evidence

The system records, for each sensitive data point, its origin (tap, table, column), the transformations applied (rule, parameters, policy version), and its destination (sink, model, environment). The response is a direct query to the lineage register. Estimated time: minutes. Result: complete and verifiable.

The difference isn't documentary quality. It's structural: whether field-level traceability is built into how the system works, or requires manual reconstruction every time.
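To make the Architecture B response concrete, here is a minimal sketch of the audit question expressed as a query against a lineage register. The table name lineage_events, its columns, the subject_region flag, and the sample rows are all illustrative assumptions, not a schema taken from the regulation or from any particular product.

```python
import sqlite3

# Hypothetical lineage register: one row per sensitive field value that
# entered a dataset. Table and column names are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE lineage_events (
        record_id TEXT, source_table TEXT, source_column TEXT,
        subject_region TEXT, transformation TEXT,
        policy_version TEXT, dataset_version TEXT, model TEXT
    )
""")
sample_rows = [
    ("r1", "payments", "iban", "EU", "tokenize", "v2.3.0", "ds-14", "fraud-detector"),
    ("r2", "payments", "iban", "non-EU", "tokenize", "v2.3.0", "ds-14", "fraud-detector"),
    ("r2", "payments", "email", "non-EU", "mask", "v2.3.0", "ds-14", "fraud-detector"),
    ("r3", "payments", "iban", "non-EU", "tokenize", "v2.4.0", "ds-15", "fraud-detector"),
]
conn.executemany("INSERT INTO lineage_events VALUES (?,?,?,?,?,?,?,?)", sample_rows)


def audit_answer(conn, model, dataset_version):
    """Return (number of non-EU records, per-field transformation summary)."""
    params = (model, dataset_version)
    record_count = conn.execute(
        "SELECT COUNT(DISTINCT record_id) FROM lineage_events "
        "WHERE model = ? AND dataset_version = ? AND subject_region = 'non-EU'",
        params,
    ).fetchone()[0]
    per_field = conn.execute(
        "SELECT source_column, transformation, policy_version, COUNT(*) "
        "FROM lineage_events "
        "WHERE model = ? AND dataset_version = ? AND subject_region = 'non-EU' "
        "GROUP BY source_column, transformation, policy_version "
        "ORDER BY source_column",
        params,
    ).fetchall()
    return record_count, per_field


count, per_field = audit_answer(conn, "fraud-detector", "ds-14")
print(count)
for column, transformation, policy_version, n in per_field:
    print(column, transformation, policy_version, n)
```

The point of the sketch is that both halves of the auditor's question (how many records, which transformation under which policy version) reduce to filters and a GROUP BY once the register exists. How residency or citizenship is actually determined per record is outside what this example models.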

Three Technical Conditions for EU AI Act Training Data Compliance

Three capabilities, together, make it possible to produce the evidence Article 10 requires without adding overhead to the model development cycle. None of them are new — what's new is that auditors will now ask for them.

1. Field-Level Lineage

For each sensitive data point entering a training, validation, or test dataset, the system records:

Origin: tap, table, column, record identifier

Transformations applied: anonymization, masking, synthesis, aggregation — with parameters and rule version

Destination: sink, dataset version, model, environment

This record must outlast the pipeline that generated it. If the pipeline is deleted or rewritten, the lineage data still needs to answer retrospective queries. The standard approach is to separate the lineage layer from the execution layer, with auditable and immutable event storage.
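One way to picture an auditable, immutable event store is an append-only log in which each entry carries a hash of the previous one, so any later edit is detectable. The sketch below is an assumption-laden illustration: LineageEvent, LineageLog, and the field layout are hypothetical names, and a real deployment would persist the log outside the pipeline rather than in memory.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

GENESIS = "0" * 64  # placeholder hash for the first entry in the chain


@dataclass(frozen=True)
class LineageEvent:
    origin: dict          # e.g. tap, table, column, record identifier
    transformation: dict  # e.g. rule, parameters, policy version
    destination: dict     # e.g. sink, dataset version, model, environment


class LineageLog:
    """Append-only log; each entry is chained to the previous entry's hash."""

    def __init__(self):
        self._entries = []

    def append(self, event):
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        payload = json.dumps(asdict(event), sort_keys=True)
        digest = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        self._entries.append({"payload": payload, "prev": prev_hash, "hash": digest})
        return digest

    def verify(self):
        """Recompute the chain; False means an entry was altered or reordered."""
        prev_hash = GENESIS
        for entry in self._entries:
            expected = hashlib.sha256((prev_hash + entry["payload"]).encode()).hexdigest()
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True


log = LineageLog()
log.append(LineageEvent(
    origin={"tap": "prod-db", "table": "payments", "column": "iban", "record_id": "r2"},
    transformation={"rule": "tokenize", "params": {}, "policy_version": "v2.3.0"},
    destination={"sink": "train-bucket", "dataset_version": "ds-14",
                 "model": "fraud-detector", "environment": "staging"},
))
print(log.verify())
```

Because the log depends only on the events, not on the pipeline code that emitted them, it can still answer retrospective queries after the pipeline is deleted or rewritten.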

2. Versioned Policy as Code

Governance rules — which fields are sensitive, which transformation applies, which exceptions exist — need to live in versionable artefacts, not documents. When an auditor asks what policy governed a transformation fourteen months ago, you need to be able to retrieve the exact version, show who changed it and when, and reproduce its behavior on a specific data point from that period.

In practice: policy artefacts stored in Git, versioned with semantic tags, integrated into the governance CI/CD pipeline, with a register linking each operation to the policy version active at the time.
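The retrieval step can be sketched as a lookup of the policy version in force at a given timestamp, followed by a replay of its rule on a sample value. In this illustration POLICY_HISTORY stands in for what tagged Git history would provide; the version tags, dates, rule names, and helper functions are all hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Hypothetical policy history, sorted by effective date. In the setup
# described above, each entry would correspond to a tagged commit in Git.
POLICY_HISTORY = [
    (datetime(2024, 1, 10, tzinfo=timezone.utc), "v1.0.0", {"email": "mask"}),
    (datetime(2024, 9, 1, tzinfo=timezone.utc), "v1.1.0", {"email": "redact"}),
]


def mask(value):
    return value[0] + "***"


def redact(value):
    return "[REDACTED]"


RULES = {"mask": mask, "redact": redact}  # illustrative rule implementations


def policy_at(timestamp):
    """Return (version, rules) for the policy in force at `timestamp`."""
    effective_dates = [entry[0] for entry in POLICY_HISTORY]
    index = bisect_right(effective_dates, timestamp) - 1
    if index < 0:
        raise LookupError("no policy was in force at that time")
    _, version, rules = POLICY_HISTORY[index]
    return version, rules


def replay(field, value, timestamp):
    """Reproduce what the policy of that period would have done to `value`."""
    version, rules = policy_at(timestamp)
    return version, RULES[rules[field]](value)


print(replay("email", "ana@example.com", datetime(2024, 3, 1, tzinfo=timezone.utc)))
print(replay("email", "ana@example.com", datetime(2025, 1, 1, tzinfo=timezone.utc)))
```

The sketch covers only the lookup and replay. Showing who changed a policy and when would come from the version control history itself, which this example does not model.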

3. Origin-Based Execution

Sensitive data transformation happens before the dataset leaves the production perimeter — not as a later step on already-extracted copies. This matters especially where the EU AI Act intersects with NIS2 on the ICT supply chain. Non-production environments with real, uncontrolled data fall within the regulatory perimeter.

Origin-based execution closes that gap and directly addresses EU AI Act data quality requirements.
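As a rough illustration of transforming at the origin, the sketch below applies pseudonymization inside the export function itself, so the only rows that ever cross the boundary are already transformed. The field names, the SENSITIVE set, the demo key, and the choice of keyed hashing (HMAC-SHA256) are assumptions for the example, not a prescribed anonymization method.

```python
import hashlib
import hmac

# Assumptions for this sketch: which fields are sensitive, and a key that
# never leaves the production perimeter (hard-coded here only for the demo).
SENSITIVE = {"email", "iban"}
SECRET_KEY = b"demo-key-kept-inside-production"


def pseudonymize(value):
    """Keyed hash: stable for joins, not reversible without the key."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]


def export_from_production(rows):
    """Yield rows for non-production use; raw sensitive values never leave."""
    for row in rows:
        yield {
            field: pseudonymize(value) if field in SENSITIVE else value
            for field, value in row.items()
        }


production_rows = [
    {"email": "ana@example.com", "iban": "DE00123456780000000000", "amount": 120.5},
    {"email": "li@example.org", "iban": "FR00123456780000000000", "amount": 43.0},
]
exported = list(export_from_production(production_rows))
print(exported[0]["amount"], len(exported[0]["email"]))
```

The design choice worth noting is where the function runs, not what it computes: the same transformation applied to an already-extracted copy would leave a window in which real data sits in a non-production environment.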

EU AI Act, NIS2, and DORA: One Architecture, Three Regulations

The three European frameworks with the most operational impact on non-production data — EU AI Act, NIS2, and DORA — share a common technical core: auditable evidence of how sensitive data is handled across the chain.

Regulation	What it requires for non-production data
EU AI Act — Art. 10	Exact composition and field-level traceability of training, validation, and test datasets for high-risk AI systems.
NIS2 — Art. 21	Controls equivalent to production environments across the ICT supply chain, including non-production environments holding real data.
DORA — Art. 28	Traceability and continuous oversight of ICT third parties that process data, with auditable evidence of data flows.

All three converge on the same technical conditions: field-level lineage, versioned policy, and origin-based execution. An organization that builds evidence architecture for the EU AI Act is, at the same time, building the foundation for NIS2 and DORA compliance.

A Twelve-Month Plan

Enforcement starts in August 2026. Twelve months is not enough to rewrite training pipelines or manually reconstruct traceability for models already in production. It is enough to build the right architecture into every new pipeline from now on.

Months 1–3: Inventory and Mapping

Identify AI models classifiable as high-risk under Annex III. Map the training, validation, and test datasets feeding each one. Document existing traceability and identify field-level gaps.
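The gap-identification step can be as simple as a set difference between the sensitive fields each dataset contains and the fields that already have lineage records. The inventory below is invented for illustration; dataset names, field names, and the lineage_gaps helper are hypothetical.

```python
# Hypothetical inventory produced during mapping (all names are assumptions).
dataset_fields = {
    "ds-14": {"email", "iban", "amount", "country"},
    "ds-15": {"iban", "amount"},
}
sensitive_fields = {"email", "iban", "country"}
lineage_covered = {"ds-14": {"iban"}, "ds-15": {"iban"}}


def lineage_gaps(dataset_fields, sensitive_fields, lineage_covered):
    """For each dataset, list sensitive fields that have no lineage record."""
    return {
        dataset: sorted((fields & sensitive_fields) - lineage_covered.get(dataset, set()))
        for dataset, fields in dataset_fields.items()
    }


gaps = lineage_gaps(dataset_fields, sensitive_fields, lineage_covered)
for dataset, missing in gaps.items():
    print(dataset, missing)
```

A report like this gives the implementation phase a concrete backlog: every non-empty list is a field whose origin and transformations cannot yet be shown on demand.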

Months 4–9: Architectural Implementation

Deploy the three technical conditions in new training pipelines: policy as code in Git, origin-based execution in the non-production ETL, field-level lineage register with auditable storage.

Months 10–12: Audit Readiness

Generate reproducible evidence reports for already-deployed models using existing tooling. Define an internal playbook for supervisory authority queries. Validate through internal audit or external review.

At twelve months, new pipelines have evidence architecture built in and legacy models have a documented response protocol. That's the position you want before the first audit.

Conclusion

EU AI Act Article 10 changes something specific: the exact composition of a dataset that trains a high-risk model must be reconstructable on demand, through evidence the system itself produces during execution. The obligation to document was already there. What changes is that documentation now has to be technically reproducible, not just written down.

Whether an organization is ready for that audit depends on one thing: whether field-level traceability is part of how the system works, or something that has to be rebuilt from scratch every time someone asks.
