Compass on a digital blue code background.
Sashkin / stock.adobe.com
Expert contribution, 2025-09-18

Successfully navigating the data requirements for high-risk AI systems under the EU AI Act

The Regulation (EU) 2024/1689 (Artificial Intelligence Act, AIA) introduces a comprehensive legal framework for high-risk AI systems, aiming to ensure safety, transparency, and fundamental rights protection across sectors. At its core, the regulation emphasizes rigorous data governance, recognizing that the quality, integrity, and traceability of data are foundational to trustworthy AI. For developers and deployers of high-risk AI systems, establishing a robust data management process offers strategic levers for both compliance and competitiveness.


Data and data governance challenges for AI system providers

Art. 10 AIA mandates that data used for training, validation, and testing must be:

  • Relevant, representative, free of errors, and complete.
  • Statistically sound and appropriate for the intended purpose.
  • Accompanied by documentation of collection methods, assumptions, and preprocessing steps.
  • Evaluated for potential biases and gaps, with mitigation strategies in place.

These provisions apply across all high-risk AI systems.
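The Art. 10 criteria can be partly operationalized as automated checks that run before training. The following is a minimal sketch in plain Python, with hypothetical field names and an illustrative (not legally prescribed) notion of completeness and representativeness:

```python
from collections import Counter

def check_dataset(records, required_fields, label_field):
    """Run simple Art. 10-style quality checks on a list of dict records.

    The thresholds and fields are illustrative only; real checks must be
    derived from the documented intended purpose of the AI system.
    """
    report = {"n_records": len(records), "issues": []}

    # Completeness: every record carries every required field, non-empty.
    incomplete = [i for i, r in enumerate(records)
                  if any(r.get(f) in (None, "") for f in required_fields)]
    if incomplete:
        report["issues"].append(f"{len(incomplete)} incomplete records")

    # Representativeness proxy: the label distribution must not be degenerate.
    labels = Counter(r.get(label_field) for r in records)
    report["label_distribution"] = dict(labels)
    if len(labels) < 2:
        report["issues"].append("only one class present")

    report["passed"] = not report["issues"]
    return report

data = [
    {"age": 34, "income": 52000, "label": "approve"},
    {"age": 41, "income": None,  "label": "reject"},   # missing value
]
print(check_dataset(data, ["age", "income"], "label"))
```

Such a check produces machine-readable evidence that can feed directly into the documentation required by Annex IV.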

Annex IV AIA complements the Art. 10 provisions by detailing what providers of high-risk AI systems must document:

  • Dataset descriptions, including origin, scope, and characteristics.
  • Data labeling and cleaning procedures.
  • Versioning and traceability across the data lifecycle.

This aligns with standards such as ISO/IEC 5259 (Data Quality for AI), ISO/IEC 8183 (AI Data Lifecycle), and ISO/IEC 42001 (AI Management Systems), which provide operational guidance for implementing robust data governance.
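One way to make such documentation traceable is to treat each dataset description as a structured, versioned record. The sketch below (hypothetical fields, not a prescribed schema) derives a content fingerprint that can serve as a version identifier across the data lifecycle:

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict

@dataclass
class DatasetRecord:
    """Illustrative documentation record covering Annex IV-style items."""
    name: str
    version: str
    origin: str                  # where the data comes from
    scope: str                   # what it covers / intended purpose
    labeling_procedure: str
    cleaning_steps: list = field(default_factory=list)

    def fingerprint(self) -> str:
        """Content hash of the metadata, usable for versioning/traceability."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()[:12]

rec = DatasetRecord(
    name="loan-applications",
    version="2024.3",
    origin="internal CRM export, 2020-2023",
    scope="EU retail credit scoring",
    labeling_procedure="dual annotation with adjudication",
    cleaning_steps=["deduplication", "outlier capping"],
)
print(rec.fingerprint())
```

Because the fingerprint changes whenever any documented property changes, it provides a cheap audit trail linking model versions to the exact dataset description they were trained against.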

Bias mitigation is a cornerstone of the AI Act. Developers must proactively identify and address biases that could lead to discriminatory or otherwise erroneous AI system outputs. This includes:

  • Using diverse and representative datasets.
  • Applying fairness-aware algorithms and validation metrics.
  • Documenting bias detection and mitigation techniques.
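A common fairness validation metric is the demographic parity difference: the gap in positive-decision rates between groups. A minimal sketch, with hypothetical predictions and group labels (the acceptable threshold is a policy choice, not fixed by the AI Act):

```python
def demographic_parity_difference(y_pred, groups):
    """Absolute gap in positive-prediction rates across groups (0/1 preds)."""
    rates = {}
    for g in set(groups):
        preds = [p for p, gg in zip(y_pred, groups) if gg == g]
        rates[g] = sum(preds) / len(preds)
    vals = sorted(rates.values())
    return vals[-1] - vals[0]

# Illustrative predictions (1 = positive decision) for two groups A/B.
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
gap = demographic_parity_difference(y_pred, groups)
print(f"demographic parity gap: {gap:.2f}")  # flag if above a chosen threshold
```

Computing and logging such metrics for each release is one concrete way to document bias detection as required above.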

The data management process is key to AI Act compliance

The data management process in the quality management system of the AI system provider should encompass the following steps:

  • Data requirement specification,
  • Data management planning,
  • Data collection,
  • Data preparation,
  • Data provision, and
  • Data decommissioning.

Requirement specification and management planning typically occur within the AI model development process. The resulting data management report serves as the central proof of compliance.
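The six lifecycle steps above can be tracked in a simple audit log from which the data management report is generated. The sketch below is illustrative only; a real quality management system would add approvals, signatures, and retention rules:

```python
from datetime import datetime, timezone

STEPS = ["requirement specification", "management planning", "collection",
         "preparation", "provision", "decommissioning"]

class DataManagementLog:
    """Minimal audit trail over the six lifecycle steps (illustrative)."""
    def __init__(self, dataset):
        self.dataset = dataset
        self.entries = []

    def record(self, step, note):
        if step not in STEPS:
            raise ValueError(f"unknown step: {step}")
        self.entries.append({
            "step": step,
            "note": note,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def report(self):
        """Summarize progress; missing steps highlight compliance gaps."""
        done = {e["step"] for e in self.entries}
        missing = [s for s in STEPS if s not in done]
        return {"dataset": self.dataset, "entries": self.entries,
                "missing_steps": missing}

log = DataManagementLog("loan-applications")
log.record("collection", "CRM export pulled, access approved by DPO")
print(log.report()["missing_steps"])
```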

Post-market monitoring and risk management

The AI Act requires continuous monitoring of deployed systems to detect performance degradation or data drift. This is especially critical for adaptive systems that evolve over time. Key practices include:

  • Logging inputs and outputs for traceability.
  • Monitoring prediction drift and triggering retraining or updates.
  • Using predetermined change control plans (PCCPs) for systems that learn post-deployment.
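Prediction drift monitoring is often implemented with the population stability index (PSI), which compares the distribution of model scores at training time against production. A self-contained sketch; the binning and the commonly cited 0.2 alert threshold are conventions, not AI Act requirements:

```python
import math

def population_stability_index(expected, actual, bins=5):
    """PSI between a baseline and a live sample; > 0.2 is often read as drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def histogram(xs):
        counts = [0] * bins
        for x in xs:
            idx = min(int((x - lo) / width), bins - 1)
            counts[max(idx, 0)] += 1
        # Additive smoothing avoids log(0) for empty bins.
        return [(c + 0.5) / (len(xs) + 0.5 * bins) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [0.1 * i for i in range(100)]        # training-time scores
live = [0.1 * i + 4.0 for i in range(100)]      # shifted production scores
psi = population_stability_index(baseline, live)
print(f"PSI = {psi:.3f}")  # a large value signals drift, may trigger retraining
```

Running such a check on a schedule, and logging the result alongside the inputs and outputs mentioned above, turns post-market monitoring into verifiable evidence.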

Risk management must also account for data-specific hazards such as poisoning, distributional shifts, and adversarial manipulation. Standard ISO/IEC 23894 offers a structured approach to managing these risks.

Data protection

When personal data is involved, the AI Act intersects with the EU GDPR. Developers must:

  • Assess whether data can be linked to individuals, even indirectly.
  • Apply principles of data minimization, purpose limitation, and fairness.
  • Ensure lawful bases for processing and implement privacy-preserving techniques.

This dual compliance challenge underscores the need for cross-functional collaboration between AI engineers, legal experts, and data protection officers.
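The minimization and privacy-preserving principles above can be sketched in code. The example below drops direct identifiers, keeps only the fields actually needed, and replaces the subject key with a keyed hash; the field names are hypothetical, and keyed hashing is just one pseudonymization technique among several (it does not by itself render data anonymous under the GDPR):

```python
import hashlib
import hmac

SECRET_SALT = b"rotate-and-store-separately"  # placeholder; keep outside the dataset

def pseudonymize(record, direct_identifiers, keep_fields):
    """Minimize a record and replace the subject key with a keyed hash."""
    subject_id = str(record["customer_id"]).encode()
    token = hmac.new(SECRET_SALT, subject_id, hashlib.sha256).hexdigest()[:16]
    # Data minimization: keep only what the stated purpose requires.
    cleaned = {k: v for k, v in record.items()
               if k in keep_fields and k not in direct_identifiers}
    cleaned["subject_token"] = token
    return cleaned

raw = {"customer_id": 4711, "name": "Jane Doe", "age": 34, "income": 52000}
safe = pseudonymize(raw, direct_identifiers={"customer_id", "name"},
                    keep_fields={"age", "income"})
print(safe)
```

Because the same subject always maps to the same token, records remain linkable for model evaluation while the direct identifiers stay out of the training data.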

Data access by Notified Bodies

Access to provider data by Notified Bodies (Art. 43 AI Act) presents a delicate intersection with GDPR compliance. To verify conformity of high-risk AI systems, Notified Bodies may require access to training, validation, and testing datasets possibly containing personal or pseudonymized data. Providers must ensure that such data sharing is explicitly covered by a legal basis, and that safeguards like anonymization, contractual controls, and audit trails are in place. Without these, the risk of unauthorized data access or secondary use could undermine both regulatory trust and data subject rights.

Additional considerations in data management

While the AI Act provides a strong foundation, the following critical aspects deserve additional attention:

  • Synthetic data: Increasingly used to augment or replace real datasets, synthetic data must likewise be evaluated for quality, bias, and representativeness.
  • Data preparation: The role of domain experts in labeling, validating, and interpreting data remains irreplaceable. Human oversight processes enhance quality and accountability.

Conclusion

The EU AI Act sets a standard for data governance in high-risk AI systems. By embracing its requirements and integrating emerging standards, developers can build systems that are not only compliant but also resilient, ethical, and future-proof. Data is no longer just a technical asset but also a regulatory cornerstone and a competitive differentiator.

If you need help with the AI Act's data requirements, please contact us.

Get in touch with us!

Envelopes as icons, network concept.
thodonal / stock.adobe.com

We offer consulting projects and in-house workshops, and we would be happy to provide you with more information about our services and answer any questions.

AI Projects & Services: aips@vde.com