Data and Governance: building trustworthy AI in Europe
Data quality and governance are the foundations on which any trustworthy Artificial Intelligence system is built. Sophisticated algorithms alone are not enough: incomplete, biased or poorly governed data lead to erroneous, unfair or discriminatory decisions.
The European Artificial Intelligence Act (Regulation (EU) 2024/1689) sets out in Article 10 the data and data governance requirements for high-risk AI systems. In line with this framework, the Spanish Agency for the Supervision of Artificial Intelligence (AESIA) has published 16 practical guidelines to support implementation of the Act. Guideline 7, focused on data and governance, serves as a roadmap for building robust and trustworthy AI systems.
Within the Smart Data Space Innovation Centre of Zamora, these recommendations provide concrete guidance for applying data governance in collaborative environments, where companies, public administrations and research centres share data in a secure and controlled way.
1. Data governance: more than a legal requirement
Data governance encompasses the policies, processes and procedures that ensure that data used in AI systems are:
- Fit for their intended purpose
- Representative and complete
- Of proven quality
- Processed in full respect of fundamental rights
In practice, this reduces risks to safety, health and individual rights, while fostering trust among all participants in the data space.
2. The complete data life cycle
AESIA proposes an integrated approach that covers all stages of the data life cycle, from the definition of information needs to responsible deletion.
2.1 Information requirements
The first step is to identify which data the AI system needs to fulfil its objective. For example, a healthcare system managing insulin pumps requires information on blood glucose levels, heart rate and blood oxygen levels. Defining these requirements correctly avoids unnecessary data collection and helps prevent bias from the outset.
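Requirements like these can be written down in a machine-checkable form. The sketch below is a minimal illustration only. It assumes hypothetical field names (`blood_glucose`, `heart_rate`, `blood_oxygen`) and placeholder value ranges that are not clinical reference values. It flags both missing required variables and fields that were never requested, which supports data minimisation.

```python
from dataclasses import dataclass

# Hypothetical requirements spec for the insulin-pump example.
# The ranges are illustrative placeholders, NOT clinical reference values.
@dataclass(frozen=True)
class Requirement:
    name: str
    unit: str
    min_value: float
    max_value: float

REQUIREMENTS = {
    r.name: r
    for r in [
        Requirement("blood_glucose", "mg/dL", 20.0, 600.0),
        Requirement("heart_rate", "bpm", 20.0, 250.0),
        Requirement("blood_oxygen", "%", 50.0, 100.0),
    ]
}

def check_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record fits the spec."""
    problems = []
    # Every required variable must be present.
    for name in REQUIREMENTS:
        if name not in record:
            problems.append(f"missing required field: {name}")
    for name, value in record.items():
        req = REQUIREMENTS.get(name)
        if req is None:
            # Fields outside the spec are flagged rather than silently kept.
            problems.append(f"unrequested field: {name}")
        elif not (req.min_value <= value <= req.max_value):
            problems.append(
                f"{name}={value} outside [{req.min_value}, {req.max_value}] {req.unit}"
            )
    return problems
```

For example, a record carrying a `postcode` but no `blood_oxygen` would be reported twice: once for the missing variable and once for the unrequested one.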
2.2 Strategic data collection
Drawing on diverse data sources helps reduce bias and improve representativeness. A facial recognition system trained on a single demographic group, for instance, may perform worse on other groups and perpetuate discrimination.
Data can be collected through various methods—such as web scraping, IoT sensors, crowdsourcing or system-to-system transactions—always in compliance with data protection regulations.
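One way to check representativeness after collection is to compare each group's share in the data with a reference share, for example from census figures. The sketch below assumes group labels are already available per record and uses a hypothetical tolerance of 5 percentage points. It does not report groups that appear in the data but not in the reference.

```python
from collections import Counter

def representation_gaps(groups, reference_shares, tolerance=0.05):
    """Compare each group's share in the collected data with a reference share.

    groups: iterable of group labels, one per record.
    reference_shares: dict mapping group label -> expected share (summing to 1).
    Returns {group: observed_share - expected_share} for every reference group
    whose absolute gap exceeds the (hypothetical) tolerance.
    """
    counts = Counter(groups)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            gaps[group] = round(observed - expected, 3)
    return gaps
```

A dataset with 80 records from group `A` and 20 from group `B`, checked against a 50/50 reference, yields `{"A": 0.3, "B": -0.3}`: group `B` is under-represented and further collection from other sources may be needed.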
2.3 Data preparation and quality
AESIA highlights the need to ensure data quality through:
- Controls on key dimensions such as completeness, consistency, relevance and representativeness
- Remediation plans to address detected issues
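Controls on two of these dimensions, completeness and consistency, can be sketched in a few lines. This is a minimal illustration assuming records are plain dictionaries, `None` marks a missing value, and consistency is expressed as named rules. The 95% completeness threshold and the remediation wording are hypothetical.

```python
def completeness(rows, columns):
    """Share of non-missing values per column (None counts as missing)."""
    n = len(rows)
    return {c: sum(r.get(c) is not None for r in rows) / n if n else 0.0 for c in columns}

def quality_report(rows, columns, rules, min_completeness=0.95):
    """Run completeness and consistency checks and draft remediation items.

    rules: dict mapping a rule name -> predicate(row) that holds for consistent rows.
    """
    issues = []
    for col, share in completeness(rows, columns).items():
        if share < min_completeness:
            issues.append(f"completeness: '{col}' is {share:.0%} filled; impute or re-collect")
    for name, rule in rules.items():
        bad = [i for i, r in enumerate(rows) if not rule(r)]
        if bad:
            issues.append(f"consistency: rule '{name}' fails on rows {bad}; review at source")
    return issues
```

Each returned item can then be tracked as an entry in the remediation plan.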
The guide also addresses:
- Data transformation and harmonisation
- Variable aggregation and strategic sampling
- Feature creation, selection and labelling
All these processes are essential to ensure that data are suitable for training trustworthy AI models.
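As one example of strategic sampling, stratified sampling draws the same fraction from every group so that the sample keeps the group proportions of the source data. The sketch below assumes the caller supplies a `key` function that returns the stratum of each record. Keeping at least one record per group slightly over-weights very small groups, which is a deliberate trade-off here.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Sample the same fraction from every stratum so group proportions are kept."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    sample = []
    for group in sorted(strata):  # deterministic stratum order
        members = strata[group]
        k = max(1, round(len(members) * fraction))  # keep at least one per group
        sample.extend(rng.sample(members, k))
    return sample
```

With 80 urban and 20 rural records and `fraction=0.1`, the sample contains 8 urban and 2 rural records.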
2.4 Bias analysis
Bias is not only a technical issue but also an ethical and legal one. AESIA recommends:
- Identifying sources of bias
- Assessing their impact
- Applying corrective measures appropriate to the context
Within the Smart Data Space Innovation Centre of Zamora, teams can analyse historical and current data to prevent discrimination in automated decision-making.
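Assessing the impact of bias often starts with comparing outcomes across groups. The sketch below computes per-group selection rates and their ratio (sometimes called the disparate impact ratio). It assumes decisions are available as `(group, favourable_outcome)` pairs. A ratio below 0.8 is a common rule of thumb (the "four-fifths rule" from US employment practice), not a threshold set by AESIA or the AI Act.

```python
from collections import defaultdict

def selection_rates(decisions):
    """decisions: iterable of (group, outcome) pairs; outcome is truthy if favourable."""
    totals, positives = defaultdict(int), defaultdict(int)
    for group, outcome in decisions:
        totals[group] += 1
        positives[group] += bool(outcome)
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact(decisions):
    """Ratio of the lowest to the highest group selection rate (1.0 means parity)."""
    rates = selection_rates(decisions)
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0
```

If group `A` receives a favourable decision 60% of the time and group `B` 30% of the time, the ratio is 0.5, which would call for closer analysis and possibly corrective measures.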
2.5 Data availability and documentation
Prepared data must be made available securely, with proper version control and comprehensive documentation. Traceability enables auditing across the entire data life cycle and supports accountability.
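A lightweight way to combine version control with documentation is to fingerprint each released dataset and record that fingerprint alongside its provenance. The sketch below assumes rows are JSON-serialisable dictionaries. The manifest fields (`source`, `processing_steps` and so on) are a hypothetical minimal set, not a format prescribed by AESIA. The fingerprint ignores key order within a row but depends on row order.

```python
import hashlib
import json
from datetime import datetime, timezone

def dataset_fingerprint(rows):
    """Deterministic SHA-256 over the rows, each serialised as canonical JSON."""
    h = hashlib.sha256()
    for row in rows:
        h.update(json.dumps(row, sort_keys=True, separators=(",", ":")).encode("utf-8"))
        h.update(b"\n")  # row separator; json.dumps never emits a raw newline
    return h.hexdigest()

def release_manifest(name, version, rows, source, steps):
    """Build a documentation record for one released dataset version."""
    return {
        "dataset": name,
        "version": version,
        "sha256": dataset_fingerprint(rows),
        "row_count": len(rows),
        "source": source,
        "processing_steps": list(steps),
        "released_at": datetime.now(timezone.utc).isoformat(),
    }
```

An auditor who later recomputes the fingerprint can confirm that a model was trained on exactly the documented version.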
2.6 Responsible data deletion
Finally, data must be deleted securely, in line with legal retention obligations, data subjects' rights under the GDPR (such as the right to erasure) and any relevant social or cultural considerations.
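Deciding *which* records are due for deletion can be automated, even though the secure erasure itself depends on the storage system. The sketch below only selects record ids. It assumes hypothetical retention periods per data category, a set of data subjects who have requested erasure, and a `legal_hold` flag for records that must be retained under a legal obligation.

```python
from datetime import date, timedelta

# Hypothetical retention periods per data category, in days; real values
# depend on the applicable legal obligations.
RETENTION_DAYS = {"training": 365 * 3, "logs": 180}

def due_for_deletion(records, today, erasure_requests=frozenset()):
    """Return ids of records whose retention period has expired or whose
    data subject has requested erasure, skipping records under legal hold."""
    due = []
    for rec in records:
        if rec.get("legal_hold"):
            continue  # a legal obligation to retain takes precedence for now
        age = today - rec["collected"]
        expired = age > timedelta(days=RETENTION_DAYS[rec["category"]])
        if expired or rec["subject_id"] in erasure_requests:
            due.append(rec["id"])
    return due
```

The returned ids would then be passed to whatever secure-deletion mechanism the storage layer provides, and the deletion itself logged for accountability.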
3. Processing special categories of data
For sensitive data—such as data relating to ethnic origin, health, biometrics or sexual orientation—AESIA establishes strict safeguards: data minimisation, anonymisation or pseudonymisation, access controls and detailed documentation.
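Two of these safeguards, pseudonymisation and minimisation, can be combined in one step. The sketch below uses a keyed hash (HMAC-SHA256 from Python's standard library) so that pseudonyms cannot be recomputed without a secret key, which should be stored separately from the data. The field names `patient_id`, `age_band` and `glucose` are hypothetical. Note that pseudonymised data is still personal data under the GDPR.

```python
import hashlib
import hmac

def pseudonymise(record, key, id_field="patient_id", keep=("age_band", "glucose")):
    """Replace the direct identifier with a keyed hash and drop unneeded fields.

    key: secret bytes held separately from the dataset (e.g. in a key vault).
    keep: the only fields retained besides the pseudonym (data minimisation).
    """
    token = hmac.new(key, str(record[id_field]).encode("utf-8"), hashlib.sha256).hexdigest()
    out = {"pseudonym": token}
    out.update({f: record[f] for f in keep if f in record})
    return out
```

Because the same identifier always maps to the same pseudonym under a given key, records about one person can still be linked for analysis without exposing who that person is.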
4. Implications for Data Spaces
In collaborative environments such as Zamora:
- Multi-level governance is required, with clearly defined roles and responsibilities
- End-to-end traceability ensures auditability
- Federated data quality and collaborative bias analysis are essential
- Enhanced data protection is achieved through robust technical architectures
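End-to-end traceability in a shared environment is often supported by a tamper-evident log of who did what with which dataset. The sketch below is a minimal hash-chained log in which each entry commits to the previous entry's hash, so any later edit breaks verification. It is an illustrative technique, not an architecture prescribed by AESIA. It detects tampering but does not prevent it: someone able to rewrite the whole chain could recompute every hash, so in practice the latest hash would also be shared with other participants.

```python
import hashlib
import json

FIELDS = ("actor", "action", "dataset", "prev")

def _digest(body):
    """SHA-256 over a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self):
        self.entries = []

    def append(self, actor, action, dataset):
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"actor": actor, "action": action, "dataset": dataset, "prev": prev}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self):
        """Recompute the chain; False if any entry was altered or reordered."""
        prev = "0" * 64
        for entry in self.entries:
            body = {k: entry[k] for k in FIELDS}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True
```

For example, after a hospital shares a dataset and a research centre trains on it, changing the recorded action of the first entry makes `verify()` return `False`.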
5. Practical recommendations for organisations
- Apply the full data governance life cycle
- Invest in data quality from the collection phase
- Document every step to ensure traceability and auditability
- Assign clear responsibilities and build multidisciplinary teams
- Maintain continuous monitoring and leverage existing anonymisation tools
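Continuous monitoring often includes checking whether incoming data still resemble the data the system was built on. One widely used drift measure is the population stability index (PSI), computed over binned proportions. The sketch below assumes both distributions are already binned the same way. A PSI above roughly 0.25 is an informal industry rule of thumb for a significant shift, not a regulatory threshold.

```python
import math

def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned distributions given as lists of proportions.

    PSI = sum over bins of (actual - expected) * ln(actual / expected).
    """
    if len(expected) != len(actual):
        raise ValueError("distributions must have the same number of bins")
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        psi += (a - e) * math.log(a / e)
    return psi
```

Identical distributions give a PSI of 0, while a shift from a 50/50 split to 90/10 gives roughly 0.88, which would warrant investigation.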
Data governance is not a barrier to innovation, but the foundation of trust and sustainability in AI. At the Smart Data Space Innovation Centre of Zamora, implementing these practices enables companies, institutions and startups to collaborate securely and unlock the full potential of data in a responsible way.
To access the full guide and other resources developed by AESIA, visit the official website of the Spanish Agency for the Supervision of Artificial Intelligence.