Training artificial intelligence without sharing data: federated learning arrives in data spaces
There is a fundamental tension in the data economy: training powerful artificial intelligence models requires large volumes of data, but organizations cannot (and should not) freely share their sensitive data. Federated learning addresses this tension: it enables multiple organizations to train an AI model collaboratively without raw data ever leaving each participant's infrastructure.
The data dilemma in enterprise AI
A hospital wants to improve its early disease detection model but only has data from its own patients. If it could combine its data with that of 50 other hospitals, the model would likely be considerably more accurate. However, sharing medical records between institutions raises enormous legal (GDPR), ethical, and technical barriers.
This same dilemma repeats in finance (fraud detection), manufacturing (predictive maintenance), agriculture (crop optimization), and virtually any sector where data is sensitive but the value of aggregating it would be immense.
What is federated learning?
Federated learning inverts the classical machine learning paradigm. Instead of centralizing data to train a model, it distributes the model to where the data resides. The process works as follows:
Distribution. Each participant receives a copy of the base model.
Local training. Each participant trains the model locally with their own data.
Update sharing. Only model updates (gradients or weights) are shared, never the original data.
Secure aggregation. A coordinator combines the updates from all participants, typically by averaging them, to produce an improved model.
Iteration. The cycle repeats until the model converges.
The result is a model that benefits from the data of every participating organization, without any of them having revealed a single record to the others.
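The cycle described above can be illustrated with a minimal sketch of federated averaging, one common way to combine updates. It is not the data space's actual implementation: the function names, the toy linear model, the learning rate, and the synthetic data are all assumptions chosen to keep the example self-contained.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Local training: refine a linear model on one participant's private data."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            error = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
    return w  # only these weights leave the participant, never `data`

def federated_average(updates):
    """Aggregation: average the weights, weighted by each participant's sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]

def make_participant(n_samples, seed):
    """Hypothetical private dataset following y = 2*x + 0.5 plus small noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        x = [rng.uniform(-1, 1), 1.0]  # one feature plus a constant bias term
        y = 2.0 * x[0] + 0.5 + rng.gauss(0, 0.01)
        data.append((x, y))
    return data

# Three participants with datasets of different sizes (assumed for illustration).
participants = [make_participant(n, seed) for seed, n in enumerate([20, 50, 30])]

global_weights = [0.0, 0.0]  # Distribution: everyone starts from the same base model
for _ in range(20):          # Iteration: repeat the cycle for a fixed number of rounds
    updates = [(local_update(global_weights, d), len(d)) for d in participants]
    global_weights = federated_average(updates)

print(global_weights)
```

In this sketch the coordinator only ever handles weight vectors and sample counts. Weighting by sample count is a design choice that lets larger datasets contribute proportionally more to the shared model.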
Federated learning in our data space
Our data space is incorporating federated learning capabilities as a natural extension of its architecture. The federated infrastructure already connecting participants (connectors, decentralized identity, verifiable agreements) is the ideal foundation for coordinating distributed training.
The approach envisions each data space connector acting as a training node. Local datasets are used for training without leaving the connector, and only model parameters are exchanged through existing secure channels, protected by mutual TLS and decentralized identity verification.
Privacy by design, not as a patch
Unlike solutions that add privacy as an afterthought, our data space builds protection into its foundational architecture. Data never leaves the participant's infrastructure (this principle applies to both data sharing and federated learning). Communication between connectors is protected by multiple authentication layers. Training agreements are recorded on a blockchain so that they remain fully traceable.
Furthermore, secure aggregation protocols are designed so that the coordinator only sees the combined result, which limits how much a model update can reveal about any participant's individual data.
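One well-known family of secure aggregation techniques uses pairwise masks that cancel out when all contributions are summed. The sketch below illustrates that idea only. It is not the protocol used by the data space, and every name in it (MODULUS, SCALE, pairwise_masks, and so on) is hypothetical. A seeded random generator stands in for the pairwise key agreement a real protocol would need, and participant dropouts are not handled.

```python
import random

MODULUS = 2**32   # all arithmetic is done modulo this value so masks cancel exactly
SCALE = 10**4     # fixed-point scale used to turn floats into integers

def encode(update):
    """Convert a float update into fixed-point integers modulo MODULUS."""
    return [round(v * SCALE) % MODULUS for v in update]

def decode(total):
    """Map summed integers back to floats, treating large values as negatives."""
    return [(v - MODULUS if v >= MODULUS // 2 else v) / SCALE for v in total]

def pairwise_masks(n_participants, dim, seed=0):
    """For each pair i < j, i adds a shared random vector and j subtracts it,
    so the masks of all participants sum to zero modulo MODULUS."""
    rng = random.Random(seed)  # assumption: stands in for pairwise shared secrets
    masks = [[0] * dim for _ in range(n_participants)]
    for i in range(n_participants):
        for j in range(i + 1, n_participants):
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks

def mask_update(update, mask):
    """What a participant actually sends: its encoded update plus its mask."""
    return [(e + m) % MODULUS for e, m in zip(encode(update), mask)]

def aggregate(masked_updates):
    """The coordinator sums the masked vectors; the masks cancel in the total."""
    dim = len(masked_updates[0])
    total = [sum(u[k] for u in masked_updates) % MODULUS for k in range(dim)]
    return decode(total)

updates = [[0.5, -1.25], [1.0, 0.75], [-0.25, 2.0]]  # illustrative local updates
masks = pairwise_masks(len(updates), dim=2)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]
summed = aggregate(masked)
print(summed)
```

Each masked vector looks random on its own, yet their sum equals the sum of the original updates. The aggregate itself can still carry information about the underlying data, so in practice this kind of masking is usually combined with other safeguards.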
Use cases: where federated learning makes the difference
Forestry and environment. Multiple regional administrations can jointly train fire detection or pest prediction models using data from their own monitoring stations, without centralizing sensitive environmental information.
Manufacturing. Factories sharing the same type of machinery can collaborate on predictive maintenance models, improving accuracy without revealing proprietary operational data.
Smart cities. Municipalities can co-train traffic management or energy consumption models without sharing citizens' personal data.
Agri-food chain. Producers, distributors, and retailers can develop demand forecasting models that benefit the entire chain while maintaining each actor's commercial confidentiality.
The future: collaborative AI at European scale
Federated learning integrated into data spaces is not just a technical improvement: it is a paradigm shift in how Europe can compete in artificial intelligence while respecting its own privacy and sovereignty standards. Instead of concentrating data on a few platforms, federated learning allows thousands of organizations to contribute to powerful models without ceding control of their most valuable asset.
Our data space is building the infrastructure to turn this from an academic concept into a reality.