The federated catalog: how to discover, publish, and monetize data in a European data space
If data is the new oil, the catalog is the refinery that makes it usable. In a data space, the catalog is the entry point where consumers discover what data exists, who offers it, and under what conditions. Without a well-designed catalog, a data space with hundreds of datasets becomes a maze. Our data space implements a federated catalog built around three roles: providers, brokers, and consumers.
Three roles, one ecosystem
In our data space's catalog model, each participant can take on one or more of these roles at the same time:
Data provider. Creates local catalogs, adds datasets with their metadata and usage policies, and exposes them to the rest of the data space. They own the data and define the rules.
Metadata broker. Aggregates catalogs from multiple providers into a single point of access. Consumers query the broker instead of visiting each provider individually. A provider joining the data space only needs to register with the broker to make its data discoverable to every participant.
Consumer. Searches for datasets in catalogs (local or through brokers), explores their metadata and policies, and initiates negotiations to access those of interest.
This role separation eliminates the need for each consumer to know every provider's address. The broker acts as a hub that keeps the catalog updated through periodic synchronization.
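To make the broker's role concrete, here is a minimal sketch assuming an in-memory index keyed by provider. `ProviderCatalog`, `MetadataBroker`, and all identifiers are hypothetical; a real broker would hold full JSON-LD catalogs refreshed on each synchronization and would search richer metadata than titles.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProviderCatalog:
    provider_id: str
    datasets: dict[str, str]  # dataset id -> title (simplified metadata)


@dataclass
class MetadataBroker:
    """Hypothetical in-memory broker merging registered provider catalogs."""
    providers: dict[str, ProviderCatalog] = field(default_factory=dict)

    def register(self, catalog: ProviderCatalog) -> None:
        # Registering (or re-registering) replaces the stored copy.
        self.providers[catalog.provider_id] = catalog

    def search(self, term: str) -> list[tuple[str, str]]:
        """Return sorted (provider_id, dataset_id) pairs whose title contains term."""
        term = term.lower()
        return sorted(
            (pid, did)
            for pid, cat in self.providers.items()
            for did, title in cat.datasets.items()
            if term in title.lower()
        )


broker = MetadataBroker()
broker.register(ProviderCatalog("provider-a", {"ds-1": "Air quality Madrid"}))
broker.register(ProviderCatalog("provider-b", {"ds-7": "Air quality Lyon", "ds-8": "Traffic counts"}))
print(broker.search("air quality"))  # [('provider-a', 'ds-1'), ('provider-b', 'ds-7')]
```

The consumer issues one query and gets results from both providers without knowing either provider's address.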
Rich metadata: more than a title and description
Each dataset in the catalog is described by a complete set of metadata: title, multilingual descriptions, keywords, themes, creator, the standard it conforms to, an external identifier, and publication and last-modification dates.
Multilingual descriptions are especially relevant in a European context where participants operate in different languages. A dataset published with descriptions in Spanish, English, and French can be discovered by consumers from any of those markets.
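The metadata fields above can be sketched as a single record. Field names loosely follow DCAT and Dublin Core terms (such as `dct:conformsTo`, `dct:issued`, `dct:modified`), which is an assumption about the vocabulary rather than a description of our implementation; the language-fallback helper and all sample values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DatasetMetadata:
    """Catalog entry; field names loosely mirror DCAT / Dublin Core terms."""
    title: str
    descriptions: dict[str, str]  # language tag -> description text
    keywords: list[str] = field(default_factory=list)
    themes: list[str] = field(default_factory=list)
    creator: str = ""
    conforms_to: str = ""  # standard the dataset conforms to
    identifier: str = ""  # external identifier
    issued: Optional[date] = None  # publication date
    modified: Optional[date] = None  # last-modification date

    def description(self, preferred: list[str], fallback: str = "en") -> str:
        """Return the first description in the consumer's preferred languages,
        else the `fallback` language, else an empty string."""
        for lang in preferred:
            if lang in self.descriptions:
                return self.descriptions[lang]
        return self.descriptions.get(fallback, "")


ds = DatasetMetadata(
    title="Air quality Madrid",
    descriptions={
        "es": "Calidad del aire en Madrid",
        "en": "Air quality in Madrid",
        "fr": "Qualité de l'air à Madrid",
    },
    keywords=["air", "pollution"],
    themes=["environment"],
    creator="urn:participant:provider-a",
    conforms_to="urn:standard:example",
    identifier="urn:dataset:aq-madrid",
    issued=date(2023, 5, 2),
    modified=date(2024, 3, 1),
)
print(ds.description(["de", "fr"]))  # Qualité de l'air à Madrid
```

A German-speaking consumer with French as a second language gets the French text, while one with no matching language falls back to English.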
Catalogs also support filtering by keyword, theme, and date range, as well as semantic search over titles and descriptions, which keeps discovery practical even in catalogs with thousands of datasets.
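A sketch of how these filters might combine, assuming datasets are plain dictionaries. Criteria are combined with AND, while any single keyword or theme in a list is enough to match. Plain substring matching stands in here for semantic search, which in practice would typically rank results by embedding similarity; the function name and sample catalog are hypothetical.

```python
from datetime import date


def filter_datasets(datasets, keywords=None, themes=None,
                    modified_from=None, modified_to=None, text=None):
    """Return datasets matching every supplied criterion.

    Within `keywords` or `themes`, one matching entry is enough; across
    criteria, all must hold. `text` is a case-insensitive substring match
    over title and description (a stand-in for semantic search).
    """
    results = []
    for ds in datasets:
        if keywords and not set(keywords) & set(ds["keywords"]):
            continue
        if themes and not set(themes) & set(ds["themes"]):
            continue
        if modified_from and ds["modified"] < modified_from:
            continue
        if modified_to and ds["modified"] > modified_to:
            continue
        if text and text.lower() not in f'{ds["title"]} {ds["description"]}'.lower():
            continue
        results.append(ds)
    return results


catalog = [
    {"id": "ds-1", "title": "Air quality Madrid", "description": "Hourly NO2 readings",
     "keywords": ["air", "no2"], "themes": ["environment"], "modified": date(2024, 3, 1)},
    {"id": "ds-2", "title": "Traffic counts", "description": "Vehicle counts per road segment",
     "keywords": ["traffic"], "themes": ["mobility"], "modified": date(2023, 11, 15)},
    {"id": "ds-3", "title": "Noise map", "description": "Urban noise levels near roads",
     "keywords": ["noise"], "themes": ["environment"], "modified": date(2024, 6, 10)},
]
recent_env = filter_datasets(catalog, themes=["environment"], modified_from=date(2024, 1, 1))
print([d["id"] for d in recent_env])  # ['ds-1', 'ds-3']
```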
Usage policies: the rules of the game for each dataset
Each dataset can carry multiple usage policies, each expressed in ODRL (Open Digital Rights Language). This lets a provider offer the same data under different conditions: a free policy for academic use with restrictions on redistribution, a paid policy for commercial use, and a premium subscription with unlimited access.
The consumer browses available policies before initiating a negotiation, choosing the one that best fits their use case. If no policy fits exactly, they can send a counter-proposal during negotiation.
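The three offers described above can be sketched as simplified ODRL policies in JSON-LD form, together with a consumer-side helper that picks the cheapest policy compatible with its purpose or, if that policy exceeds its budget, drafts a counter-proposal. The ODRL terms used (`Offer`, `use`, `distribute`, `compensate`, `purpose`, `payAmount`) exist in the ODRL vocabulary, but the purpose strings, prices, EUR unit, assigner URL, and selection rule are hypothetical; a production policy would use IRIs for purposes and typed literals for amounts.

```python
import copy


def offer(uid, purpose=None, price=0, no_redistribution=False):
    """Build a simplified ODRL Offer (illustrative values, not a full profile)."""
    permission = {"action": "use"}
    if purpose:
        permission["constraint"] = [
            {"leftOperand": "purpose", "operator": "eq", "rightOperand": purpose}]
    if price:
        permission["duty"] = [{"action": "compensate", "constraint": [
            {"leftOperand": "payAmount", "operator": "eq", "rightOperand": price, "unit": "EUR"}]}]
    policy = {"@context": "http://www.w3.org/ns/odrl.jsonld", "@type": "Offer", "uid": uid,
              "assigner": "https://provider.example/", "permission": [permission]}
    if no_redistribution:
        policy["prohibition"] = [{"action": "distribute"}]
    return policy


def purpose_of(policy):
    for c in policy["permission"][0].get("constraint", []):
        if c["leftOperand"] == "purpose":
            return c["rightOperand"]
    return None  # no purpose restriction


def price_of(policy):
    for duty in policy["permission"][0].get("duty", []):
        for c in duty["constraint"]:
            if c["leftOperand"] == "payAmount":
                return c["rightOperand"]
    return 0


def choose(policies, purpose, budget):
    """Return ("accept", offer), ("counter", proposal), or ("none", None)."""
    candidates = [p for p in policies if purpose_of(p) in (None, purpose)]
    if not candidates:
        return ("none", None)
    best = min(candidates, key=price_of)
    if price_of(best) <= budget:
        return ("accept", best)
    # Counter-proposal: same terms, lower amount; the published offer stays untouched.
    proposal = copy.deepcopy(best)
    proposal["uid"] = best["uid"] + "-counter"
    proposal["permission"][0]["duty"][0]["constraint"][0]["rightOperand"] = budget
    return ("counter", proposal)


policies = [
    offer("urn:policy:academic", purpose="academic", no_redistribution=True),
    offer("urn:policy:commercial", purpose="commercial", price=500),
    offer("urn:policy:premium", price=5000),
]
status, proposal = choose(policies, "commercial", budget=300)
print(status, price_of(proposal))  # counter 300
```

Deep-copying the chosen offer keeps the provider's published policy intact while the consumer edits its counter-proposal.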
Broker registration: from invisible to discoverable
For a provider to be visible in the data space, they need to register with at least one metadata broker. Our data space offers two registration paths: an automatic process through the connector protocol (where the provider sends a request with their contact email, website, and synchronization interval), or a manual process managed by the broker administrator.
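For the automatic path, a registration request might be assembled and checked as in the sketch below. Only the contact email, website, and synchronization interval come from the process described above; the field names (`connectorId`, `contactEmail`, `syncIntervalSeconds`), the five-minute minimum interval, and the validation rules are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import timedelta
from urllib.parse import urlparse


@dataclass
class BrokerRegistrationRequest:
    connector_id: str
    contact_email: str
    website: str
    sync_interval: timedelta

    def validate(self, min_interval=timedelta(minutes=5)):
        """Return a list of problems; an empty list means the request looks well-formed."""
        errors = []
        if "@" not in self.contact_email:
            errors.append("contact_email must be an email address")
        url = urlparse(self.website)
        if url.scheme != "https" or not url.netloc:
            errors.append("website must be an absolute https URL")
        if self.sync_interval < min_interval:
            errors.append(f"sync_interval must be at least {min_interval}")
        return errors

    def to_json(self):
        """Serialize with hypothetical wire field names."""
        return json.dumps({
            "connectorId": self.connector_id,
            "contactEmail": self.contact_email,
            "website": self.website,
            "syncIntervalSeconds": int(self.sync_interval.total_seconds()),
        })


req = BrokerRegistrationRequest("urn:connector:provider-a", "data@provider-a.example",
                                "https://provider-a.example", timedelta(hours=6))
print(req.validate())  # []
```

Rejecting malformed requests up front leaves the broker (or its administrator, on the manual path) to decide only whether to approve the provider.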
Once registration is approved, the broker periodically synchronizes the provider's catalog, automatically propagating any changes (new datasets, updated metadata, modified policies) to all participants.
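Each synchronization round comes down to comparing the broker's stored copy of a provider's catalog with a fresh snapshot. This sketch assumes snapshots are dictionaries keyed by dataset ID and that fetching them on the registered interval happens elsewhere; `diff_catalogs` and `apply_sync` are hypothetical names.

```python
def diff_catalogs(previous: dict, current: dict) -> dict:
    """Compare two catalog snapshots ({dataset_id: metadata}) and list what changed."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    updated = sorted(k for k in current.keys() & previous.keys() if current[k] != previous[k])
    return {"added": added, "updated": updated, "removed": removed}


def apply_sync(index: dict, provider_id: str, snapshot: dict) -> dict:
    """Replace the broker's copy of one provider's catalog and report the changes."""
    changes = diff_catalogs(index.get(provider_id, {}), snapshot)
    index[provider_id] = dict(snapshot)
    return changes


broker_index = {}
first = apply_sync(broker_index, "provider-a",
                   {"ds-1": {"title": "Air quality", "policies": ["academic"]}})
second = apply_sync(broker_index, "provider-a", {
    "ds-1": {"title": "Air quality", "policies": ["academic", "commercial"]},  # policy added
    "ds-2": {"title": "Traffic counts", "policies": ["academic"]},  # new dataset
})
print(second)  # {'added': ['ds-2'], 'updated': ['ds-1'], 'removed': []}
```

Reporting the change set, rather than only overwriting the copy, gives the broker something concrete to propagate or log after each round.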
IDS catalog protocol: interoperable by design
The catalog is exposed to other connectors through the IDS Catalog Protocol's HTTPS binding. Responses use JSON-LD to represent catalogs and datasets in a standard format, and large catalogs are paginated with continuation tokens.
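On the consumer side, continuation-token pagination reduces to a loop that keeps requesting pages until no token comes back. In this sketch the `continuationToken` field name, the page shape, and the `fake_fetch` provider are assumptions for illustration, not the wire format of the protocol binding; a real client would send authenticated HTTPS requests.

```python
from typing import Callable, Optional


def fetch_all_datasets(fetch_page: Callable[[Optional[str]], dict]) -> list:
    """Collect every dataset by following continuation tokens page by page."""
    datasets: list = []
    seen_tokens: set = set()
    token: Optional[str] = None
    while True:
        page = fetch_page(token)  # None requests the first page
        datasets.extend(page.get("dcat:dataset", []))
        token = page.get("continuationToken")  # hypothetical field name
        if not token:
            return datasets  # no token: this was the last page
        if token in seen_tokens:  # guard against a provider looping forever
            raise RuntimeError(f"provider repeated continuation token {token!r}")
        seen_tokens.add(token)


# Hypothetical in-memory provider serving five datasets, two per page.
ALL = [{"@id": f"urn:dataset:{i}", "@type": "dcat:Dataset"} for i in range(5)]


def fake_fetch(token, page_size=2):
    start = int(token) if token else 0
    page = {"@context": {"dcat": "http://www.w3.org/ns/dcat#"}, "@type": "dcat:Catalog",
            "dcat:dataset": ALL[start:start + page_size]}
    if start + page_size < len(ALL):
        page["continuationToken"] = str(start + page_size)
    return page


print(len(fetch_all_datasets(fake_fetch)))  # 5
```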
To access a provider's catalog, a consumer must present a valid identity and at least one verifiable credential signed by a trusted issuer. This ensures that even the simple act of discovering what data exists is subject to the data space's trust rules.
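The trust check can be sketched as follows, assuming credentials arrive as JSON objects with VC-style `issuer`, `credentialSubject`, and `expirationDate` fields. Real verifiable credentials carry public-key proofs (for example JWS or Data Integrity proofs) checked against the issuer's published keys; an HMAC over the credential body stands in for that here only so the sketch runs with the standard library. The trust list, DIDs, and key are hypothetical, and authenticating the requester's identity itself is out of scope.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Optional

# Hypothetical trust list: issuer DID -> verification key (HMAC stand-in).
TRUSTED_ISSUERS = {"did:web:issuer.example": b"demo-issuer-key"}


def _body(cred: dict) -> bytes:
    """Canonical bytes of the credential without its proof."""
    return json.dumps({k: v for k, v in cred.items() if k != "proof"}, sort_keys=True).encode()


def sign(cred: dict, key: bytes) -> dict:
    """Attach a stand-in proof (HMAC-SHA256 over the credential body)."""
    return {**cred, "proof": hmac.new(key, _body(cred), hashlib.sha256).hexdigest()}


def credential_is_valid(cred: dict, now: datetime) -> bool:
    key = TRUSTED_ISSUERS.get(cred.get("issuer"))
    if key is None:
        return False  # issuer is not trusted in this data space
    expected = hmac.new(key, _body(cred), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, cred.get("proof", "")):
        return False  # tampered, or signed with a different key
    return now < datetime.fromisoformat(cred["expirationDate"])


def may_read_catalog(requester: str, credentials: list, now: Optional[datetime] = None) -> bool:
    """Grant catalog access if at least one valid credential names the requester."""
    now = now or datetime.now(timezone.utc)
    return any(
        c.get("credentialSubject", {}).get("id") == requester and credential_is_valid(c, now)
        for c in credentials
    )


membership = sign(
    {
        "issuer": "did:web:issuer.example",
        "credentialSubject": {"id": "did:web:consumer.example", "role": "member"},
        "expirationDate": "2030-01-01T00:00:00+00:00",
    },
    TRUSTED_ISSUERS["did:web:issuer.example"],
)
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(may_read_catalog("did:web:consumer.example", [membership], now))  # True
```

The same check runs before the catalog request is answered, so an untrusted or expired credential never sees the list of datasets.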
The catalog as the engine of the data economy
A well-implemented federated catalog is more than a file list: it is the marketplace that makes the data economy possible. It lets small organizations make their data visible alongside large enterprises, lets consumers compare offers and conditions, and ensures that trust is established before a single piece of data changes hands.
In our data space, the catalog closes the loop between publication, discovery, negotiation, and transfer: the complete flow that turns isolated data into shared value.