Connecting the unconnectable: how to integrate databases, clouds, IoT sensors, and APIs into a single data space

AI Open Space

One of the biggest obstacles to sharing data between organizations is not willingness, but technological heterogeneity. Company A stores its data in PostgreSQL, company B in MongoDB, company C in an AWS S3 bucket, and company D receives real-time information from MQTT sensors. Our data space addresses this challenge with a connector system that integrates over 20 types of data sources and sinks under a unified interface.

The challenge of technological fragmentation

In Europe, the business landscape is extraordinarily diverse. An SME in the agri-food sector may manage its data in spreadsheets and an FTP server, while an energy multinational operates with data lakes on Hadoop and Kafka-based streaming platforms. For a data space to work in practice, it needs to speak all these technological languages.

Our data space solves this through a Data Transfer Module that implements the concept of connectors: specialized components that encapsulate the communication logic for each technology type and expose a standard interface to the rest of the system.
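In code, that standard interface might look like the following minimal sketch (the names here, `Connector`, `connect`, `read`, `cleanup`, and the toy `CSVConnector`, are illustrative assumptions, not the module's actual API):

```python
from abc import ABC, abstractmethod
from typing import Any, Iterator


class Connector(ABC):
    """Hypothetical standard interface every connector exposes."""

    @abstractmethod
    def connect(self, config: dict) -> None: ...

    @abstractmethod
    def read(self, query: dict) -> Iterator[Any]: ...

    @abstractmethod
    def cleanup(self) -> None: ...


class CSVConnector(Connector):
    """Toy implementation backed by an in-memory table."""

    def connect(self, config: dict) -> None:
        self.rows = config.get("rows", [])

    def read(self, query: dict) -> Iterator[Any]:
        # A single equality filter, encapsulating the "query" logic
        field, value = query.get("filter", (None, None))
        for row in self.rows:
            if field is None or row.get(field) == value:
                yield row

    def cleanup(self) -> None:
        self.rows = []
```

The rest of the system only ever talks to `Connector`, so swapping PostgreSQL for MongoDB is invisible to the pipeline.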

Connector catalog: four major families

Connectors are organized into four categories based on the data flow type they support:

  • Structured data. For relational and NoSQL databases: MongoDB, MySQL, PostgreSQL, Cassandra, Neo4j, Elasticsearch, Qdrant, and TimescaleDB. They support queries with filters, projection, and ordering, automatically translating operations into each engine's native language.

  • Files. For cloud storage and file systems: AWS S3, Azure Blob Storage, Google Cloud Storage, MinIO, Google Drive, Hadoop HDFS, IPFS, and FTP/SFTP servers. They provide file listing, metadata reading, and byte-stream transfer.

  • Services (APIs). For REST API integration. The API connector supports all HTTP methods, authentication (Basic, Bearer, custom), query parameters, custom headers, and multiple response formats (JSON, CSV, raw).

  • Real-time. For streaming data: Apache Kafka, MQTT, CoAP, and WebSocket. These connectors implement topic or resource subscriptions and deliver messages continuously to the data pipeline.
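To illustrate the "translate into each engine's native language" step for structured-data connectors, a simplified (hypothetical) helper might map a generic equality filter onto two different backends:

```python
def to_native(filters: dict, engine: str):
    """Translate generic equality filters into an engine-native query.

    Illustrative sketch only: real connectors handle projections,
    ordering, escaping, and many more operators.
    """
    if engine == "postgresql":
        # Parameterized SQL: placeholders plus a value list
        clauses = " AND ".join(f"{k} = %s" for k in filters)
        return f"SELECT * FROM t WHERE {clauses}", list(filters.values())
    if engine == "mongodb":
        # MongoDB filter documents are already key/value equality maps
        return dict(filters)
    raise ValueError(f"unsupported engine: {engine}")
```

The same `{"country": "ES"}` filter thus becomes a parameterized SQL statement for one engine and a filter document for another, without the caller knowing which.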

Auto-discovery and intelligent caching

The Data Transfer Module uses an auto-loading mechanism: at startup, it scans the connectors directory, validates each factory, and registers available connectors automatically. This means adding support for a new technology is as simple as creating a file that implements the standard interface.
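A simplified version of such an auto-loading scan could look like this (the `create_connector` factory name and the one-module-per-file layout are assumptions made for the sketch):

```python
import importlib.util
import pathlib


def discover_connectors(directory: str) -> dict:
    """Scan a directory for connector modules and register every one
    that exposes a callable factory named `create_connector`."""
    registry = {}
    for path in sorted(pathlib.Path(directory).glob("*.py")):
        # Import the file as a standalone module
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Validate: only register modules that provide the factory
        factory = getattr(module, "create_connector", None)
        if callable(factory):
            registry[path.stem] = factory
    return registry
```

Dropping a new `qdrant.py` that defines `create_connector` into the directory would then be enough for the module to appear in the registry at the next startup.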

Additionally, the system maintains an LRU cache of active connections. When a connector with the same configuration is requested, the existing instance is reused instead of creating a new one. When an entry expires, the cleanup method is automatically invoked to release resources.
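The reuse-and-evict behaviour can be sketched with Python's `OrderedDict` (an illustrative approximation, not the module's actual code):

```python
from collections import OrderedDict


class ConnectionCache:
    """LRU cache of live connections, keyed by a hashable config.

    Evicted entries have their cleanup() invoked to release resources.
    """

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self._cache = OrderedDict()

    def get(self, key, factory):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        conn = factory()                  # no cached instance: create one
        self._cache[key] = conn
        if len(self._cache) > self.capacity:
            _, evicted = self._cache.popitem(last=False)  # drop LRU entry
            evicted.cleanup()             # release its resources
        return conn
```

Keying on the full connector configuration is what lets two requests with identical settings share one live connection.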

Visual configuration: automatically generated forms

Each connector defines a declarative configuration schema that the system uses to automatically generate forms in the user interface. The schema supports text fields, numbers, booleans, lists, and nested objects, plus conditional fields that show or hide based on user selection.

This dramatically lowers the barrier to entry: an administrator without deep knowledge of Cassandra or Kafka can configure a connection by filling in a guided form, without touching configuration files or command lines.
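A toy version of such a declarative schema with one conditional field, together with the visibility logic a form renderer might apply, could look like this (the field names and the `show_if` convention are invented for the sketch):

```python
# Hypothetical declarative schema for a Kafka connector form
SCHEMA = {
    "bootstrap_servers": {"type": "text", "required": True},
    "use_tls": {"type": "boolean", "default": False},
    # Conditional field: only shown when use_tls is enabled
    "ca_cert": {"type": "text", "show_if": ("use_tls", True)},
}


def visible_fields(schema: dict, values: dict) -> list:
    """Return the fields a form renderer should display right now,
    honoring each field's conditional `show_if` rule."""
    fields = []
    for name, spec in schema.items():
        cond = spec.get("show_if")
        if cond is None or values.get(cond[0]) == cond[1]:
            fields.append(name)
    return fields
```

As the user toggles `use_tls` in the form, the renderer recomputes the visible fields and the certificate input appears or disappears accordingly.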

Decentralized storage: IPFS as a data source

A differentiating aspect is the integration with IPFS (InterPlanetary File System), a content-addressed, peer-to-peer storage network. This allows participants to share datasets stored on decentralized networks, aligning the data sovereignty philosophy with distributed storage technologies that don't depend on a single cloud provider.

IoT and real-time data: from sensor to data space

The MQTT and CoAP connectors are specifically designed for the IoT ecosystem. A sensor publishing readings every second on an MQTT broker can directly feed a data space pipeline that processes, anonymizes, and shares that data with other participants in real time.
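The flow from broker subscription to pipeline stage can be sketched with an in-memory stand-in for the real-time connector (a production connector would wrap an actual MQTT client; the class, topic, and field names here are all illustrative):

```python
import json


class RealtimeBus:
    """In-memory stand-in for an MQTT-style connector: callbacks
    subscribed to a topic receive every message published on it."""

    def __init__(self):
        self._subs = {}

    def subscribe(self, topic: str, callback) -> None:
        self._subs.setdefault(topic, []).append(callback)

    def publish(self, topic: str, payload: dict) -> None:
        raw = json.dumps(payload)  # wire format: JSON, as a sensor might send
        for cb in self._subs.get(topic, []):
            cb(topic, raw)


# A pipeline stage that anonymizes readings before sharing them
shared_readings = []


def anonymize_and_share(topic: str, raw: str) -> None:
    msg = json.loads(raw)
    msg.pop("device_owner", None)  # strip the identifying field
    shared_readings.append(msg)
```

Wiring `anonymize_and_share` as the subscription callback is the whole "sensor to data space" path: each published reading is processed and made available to other participants with no batch step in between.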

This capability is especially relevant for sectors like precision agriculture, environmental monitoring, or Industry 4.0, where data rapidly loses value if not processed and shared instantly.

A connector for every organization

The diversity of connectors is not a technical whim: it is the response to a European reality where no single technology dominates. By supporting over 20 types of sources and sinks, our data space eliminates the need for organizations to migrate their systems to participate in the data economy. Each participant connects what they already have, and the data space ensures everyone speaks the same language.