Title:
RDF-Connect : a declarative framework for streaming and cross-environment data processing pipelines
Source:
SOFLIM2KG-SEMIIM 2024: Joint Proceedings of the 1st Software Lifecycle Management for Knowledge Graphs Workshop and the 3rd International Workshop on Semantic Industrial Information Modelling, co-located with the 23rd International Semantic Web Conference (ISWC 2024)
Publisher Information:
2024.
Publication Year:
2024
Document Type:
Conference object
File Description:
application/pdf
Language:
English
Accession Number:
edsair.od.......330..1532dc6da0cf67067918a48d1b7e686f
Database:
OpenAIRE

Further Information

Data processing pipelines are a crucial component of any data-centric system today; machine learning, data integration, and knowledge graph publishing all depend on them. Furthermore, most production systems require pipelines that support continuous operation and streaming-based processing for low-latency computation over large volumes of data. However, creating and maintaining data processing pipelines is challenging, and much effort is usually spent on ad-hoc scripting, which limits reusability across systems. Existing solutions are not interoperable out of the box and do not allow for easy integration of different execution environments (e.g., Java, Python, JavaScript, Rust) while maintaining streaming operation. For example, combining Python-, JavaScript-, and Java-based libraries natively in a single pipeline is not straightforward. An interoperable and declarative mechanism could enable continuous communication and integrated execution of data processing functions across different execution environments. We introduce RDF-Connect, a declarative framework based on semantic standards that instantiates pipelines whose data processing functions run across execution environments and communicate through well-known communication protocols. We describe its architecture and demonstrate its use in an RDF knowledge graph creation, validation, and publishing use case. The declarative nature of our approach facilitates the reusability and maintainability of data processing pipelines. We currently support JavaScript and JVM-based environments, but we aim to extend RDF-Connect to other rich ecosystems such as Python and to lower-level languages such as Rust, to take advantage of system-level performance gains.
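The core idea the abstract describes, separating a declarative pipeline description from the processors that execute it, and chaining processors as streams rather than batches, can be illustrated with a minimal sketch. This is a hypothetical illustration only, not the actual RDF-Connect API or its RDF-based configuration format: the processor names, registry, and runner below are invented for this example, and in a real cross-environment setting each registry entry would point at a process in another runtime bridged by a communication protocol rather than an in-process function.

```python
from typing import Callable, Iterable, Iterator

# Two example processors. Each consumes a stream of records lazily and
# yields a stream of records, so no intermediate result is materialised.
def uppercase(records: Iterable[str]) -> Iterator[str]:
    """Normalise incoming text records."""
    for r in records:
        yield r.upper()

def tag(records: Iterable[str]) -> Iterator[str]:
    """Annotate each record."""
    for r in records:
        yield f"<{r}>"

# Registry mapping declarative names to executable processors (hypothetical;
# stands in for resolving a processor description to a runnable component).
REGISTRY: dict[str, Callable[[Iterable[str]], Iterator[str]]] = {
    "uppercase": uppercase,
    "tag": tag,
}

# The "declarative" part: the pipeline is plain data (an ordered list of
# processor names), not code, so it can be stored, validated, and reused.
PIPELINE = ["uppercase", "tag"]

def run(pipeline: list[str], source: Iterable[str]) -> list[str]:
    stream: Iterable[str] = source
    for name in pipeline:
        # Chain generators: each stage pulls from the previous one,
        # giving streaming (record-at-a-time) rather than batch execution.
        stream = REGISTRY[name](stream)
    return list(stream)

print(run(PIPELINE, ["a", "b"]))  # → ['<A>', '<B>']
```

Because the pipeline is data rather than code, swapping or reordering processors changes no implementation, which is the reusability and maintainability benefit the declarative approach targets.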