Pushing the scalability of RDF engines on IoT edge devices
Semantic interoperability for the Internet of Things (IoT) is enabled by standards and technologies from the Semantic Web. As recent research suggests a move towards decentralised IoT architectures, we have investigated the scalability and robustness of RDF (Resource Description Framework)engines that can be embedded throughout the architecture, in particular at edge nodes. RDF processing at the edge facilitates the deployment of semantic integration gateways closer to low-level devices. Our focus is on how to enable scalable and robust RDF engines that can operate on lightweight devices. In this paper, we have first carried out an empirical study of the scalability and behaviour of solutions for RDF data management on standard computing hardware that have been ported to run on lightweight devices at the network edge. The findings of our study shows that these RDF store solutions have several shortcomings on commodity ARM (Advanced RISC Machine) boards that are representative of IoT edge node hardware. Consequently, this has inspired us to introduce a lightweight RDF engine, which comprises an RDF storage and a SPARQL processor for lightweight edge devices, called RDF4Led. RDF4Led follows the RISC-style (Reduce Instruction Set Computer) design philosophy. The design constitutes a flash-aware storage structure, an indexing scheme, an alternative buffer management technique and a low-memory-footprint join algorithm that demonstrates improved scalability and robustness over competing solutions. With a significantly smaller memory footprint, we show that RDF4Led can handle 2 to 5 times more data than popular RDF engines such as Jena TDB (Tuple Database) and RDF4J, while consuming the same amount of memory. In particular, RDF4Led requires 10%-30% memory of its competitors to operate on datasets of up to 50 million triples. On memory-constrained ARM boards, it can perform faster updates and can scale better than Jena TDB and Virtuoso. Furthermore, we demonstrate considerably faster query operations than Jena TDB and RDF4J.
Le Tuan, Anh
Open Distributed Systems, Technical University of Berlin, Berlin / Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland