Large-scale storage and query processing for semantic sensor data
The amount of sensor data produced by a wide variety of devices and sensors is increasing rapidly. Collections of sensor data can be semantically described using ontologies, e.g., the Semantic Sensor Network (SSN) ontology. Although semantic descriptions enrich the data, the volume of semantic sensor data is considerably larger than that of the raw sensor data. Moreover, the same measurement value can be observed several times, which generates a large number of repeated facts. We devise a compact, or factorized, representation of semantic sensor data in which repeated values are represented only once. To scale up to large datasets, a tabular representation is used to store and manage factorized semantic sensor data using big data technologies. We empirically study the effectiveness of the proposed factorized representation and the impact of factorizing semantic sensor data on query processing. Furthermore, we evaluate the effects of storing factorized RDF data both in state-of-the-art RDF engines and in the proposed tabular representation. The results suggest that factorization techniques improve storage and query processing of sensor data, and that execution time can be reduced by up to two orders of magnitude.
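The idea of representing repeated values only once can be illustrated with a minimal Python sketch. It is not the representation proposed in this work: the `Observation` record, the `factorize` and `expand` helpers, and the sample readings are all hypothetical, and the sketch assumes that a repeated fact is simply a (value, unit) pair that occurs in more than one observation.

```python
from collections import namedtuple

# Hypothetical flat record for one observation (names are illustrative only).
Observation = namedtuple("Observation", ["obs_id", "sensor", "timestamp", "value", "unit"])


def factorize(observations):
    """Store each distinct (value, unit) pair once and link observations to it."""
    ids = {}    # (value, unit) -> measurement id
    links = []  # (obs_id, sensor, timestamp, measurement id)
    for o in observations:
        key = (o.value, o.unit)
        if key not in ids:
            ids[key] = len(ids)  # first time this measurement is seen
        links.append((o.obs_id, o.sensor, o.timestamp, ids[key]))
    # Invert the lookup into a table: measurement id -> (value, unit)
    table = {mid: key for key, mid in ids.items()}
    return table, links


def expand(table, links):
    """Rebuild the flat observations from the factorized form."""
    return [Observation(obs_id, sensor, ts, *table[mid])
            for obs_id, sensor, ts, mid in links]


# Made-up sample data: three of the four observations share one measurement.
observations = [
    Observation("o1", "s1", "t1", 21.5, "Cel"),
    Observation("o2", "s1", "t2", 21.5, "Cel"),
    Observation("o3", "s2", "t1", 21.5, "Cel"),
    Observation("o4", "s2", "t2", 19.0, "Cel"),
]
table, links = factorize(observations)
```

In this toy input the four observations share two distinct measurements, so the measurement table holds two rows instead of four, and `expand` recovers the original records without loss.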