Fraunhofer-Gesellschaft

Publica

Here you will find scientific publications from the Fraunhofer Institutes.

Statisfy me: What are my stats?

 
Authors: Sejdiu, G.; Ermilov, I.; Mami, M.N.; Lehmann, J.


Erp, M. van:
ISWC-P&D-Industry-BlueSky 2018. ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks. Online resource : Proceedings of the ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks co-located with 17th International Semantic Web Conference (ISWC 2018), Monterey, USA, October 8th to 12th, 2018
Monterey: CEUR, 2018 (CEUR Workshop Proceedings 2180)
http://ceur-ws.org/Vol-2180/
ISSN: 1613-0073
4 pp.
International Semantic Web Conference (ISWC) <17, 2018, Monterey/Calif.>
English
Conference Paper, Electronic Publication
Fraunhofer IAIS

Abstract
The increasing adoption of RDF, the Linked Data format, over the last two decades has brought new opportunities. It has also raised new challenges, especially when it comes to managing and processing large amounts of RDF data. In particular, assessing the internal structure of a data set is important, since it enables users to understand the data better. One prominent means of assessment is computing statistics about the instances and schema of a data set. However, computing statistics over large RDF data is computationally expensive. To address this, we previously built DistLODStats, a framework based on Apache Spark for the parallel calculation of 32 statistical criteria over large RDF datasets. Running DistLODStats therefore means submitting jobs to a Spark cluster. Often this is done manually, either by connecting to the cluster machine or via a dedicated resource manager. This approach is inconvenient, as it requires users to acquire new software skills and to interact directly with the cluster. To make DistLODStats easier to use, we propose in this paper an approach for triggering RDF statistics remotely using plain HTTP requests. DistLODStats is built as a plugin into the larger SANSA Framework and makes use of Apache Livy, a novel lightweight solution for interacting with a Spark cluster via a REST interface.
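To illustrate what triggering a Spark job with a plain HTTP request can look like, the following is a minimal sketch against Apache Livy's batch endpoint (`POST /batches`, with `file`, `className` and `args` fields in the JSON body). It is not the authors' implementation: the Livy address, the jar location, the class name `com.example.RDFStatsJob` and the input/output paths are all hypothetical placeholders chosen for illustration.

```python
import json
import urllib.request

# Assumption: a Livy server reachable at this address (8998 is Livy's default port).
LIVY_URL = "http://localhost:8998"


def build_batch_payload(jar_path, class_name, input_path, output_path):
    """Build the JSON body for Livy's POST /batches endpoint."""
    return {
        "file": jar_path,            # application jar, visible to the cluster
        "className": class_name,     # main class of the Spark application
        "args": [input_path, output_path],  # passed to the main class
    }


def build_batch_request(livy_url, payload):
    """Wrap the payload in an HTTP POST request without sending it."""
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_batch(livy_url, payload, timeout=30):
    """Send the request and return Livy's JSON description of the batch."""
    request = build_batch_request(livy_url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical job description; none of these names come from the paper.
EXAMPLE_PAYLOAD = build_batch_payload(
    jar_path="hdfs:///jars/rdf-stats.jar",
    class_name="com.example.RDFStatsJob",
    input_path="hdfs:///data/input.nt",
    output_path="hdfs:///data/stats-output",
)
```

Calling `submit_batch(LIVY_URL, EXAMPLE_PAYLOAD)` against a running Livy server would return a JSON object describing the new batch, including its `id` and `state`; the state can then be polled with a `GET` request to `/batches/{id}/state`. Separating request construction from sending keeps the sketch checkable without a cluster.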

URL: http://publica.fraunhofer.de/documents/N-581821.html