
Publica
Here you will find scientific publications from the Fraunhofer Institutes.
Querying Data Lakes using Spark and Presto
| Association for Computing Machinery -ACM-: The Web Conference 2019. Proceedings of The World Wide Web Conference WWW 2019: May 13-17, 2019, San Francisco, CA, USA. New York: ACM, 2019. ISBN: 978-1-4503-6674-8, pp. 3574-3578 |
| World Wide Web Conference (WWW) <28, 2019, San Francisco/Calif.> |
| European Commission EC H2020; 776280; BETTER |
| English |
| Conference paper |
| Fraunhofer IAIS |
Abstract
Squerall is a tool that allows the querying of heterogeneous, large-scale data sources by leveraging state-of-the-art Big Data processing engines: Spark and Presto. Queries are posed on-demand against a Data Lake, i.e., directly on the original data sources without requiring prior data transformation. We showcase Squerall's ability to query five different data sources, including inter alia the popular Cassandra and MongoDB. In particular, we demonstrate how it can jointly query heterogeneous data sources, and how interested developers can easily extend it to support additional data sources. Graphical user interfaces (GUIs) are offered to support users in (1) building intra-source queries, and (2) creating required input files.