From metadata catalogs to distributed data processing for smart city platforms and services: A study on the interplay of CKAN and hadoop
Smart Cities are emerging based on the idea of provisioning and processing large amounts of urban data for various use cases. Thereby, Urban Data Platforms are usually employed to accumulate and expose the large amounts of governmental (i.e. public sector), sensor, static and real-time data in order to enable the community to create valuable applications and services for future Smart Cities. Hitherto, the Open Data initiative was seen as the key driver to providing large amounts of data within a city. Open Data platforms employ so-called data registries in order to keep track of the available datasets at various sources spread throughout the city, with CKAN currently being among the most popular data catalog software worldwide. With the emergence of frameworks for large scale distributed computing and storage, such as Hadoop and the belonging distributed file systems (HDFS), there is an inherent need for bridging the worlds of metadata catalogs and distributed data processing towards the goal of providing sophisticated urban ICT services. The current paper constitutes a first attempt on this new field, by prototyping and evaluating components that enable the collaboration and interplay between CKAN and Hadoop/HDFS. This interplay is realized through extensions to CKAN and its harvesting process and its benefits are demonstrated by belonging case studies.