2013
Conference Paper
Title
Dataset retrieval
Abstract
In recent years, a large number of dataset repositories, catalogs and portals have emerged in the science and government realms. Once many datasets are published on such data portals, the question arises of how to retrieve the datasets that satisfy an information need. In this paper, we present an approach for retrieving datasets according to user queries. We define dataset retrieval as a specialization of information retrieval: instead of retrieving documents that are relevant to a certain information need, dataset retrieval describes the process of returning relevant RDF datasets. As in information retrieval, relevance cannot be clearly defined when relying on traditional methods such as stemming. The inherent use of RDF in these datasets, however, enables a better way of retrieving the relevant ones. We therefore propose an additional retrieval mechanism inspired by faceted search: dataset filtering. When a query is issued, the entire set of available datasets is processed by a set of semantic filters, each of which can unambiguously decide whether or not a given dataset is relevant to the query. The resulting set is then returned to the requester. We implemented and evaluated our approach in CKAN, which powers publicdata.eu and is the most popular data portal worldwide.
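The filtering idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation and does not use the CKAN API: the `Dataset` class, the `has_property_value` filter, the `filter_datasets` function, the sample triples and the conjunctive combination of filters (a dataset must pass every filter) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Tuple

# An RDF statement reduced to plain strings: (subject, predicate, object).
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for a dataset described by RDF metadata."""
    name: str
    triples: FrozenSet[Triple]


# A semantic filter gives a yes/no answer for a single dataset.
Filter = Callable[[Dataset], bool]


def has_property_value(predicate: str, obj: str) -> Filter:
    """Build a filter that accepts datasets containing (?, predicate, obj)."""
    def accept(dataset: Dataset) -> bool:
        return any(p == predicate and o == obj for _, p, o in dataset.triples)
    return accept


def filter_datasets(datasets: Iterable[Dataset],
                    filters: Iterable[Filter]) -> List[Dataset]:
    """Keep the datasets accepted by every filter (assumed: conjunction)."""
    active = list(filters)
    return [d for d in datasets if all(f(d) for f in active)]


def make(name: str, place: str, theme: str) -> Dataset:
    # Illustrative metadata only; the property names are sample values.
    return Dataset(name, frozenset({
        (name, "dcterms:spatial", place),
        (name, "dcat:theme", theme),
    }))


catalog = [
    make("berlin-air-quality", "Berlin", "environment"),
    make("paris-air-quality", "Paris", "environment"),
    make("berlin-budget", "Berlin", "finance"),
]

query = [
    has_property_value("dcterms:spatial", "Berlin"),
    has_property_value("dcat:theme", "environment"),
]

result = filter_datasets(catalog, query)
print([d.name for d in result])  # prints ['berlin-air-quality']
```

Unlike a ranked keyword search, each filter here yields a binary decision, so the result is a set of datasets rather than a scored list.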