2026
Conference Paper
Title
Enhancing Data Discoverability: A Semantic Item-Based Recommendation System for Open Data Catalogues
Abstract
Open Data Catalogues (ODCs) are repositories that provide open access to datasets published by public organizations and institutions. Because of the vast amount of data available in these catalogues, users struggle to navigate them and find relevant datasets, even with extensive keyword and faceted search. A Recommender System (RecSys) increases data discoverability in these catalogues by suggesting relevant datasets to the user. In this paper, we design and present a modular and scalable RecSys tailored to ODC applications. Our design focuses on finding similar datasets by analysing the semantic similarity between their textual metadata properties. We use a multilingual Sentence Transformer model to embed the metadata and calculate similarity scores, and perform an Approximate Nearest Neighbours (ANN) search on the resulting vector embeddings to find datasets that are closer in meaning. Our solution is designed for use in one of the largest Open Data Portals (ODPs) in Europe, and we also present an evaluation of our RecSys on 1.8 million datasets from this portal.
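The item-based idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `embed` function is a toy bag-of-words stand-in for the multilingual Sentence Transformer, the exhaustive scan in `recommend` is an exact nearest-neighbour search standing in for an ANN index, and all names and the four-entry catalogue are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence-embedding model (assumption, not the paper's model).

    Returns a unit-length sparse vector of token counts.
    """
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())


def recommend(query_id, catalogue, k=2):
    """Return the k datasets whose metadata is most similar to the query dataset.

    This scans every item (exact search); a production system would replace
    the scan with an approximate nearest-neighbour index.
    """
    vectors = {ds_id: embed(meta) for ds_id, meta in catalogue.items()}
    query = vectors[query_id]
    scored = [
        (cosine(query, vec), ds_id)
        for ds_id, vec in vectors.items()
        if ds_id != query_id
    ]
    # Highest similarity first; ties broken by dataset id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(ds_id, score) for score, ds_id in scored[:k]]


# Hypothetical metadata titles, for illustration only.
CATALOGUE = {
    "air-2023": "Air quality measurements in Berlin 2023",
    "air-2022": "Air quality measurements in Berlin 2022",
    "budget": "Municipal budget spending report",
    "noise": "Noise level measurements in Berlin",
}

if __name__ == "__main__":
    for ds_id, score in recommend("air-2023", CATALOGUE):
        print(ds_id, round(score, 3))
```

Because recommendations depend only on item metadata, this kind of design needs no user interaction history, which suits catalogues that are browsed anonymously.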
Author(s)