DivQ: Diversification for keyword search over structured databases
Keyword queries over structured databases are notoriously ambiguous. No single interpretation of a keyword query can satisfy all users, and multiple interpretations may yield overlapping results. This paper proposes a scheme to balance the relevance and novelty of keyword search results over structured databases. Firstly, we present a probabilistic model which effectively ranks the possible interpretations of a keyword query over structured data. Then, we introduce a scheme to diversify the search results by re-ranking query interpretations, taking into account redundancy of query results. Finally, we propose -nDCG-W and WS-recall, an adaptation of -nDCG and S-recall metrics, taking into account graded relevance of subtopics. Our evaluation on two real-world datasets demonstrates that search results obtained using the proposed diversification algorithms better characterize possible answers available in the database than the results of the initial relevance ranking.