November 3, 2022
Conference Paper
Title
Getting and hosting your own copy of Wikidata
Abstract
Wikidata is a very large, crowdsourced, general knowledge graph backed by a worldwide community. Its original purpose was to link different versions of Wikipedia articles across multiple languages. Access to Wikidata is provided by the non-profit Wikimedia Foundation and, more recently, also by Wikimedia Enterprise as a commercial service. Query access via the public Wikidata Query Service (WDQS) is subject to a one-minute timeout, which makes larger queries with millions of results next to impossible. Beyond avoiding the timeout, hosting a copy of Wikidata may be desirable for a more reliable service, quicker response times, less user load, and better control over the infrastructure. It is not easy, but it is possible to get and host your own copy of Wikidata. The data and software needed to run a complete Wikidata instance are available as open source or under free licenses. In this paper, we report on both successful and failed attempts to get and host a copy of Wikidata using different triple store servers. We share recommendations for the needed hardware and software, provide documented scripts to semi-automate the procedures, and document things to avoid.
Author(s)
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
Language
English
Keyword(s)