
Crowdsourced semantic annotation of scientific publications and tabular data in PDF

Takis, Jaana; Saiful Islam, A.Q.M.; Lange, Christoph; Auer, Sören

Preprint urn:nbn:de:0011-n-3563663 (1.0 MByte PDF)
MD5 Fingerprint: 52907b13d727a9a941e85a80c5bbe7df
© 2015 Copyright held by the owner/author(s). Publication rights licensed to ACM
Created: 26.8.2015

Association for Computing Machinery (ACM):
11th International Conference on Semantic Systems, SEMANTiCS 2015. Proceedings: 16-17 September 2015, Vienna, Austria
New York: ACM, 2015
ISBN: 978-1-4503-3462-4
8 pp.
International Conference on Semantic Systems (SEMANTICS) <11, 2015, Vienna>
European Commission (EC), H2020 grant 645833: Financial Transparency Platform for the Public Sector
Conference paper, electronic publication
Fraunhofer IAIS
human factor; document management; web-based services; computer supported collaborative work

Significant amounts of knowledge in science and technology have so far not been published as Linked Open Data but remain locked in the text and tables of legacy PDF publications. Making such information available as RDF would, for example, provide direct access to claims and facilitate surveys of related work. A great deal of valuable tabular information that has until now existed only in PDF documents would also finally become machine-understandable. Instead of studying scientific literature or engineering patents for months, one could collect such input with simple SPARQL queries. The SemAnn approach enables collaborative annotation of text and tables in PDF documents, a format that is still the common denominator of publishing, thus maximising the potential user base. The resulting annotations, in RDF format, are available for querying through a SPARQL endpoint. To give users an immediate benefit in return for the effort of annotating, SemAnn recommends related papers, taking the hierarchical context of annotations into account in a novel way. We evaluated the usability of SemAnn and the usefulness of its recommendations by analysing the annotations produced by test users on assigned tasks and by interviewing them. While the evaluation shows that even a few annotations lead to good recall, we also observed unexpected, serendipitous recommendations, which confirms the merit of our low-threshold annotation support for the crowd.
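As a rough illustration of the kind of query the abstract envisions (the vocabulary below uses the W3C Web Annotation ontology as a plausible stand-in; the actual SemAnn schema and endpoint are not specified here and may differ), retrieving annotated claims might look like:

```sparql
# Hypothetical query: find papers whose annotated claims mention "linked data".
# Property names are illustrative, not taken from the SemAnn paper.
PREFIX oa: <http://www.w3.org/ns/oa#>

SELECT ?paper ?claim
WHERE {
  ?annotation oa:hasTarget ?paper ;
              oa:hasBody   ?claim .
  FILTER(CONTAINS(LCASE(STR(?claim)), "linked data"))
}
```

Issued against the annotation endpoint, such a query would return candidate papers and claim texts directly, rather than requiring a manual literature survey.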