Fraunhofer-Gesellschaft

Publica

Here you will find scientific publications from the Fraunhofer Institutes.

Encoplot - tuned for high recall (also proposing a new plagiarism detection score)

 
Authors: Grozea, Cristian; Popescu, Marius


Forner, P.:
CLEF 2012 Conference and Labs of the Evaluation Forum. Evaluation Labs and Workshop. Online Working Notes : Information Access Evaluation meets Multilinguality, Multimodality, and Visual Analytics, Rome, Italy, September 17-20, 2012
Rome, 2012 (CEUR Workshop Proceedings 1178)
http://ceur-ws.org/Vol-1178/
ISBN: 978-88-904810-3-1
12 pp.
International Conference of the Cross-Language Evaluation Forum (CLEF) <3, 2012, Rome>
English
Conference paper, electronic publication
Fraunhofer FOKUS
plagiarism detection; automatic plagiarism detection; plagiarism; natural language processing; NLP; encoplot

Abstract
This article describes the latest changes to our plagiarism detection system Encoplot. We submitted the modified system to the PAN@CLEF 2012 automatic plagiarism detection challenge, where it ranked 2nd by F-measure and 3rd by the "plagdet" scoring method, which we had previously shown to be flawed to some extent. The main changes were made to the heuristic that tries to recognize clusters of N-gram matches as matching passages in the pair of documents examined. We aimed for high recall under difficult conditions (sparse matches), which are typical of real-life rephrasing by people. The results of the evaluation on the PAN 2012 training and test corpora show that we achieved our goal of improving the performance of this component of the Encoplot plagiarism detection system. In the final part of this article we analyze the anomalies of the plagdet scoring method, show that they are not negligible, and propose a modified plagdet version that reduces them.
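
For readers unfamiliar with the score, the following is a minimal sketch of plagdet as it is commonly defined for the PAN evaluations: the F-measure divided by log2(1 + granularity). It is not taken from this article, and it does not reproduce the modified score the authors propose. The function names and the example numbers are assumptions chosen only for illustration. The example shows one way a ranking by F-measure and a ranking by plagdet can diverge.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """Commonly cited PAN definition: F1 / log2(1 + granularity).

    Granularity is >= 1; a value of 1 means each plagiarized passage
    is reported as exactly one detection.
    """
    if granularity < 1:
        raise ValueError("granularity must be >= 1")
    return f_measure(precision, recall) / math.log2(1 + granularity)


# Hypothetical systems (illustrative numbers, not results from the paper).
system_a = plagdet(precision=0.8, recall=0.8, granularity=1.0)
system_b = plagdet(precision=0.9, recall=0.9, granularity=1.2)

# System B has the higher F-measure but the lower plagdet score,
# because its detections are more fragmented.
print(system_a, system_b)
```

With granularity equal to 1 the score reduces to the plain F-measure, so any difference between the two rankings comes from the granularity penalty.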

URL: http://publica.fraunhofer.de/dokumente/N-256627.html