2012
Conference Paper
Title

Encoplot - tuned for high recall (also proposing a new plagiarism detection score)

Abstract
This article describes the latest changes to our plagiarism detection system Encoplot. We submitted the modified system to the PAN@CLEF 2012 plagiarism detection challenge, where it ranked 2nd by F-measure and 3rd by the "plagdet" scoring method, which we had previously shown to be flawed to some extent. The main changes were made to the heuristic that recognizes clusters of N-gram matches as matching passages in the pair of documents examined. We aimed for high recall under difficult conditions (sparse matches), which are typical of real-life rephrasing by people. The evaluation results on the PAN 2012 training and test corpora show that we achieved our goal of improving the performance of this component of the Encoplot plagiarism detection system. In the final part of this article we analyze the anomalies of the plagdet scoring method, show that they are not negligible, and propose a modified plagdet version that reduces them.
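For readers unfamiliar with the two ranking criteria named in the abstract, the following is a minimal sketch of the standard plagdet score as defined in the PAN evaluation framework: the F-measure discounted by the logarithm of detection granularity. The function names `f_measure` and `plagdet` are illustrative assumptions, not taken from Encoplot. The sketch covers only the standard score; the modified version proposed in the paper is not specified in this record and is not reproduced here.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """Standard PAN plagdet: F1 divided by log2(1 + granularity).

    Granularity is the average number of detections reported per
    detected plagiarism case; it is at least 1 by definition.
    (Function name and signature are hypothetical, for illustration.)
    """
    if granularity < 1:
        raise ValueError("granularity must be at least 1")
    return f_measure(precision, recall) / math.log2(1 + granularity)
```

With a granularity of 1 the denominator is 1, so plagdet equals the F-measure; any fragmentation of detections (granularity above 1) lowers the score even when precision and recall are unchanged.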
Author(s)
Grozea, Cristian  
Popescu, Marius
Mainwork
CLEF 2012 Conference and Labs of the Evaluation Forum. Evaluation Labs and Workshop. Online Working Notes  
Conference
International Conference of the Cross-Language Evaluation Forum (CLEF) 2012  
Language
English
Fraunhofer-Institut für Offene Kommunikationssysteme FOKUS  
Keyword(s)
  • plagiarism detection
  • automatic plagiarism detection
  • plagiarism
  • natural language processing
  • NLP
  • encoplot
