Fraunhofer-Gesellschaft
2009
Conference Paper
Title

ENCOPLOT: Pairwise sequence matching in linear time applied to plagiarism detection

Abstract
In this paper we describe a new general plagiarism detection method that we used in our winning entry to the external plagiarism detection task of the 1st International Competition on Plagiarism Detection, a task that assumes the source documents are available. In the first phase of our method, a matrix of kernel values is computed, which gives an n-gram-based similarity value between each source document and each suspicious document. In the second phase, each promising pair is investigated further using encoplot, a novel linear-time pairwise sequence matching technique, in order to extract the precise positions and lengths of the subtexts that have been copied and possibly obfuscated. We solved the significant computational challenges arising from having to compare millions of document pairs by using a library developed by our group mainly for use in network security tools. With this approach, more than 49 million pairs of documents were compared in 12 hours on a single computer. The results in the challenge were very good: we outperformed all other methods.
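The first phase described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the use of character n-grams, the value n = 4, the cosine normalisation, the threshold, and all function names below are assumptions chosen for illustration, and the sketch makes no attempt at the performance reported in the abstract.

```python
from collections import Counter
from math import sqrt


def ngram_counts(text: str, n: int) -> Counter:
    """Count every overlapping character n-gram in text (assumed feature type)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_kernel(a: Counter, b: Counter) -> float:
    """Cosine-normalised dot product of two n-gram count vectors (assumed kernel)."""
    if not a or not b:
        return 0.0  # a text shorter than n has no n-grams
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


def kernel_matrix(sources, suspicious, n=4):
    """Rows index source documents, columns index suspicious documents."""
    src = [ngram_counts(s, n) for s in sources]
    sus = [ngram_counts(s, n) for s in suspicious]
    return [[ngram_kernel(a, b) for b in sus] for a in src]


def promising_pairs(matrix, threshold):
    """Return (source index, suspicious index) pairs at or above the threshold."""
    return [(i, j)
            for i, row in enumerate(matrix)
            for j, value in enumerate(row)
            if value >= threshold]
```

Under these assumptions, only the pairs returned by `promising_pairs` would be passed on to the second phase for detailed sequence matching.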
Author(s)
Grozea, C.
Fraunhofer FIRST
Gehl, C.
Fraunhofer FIRST
Popescu, M.
Mainwork
3rd PAN Workshop "Uncovering Plagiarism, Authorship and Social Software Misuse" 2009  
Conference
Workshop "Uncovering Plagiarism, Authorship and Social Software Misuse" 2009  
Spanish Society for Natural Language Processing (SEPLN Conference) 2009  
Language
English
FIRST