• English
  • Deutsch
  • Log In
    Password Login
    Research Outputs
    Fundings & Projects
    Researchers
    Institutes
    Statistics
Repository logo
Fraunhofer-Gesellschaft
  1. Home
  2. Fraunhofer-Gesellschaft
  3. Anderes
  4. Authorship Verification based on the Likelihood Ratio of Grammar Models
 
  • Details
  • Full
Options
2024
Paper (Preprint, Research Paper, Review Paper, White Paper, etc.)
Title

Authorship Verification based on the Likelihood Ratio of Grammar Models

Title Supplement
Published on arXiv
Abstract
Authorship Verification (AV) is the process of analyzing a set of documents to determine whether they were written by a specific author. This problem often arises in forensic scenarios, e.g., in cases where the documents in question constitute evidence for a crime. Existing state-of-the-art AV methods use computational solutions that are not supported by a plausible scientific explanation for their functioning and that are often difficult for analysts to interpret. To address this, we propose a method relying on calculating a quantity we call λG (LambdaG): the ratio between the likelihood of a document given a model of the Grammar for the candidate author and the likelihood of the same document given a model of the Grammar for a reference population. These Grammar Models are estimated using n-gram language models that are trained solely on grammatical features. Despite not needing large amounts of data for training, LambdaG still outperforms other established AV methods with higher computational complexity, including a fine-tuned Siamese Transformer network. Our empirical evaluation based on four baseline methods applied to twelve datasets shows that LambdaG leads to better results in terms of both accuracy and AUC in eleven cases and in all twelve cases if considering only topic-agnostic methods. The algorithm is also highly robust to important variations in the genre of the reference population in many cross-genre comparisons. In addition to these properties, we demonstrate how LambdaG is easier to interpret than the current state-of-the-art. We argue that the advantage of LambdaG over other methods is due to fact that it is compatible with Cognitive Linguistic theories of language processing.
Author(s)
Nini, Andrea
sl-0
Halvani, Oren  
Fraunhofer-Institut für Sichere Informationstechnologie SIT  
Graner, Lukas
Fraunhofer-Institut für Sichere Informationstechnologie SIT  
Gherardi, Valerio
sl-0
Ishihara, Shunichi
sl-0
DOI
10.48550/arXiv.2403.08462
Language
English
Fraunhofer-Institut für Sichere Informationstechnologie SIT  
  • Cookie settings
  • Imprint
  • Privacy policy
  • Api
  • Contact
© 2024