Fraunhofer-Gesellschaft
October 16, 2024
Conference Paper
Title

LLMs and Memorization: On Quality and Specificity of Copyright Compliance

Abstract
Memorization in large language models (LLMs) is a growing concern. LLMs have been shown to easily reproduce parts of their training data, including copyrighted work. This is an important problem to solve, as it may violate existing copyright laws as well as the European AI Act. In this work, we propose a systematic analysis to quantify the extent of potential copyright infringements in LLMs, using European law as an example. Unlike previous work, we evaluate instruction-finetuned models in a realistic end-user scenario. Our analysis builds on a proposed threshold of 160 characters, which we borrow from the German Copyright Service Provider Act, and on a fuzzy text matching algorithm to identify potentially copyright-infringing textual reproductions. The specificity of countermeasures against copyright infringement is analyzed by comparing model behavior on copyrighted and public domain data. We investigate what behaviors models show instead of producing protected text (such as refusal or hallucination) and provide a first legal assessment of these behaviors. We find that there are huge differences in copyright compliance, specificity, and appropriate refusal among popular LLMs. Alpaca, GPT-4, GPT-3.5, and Luminous perform best in our comparison, with OpenGPT-X, Alpaca, and Luminous producing a particularly low absolute number of potential copyright violations. Code can be found at github.com/felixbmuller/llms-memorization-copyright.
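The thresholding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (their code is in the linked repository): it swaps the paper's fuzzy matcher for a simpler stand-in, a longest common substring computed after case and whitespace normalization. The function names are hypothetical, and treating "longer than 160 characters" as the flagging condition is an assumption about how the threshold is applied.

```python
import re
from difflib import SequenceMatcher

# Character threshold named in the abstract; flagging spans strictly
# longer than this value is an assumption of this sketch.
THRESHOLD_CHARS = 160


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting
    differences do not interrupt a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def longest_shared_span(output: str, reference: str) -> str:
    """Return the longest contiguous span (after normalization) that the
    model output shares with the reference text."""
    a = normalize(output)
    b = normalize(reference)
    # autojunk=False turns off difflib's heuristic that ignores very
    # frequent characters in sequences of 200 or more items.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a : match.a + match.size]


def is_potential_reproduction(
    output: str, reference: str, threshold: int = THRESHOLD_CHARS
) -> bool:
    """Flag an output whose longest shared span exceeds the threshold."""
    return len(longest_shared_span(output, reference)) > threshold
```

A genuinely fuzzy matcher would additionally tolerate small insertions, deletions, or substitutions inside the span; this stand-in only absorbs differences in case and whitespace, so it will tend to undercount reproductions compared with the approach the abstract describes.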
Author(s)
Müller, Felix Benjamin
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Görge, Rebekka
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Bernzen, Anna K.
University of Regensburg
Pirk, Janna C.
University of Regensburg
Poretschkin, Maximilian  
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Mainwork
Seventh AAAI/ACM Conference on AI, Ethics, and Society, AIES 2024. Proceedings  
Project(s)
ZERTIFIZIERTE KI
The Lamarr Institute for Machine Learning and Artificial Intelligence  
Funder
Ministerium für Wirtschaft, Industrie, Klimaschutz und Energie des Landes Nordrhein-Westfalen MWIDE
Bundesministerium für Bildung und Forschung -BMBF-  
Conference
Conference on AI, Ethics, and Society 2024  
DOI
10.1609/aies.v7i1.31697
Language
English
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  