January 2025
Conference Paper
Title
Resource-Efficient Anonymization of Textual Data via Knowledge Distillation from Large Language Models
Abstract
Protecting personal and sensitive information in textual data is increasingly crucial, especially when leveraging large language models (LLMs) that may pose privacy risks due to their API-based access. We introduce a novel approach and pipeline for anonymizing text across arbitrary domains without the need for manually labeled data or extensive computational resources. Our method employs knowledge distillation from LLMs into smaller encoder-only models via named entity recognition (NER) coupled with regular expressions to create a lightweight model capable of effective anonymization while preserving the semantic and contextual integrity of the data. This reduces computational overhead, enabling deployment on less powerful servers or even personal computing devices. Our findings suggest that knowledge distillation offers a scalable, resource-efficient pathway for anonymization, balancing privacy preservation with model performance and computational efficiency.
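The abstract describes a pipeline that combines a small encoder-only NER model (distilled from an LLM teacher) with regular expressions to mask personal information. The sketch below illustrates that general idea only; it is not the authors' implementation. The public model dslim/bert-base-NER stands in for the paper's distilled model, and the regex patterns and entity placeholders are illustrative assumptions.

```python
# Minimal sketch of the anonymization idea from the abstract: a lightweight
# encoder-only NER model plus regex rules for structured identifiers.
# The model and patterns below are stand-ins, not the paper's artifacts.
import re
from transformers import pipeline

# Regex rules for structured personal data (illustrative patterns only).
REGEX_RULES = {
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "[PHONE]": re.compile(r"\+?\d[\d ()/-]{7,}\d"),
}

# Small encoder-only NER model; the paper distills its own model from an LLM
# teacher, here a public BERT-based NER model is used as a placeholder.
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

def anonymize(text: str) -> str:
    # 1) Mask structured identifiers with regular expressions.
    for placeholder, pattern in REGEX_RULES.items():
        text = pattern.sub(placeholder, text)
    # 2) Mask named entities found by the NER model, replacing from right to
    #    left so character offsets remain valid during substitution.
    for ent in sorted(ner(text), key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text

if __name__ == "__main__":
    print(anonymize("Contact Jane Doe at jane.doe@example.com or +49 228 1234567."))
```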
Author(s)
Deußer, Tobias
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Hahnbück, Max
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Uelwer, Tobias
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Zhao, Cong
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Bauckhage, Christian  
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Sifa, Rafet  
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Mainwork
COLING 2025, 31st International Conference on Computational Linguistics. Proceedings of the Industry Track  
Project(s)
The Lamarr Institute for Machine Learning and Artificial Intelligence  
Funder
Bundesministerium für Bildung und Forschung (BMBF)
Conference
International Conference on Computational Linguistics 2025  
Open Access
DOI
10.24406/publica-4308
File(s)
publica_Resource_Efficient_Anonymization_of_Textual_Data_via_Knowledge_Distillation_from_Large_Language_Models.pdf (273.94 KB)
Rights
CC BY 4.0: Creative Commons Attribution
Language
English
Institute(s)
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS
Keyword(s)
  • Anonymization
  • Named Entity Recognition
  • Large Language Models
  • Natural Language Processing
  • Machine Learning