2025
Conference Paper
Title

Leveraging Synthetically Generated Data for Real Estate Document Classification

Abstract
Document classification in regulated domains like law, finance, or real estate is hindered by the scarcity of labeled data and strict privacy constraints. This paper presents a pipeline for synthetically generating training data for document classifiers using a combination of domain-specific templates, large language models, and data augmentation techniques. Focusing on two key document types relevant to real estate workflows, Child Support Certificate and Refurbishment Roadmap, we construct realistic multi-page documents and generate negative classes using LLM-generated distractors. We train a BERT-based classifier on this synthetic dataset and evaluate it on real-world OCR-extracted documents, achieving strong performance despite the absence of real documents in training. Our findings highlight the feasibility of using synthetic data to overcome annotation bottlenecks and pave the way for broader applications in privacy-sensitive industries.
Author(s)
Deußer, Tobias
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Ramien, Gregor
Atruvia AG
Weber, Nico
Atruvia AG
Meidinger, Maximilian
Atruvia AG
Hahnbück, Max
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Bauckhage, Christian  
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Sifa, Rafet  
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Mainwork
IEEE International Conference on Big Data, BigData 2025  
Conference
International Conference on Big Data 2025  
DOI
10.1109/BigData66926.2025.11400789
Language
English
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Keyword(s)
  • document classification
  • finance
  • large language models
  • machine learning
  • natural language processing
  • synthetic data
