2025
Conference Paper
Title
Automated Information Extraction from Unstructured Documents Using Large Language Models and Different Questioning Strategies
Abstract
In the era of big data, organizations face significant challenges in extracting valuable information from unstructured documents. This paper explores the application of locally hosted large language models (LLMs) to automate information extraction, focusing on their effectiveness and reliability. We investigate several querying strategies, including simple, repetitive, and two-stage approaches, to assess their impact on extraction accuracy. A demonstrator was developed to test these strategies using documents obtained in cooperation with the Volkswagen Group, and the results were evaluated with the BERTScore metric. Our findings reveal that while certain strategies yield promising results, their inconsistent performance suggests they should be applied with caution in accuracy-critical contexts. We discuss the implications of our results for knowledge management and outline avenues for future research, including the exploration of more advanced models and tailored querying techniques to make information retrieval more effective.
Author(s)