2024
Conference Paper
Title
Enhancing Error Report Analysis in Industry: Evaluating Large Language Models and Traditional NLP for Text Clustering
Abstract
This research explores traditional Natural Language Processing (NLP) and Large Language Models (LLMs) for clustering industrial error reports, a task traditionally challenged by the complexity of free-text data. Clustering the error reports paves the way for further data analysis. The study follows a dual-path approach: one path employs traditional NLP techniques such as BERT and Word2vec for data encoding together with clustering algorithms such as K-Means and affinity propagation, and the other uses LLMs such as GPT with different prompting strategies. A key aspect of the methodology is the evaluation of the developed clustering pipelines against multiple human annotators for benchmarking. The analysis, using metrics such as the Adjusted Rand Index and Adjusted Mutual Information, reveals that both NLP and LLM methodologies achieve high accuracy, closely matching human clustering efforts. Notably, LLMs, especially with 'chain of thought'-like prompts, can support human annotators in defining meaningful clusters and assigning semantic meaning.
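To make the benchmarking step more concrete, the following is a minimal sketch of how an Adjusted Rand Index could be computed between one human annotator's clusters and a pipeline's clusters. It is not the authors' implementation: the function name `adjusted_rand_index` and the six example labels are hypothetical and chosen only for illustration, and the formula is the standard pair-counting definition of the index.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    # Contingency counts: items placed in cluster i by A and cluster j by B.
    cells = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    # Pair agreement expected if cluster sizes were kept but items shuffled.
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings use a single cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels for six error reports (illustrative, not from the paper).
human = ["pump", "pump", "sensor", "sensor", "valve", "valve"]
pipeline = [0, 0, 1, 1, 2, 1]
score = adjusted_rand_index(human, pipeline)
print(f"ARI = {score:.3f}")
```

The index depends only on which items are grouped together, not on the cluster names, so a pipeline's numeric cluster IDs can be compared directly with an annotator's textual labels. A value of 1 indicates identical groupings, and values near 0 indicate agreement no better than chance.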