2025
Conference Paper
Title
Can Large Language Models (LLMs) Compete with Human Requirements Reviewers? - Replication of an Inspection Experiment on Requirements Documents
Abstract
The use of large language models (LLMs) in software engineering is growing, especially for code, typically to generate it or to detect and fix quality problems. Because requirements are usually written in natural language, it seems promising to exploit the capabilities of LLMs to detect requirements problems. We replicated an inspection experiment in which computer science students searched for defects in requirements documents using different reading techniques. In our replication, we used the LLM GPT-4-Turbo instead of students to determine how the model compares to human reviewers. Additionally, we considered GPT-3.5-Turbo, Nous-Hermes-2-Mixtral-8x7B-DPO, and Phi-3-medium-128k-instruct for one research question. We focused on single-prompt approaches and avoided more complex ones in order to mimic the original study design, in which students received all the material at once. The study had two phases. First, we explored the general feasibility of using LLMs for requirements inspection on a practice document and examined different prompts. Second, we applied selected approaches to two requirements documents and compared them to each other and to the human reviewers. The approaches vary in the reading technique (ad-hoc, perspective-based, checklist-based), the LLM, the instructions, and the material provided. We found that the LLMs (a) report only a limited number of deficits despite having enough tokens available, (b) produce findings that vary little across prompts, and (c) rarely match the sample solution.
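For illustration, a single-prompt, checklist-based inspection along the lines the abstract describes could look like the sketch below. It assumes the OpenAI Python client (openai v1); the checklist items, prompt wording, and file name are hypothetical placeholders, not the study's actual materials.

```python
# Hypothetical sketch of a single-prompt, checklist-based requirements
# inspection in the spirit of the study design (all material in one prompt,
# no multi-turn refinement). Checklist and prompt wording are illustrative
# assumptions, not the original study materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CHECKLIST = """\
1. Is every requirement unambiguous?
2. Are the requirements free of contradictions?
3. Is any required functionality missing or incomplete?
"""

def inspect_requirements(document: str, model: str = "gpt-4-turbo") -> str:
    """Ask the model to report all defects it finds in a single shot."""
    prompt = (
        "You are a requirements reviewer. Using the checklist below, "
        "list every defect you find in the requirements document. "
        "For each defect, give the requirement ID and a short rationale.\n\n"
        f"Checklist:\n{CHECKLIST}\n"
        f"Requirements document:\n{document}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # near-deterministic output, for comparability across runs
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as f:  # hypothetical input file
        print(inspect_requirements(f.read()))
```

Swapping the model parameter (e.g., to "gpt-3.5-turbo") or the checklist text corresponds to the prompt variations the abstract mentions.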
Author(s)
  • Seifert, Daniel (Fraunhofer-Institut für Experimentelles Software Engineering IESE)
  • Jöckel, Lisa (Fraunhofer-Institut für Experimentelles Software Engineering IESE)
  • Trendowicz, Adam (Fraunhofer-Institut für Experimentelles Software Engineering IESE)
  • Ciolkowski, Marcus
  • Honroth, Thorsten (Fraunhofer-Institut für Experimentelles Software Engineering IESE)
  • Jedlitschka, Andreas (Fraunhofer-Institut für Experimentelles Software Engineering IESE)
Mainwork
Product-Focused Software Process Improvement. 25th International Conference, PROFES 2024. Proceedings  
Conference
International Conference on Product-Focused Software Process Improvement 2024  
DOI
10.1007/978-3-031-78386-9_3
Language
English
Institute
Fraunhofer-Institut für Experimentelles Software Engineering IESE
Keyword(s)
  • Artificial Intelligence
  • Machine Learning
  • Quality Assurance
  • Requirements Engineering