Fraunhofer-Gesellschaft
April 12, 2026
Conference Paper
Title

Q&A Eval: Benchmarking Secure Coding Ability of LLMs on Real-World Tasks

Abstract
Conversational models revolutionize the way we think, communicate, and code. Large Language Models (LLMs) such as GPT-o4 can generate thousands of lines of code in seconds, ranging from simple boilerplate functions to large and complex applications. In this study, we evaluate the security and quality of the code produced by LLMs, comparing it to a human baseline derived from a vast corpus of StackOverflow questions and answers. We queried five LLMs with over 10,000 cybersecurity-related questions from StackOverflow. Using three static code scanners, we automatically identified software vulnerabilities in the AI-generated Java and Python code as well as in the human-provided code snippets in the StackOverflow answers. Based on this data, we analyze what developers can expect from LLM-generated code and how its security level compares to that of code provided by humans. We find that popular LLMs generate code that is less secure than code written by human programmers. LLMs often replicate common vulnerability patterns and, in some cases, introduce additional security issues. Our results contradict a previous study on a similar, albeit smaller, dataset.
Author(s)
Toran, Markus
Fraunhofer-Institut für Sichere Informationstechnologie SIT  
Ballin, Bettina
Miltenberger, Marc  
Fraunhofer-Institut für Sichere Informationstechnologie SIT  
Arzt, Steven  
Fraunhofer-Institut für Sichere Informationstechnologie SIT  
Mainwork
IEEE/ACM 4th International Workshop on Software Vulnerability Management SVM 2026  
Conference
International Workshop on Software Vulnerability Management 2026  
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
DOI
10.1145/3786165.3788437
10.24406/publica-9194
Language
English
Fraunhofer-Institut für Sichere Informationstechnologie SIT  
Keyword(s)
  • Code Generation
  • Code Smells
  • Machine Learning
  • StackOverflow
  • Vulnerabilities
