2026
Conference Paper
Title

Multi-agent Retrieval-Augmented Generation for Enhancing Answer Generation and Knowledge Retrieval

Abstract
Large language models (LLMs) have shown remarkable capabilities in natural language processing but often exhibit factual inconsistencies when applied to knowledge-intensive tasks, with hallucination rates as high as 30% in open-domain question answering. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by coupling language generation with evidence retrieval. However, conventional RAG systems frequently suffer from noisy document retrieval, limited context coverage, and decreased faithfulness in generated outputs. To address these limitations, this paper introduces a novel architecture, Multi-Agent Retrieval-Augmented Generation (MA-RAG), which decomposes the reasoning process into a set of specialized agents responsible for query reformulation, iterative retrieval refinement, hallucination detection, and answer validation. The modular design enables dynamic coordination and layered decision-making across the retrieval and generation pipeline. We evaluate MA-RAG on three widely used QA benchmarks, SQuAD v1.1, SQuAD v2.0, and HotpotQA, under both realistic large-scale retrieval conditions and idealized filtered settings. System performance is assessed using five retrieval- and generation-focused metrics derived from the RAGAS framework: context precision, context recall, faithfulness, answer relevancy, and answer correctness. We complement this evaluation with span-based metrics, including Exact Match (EM), F1, and BLEU scores, to capture surface-level overlap and fluency. MA-RAG consistently outperforms both Traditional RAG and Ensemble RAG across all datasets. Compared to Traditional RAG, it achieves up to 29.2% improvement in recall, 25.6% in precision, and 22.7% in correctness. Against Ensemble RAG, gains reach 25.9% in precision, 15.6% in recall, and 7.2% in correctness. On average, MA-RAG improves F1 by over 9 percentage points and BLEU by more than 11%, while nearly doubling EM scores on SQuAD v1.1. These improvements highlight the robustness of the agentic framework under both noisy and clean retrieval environments. The empirical findings suggest that MA-RAG provides a scalable and interpretable pathway toward building more trustworthy and accurate AI systems for question answering and other knowledge-centric NLP applications.
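As an illustration of the span-based metrics named in the abstract, the following is a minimal sketch of Exact Match and token-level F1 in the style commonly used for SQuAD evaluation. The normalization steps (lowercasing, removing punctuation and English articles) and the function names `normalize`, `exact_match`, and `token_f1` are assumptions made for this example; the record does not state which implementation the paper used.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this sketch, a prediction that contains the reference answer plus extra tokens receives an EM of 0 but a partial F1, which is why the two scores are usually reported together.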
Author(s)
Kumar, Deepak
Fraunhofer-Institut für System- und Innovationsforschung ISI
Jain, Bhavesh Mahender
Fraunhofer-Institut für System- und Innovationsforschung ISI
Mainwork
Progress in Artificial Intelligence. 24th EPIA Conference on Artificial Intelligence, EPIA 2025. Proceedings. Part II  
Conference
Conference on Artificial Intelligence 2025  
DOI
10.1007/978-3-032-05179-0_22
Language
English
Keyword(s)
  • Retrieval-Augmented Generation
  • Multi-Agent Systems
  • Large Language Models
  • Query Decomposition
  • Contextual Reasoning
  • Faithful Text Generation
  • Open-Domain Question Answering
  • Hallucination Mitigation
  • Retrieval Refinement
  • Multi-Hop Reasoning
