Fraunhofer-Gesellschaft
November 21, 2025
Paper (Preprint, Research Paper, Review Paper, White Paper, etc.)
Title

Reinforcement Learning for Large Language Model Fine-Tuning: A Systematic Literature Review

Abstract
Large Language Models (LLMs) have been developed for a wide range of language-based tasks, while Reinforcement Learning (RL) has primarily been applied to decision-making problems such as robotics, game theory, and control systems. Today, these two paradigms are increasingly combined in several complementary ways. In this literature review, we focus on RL4LLM fine-tuning, where RL techniques are systematically leveraged to fine-tune LLMs and align them with various preferences. Our review provides a comprehensive analysis of 230 recent publications and presents a methodological taxonomy that organizes current research into three primary method domains: Optimization Algorithm, concerning innovations in core RL update rules; Training Framework, concerning innovations in the orchestration of the training process; and Reward Modeling, addressing how LLMs learn and represent preferences and feedback. Within these primary domains, we further analyze methods and innovations through more granular categories to provide an in-depth summary of RL4LLM fine-tuning research. We address three research questions: 1) an overview of recent methods, 2) their methodological innovations, and 3) their limitations and directions for future work. Our analysis demonstrates the breadth and impact of recent RL4LLM fine-tuning research and highlights valuable directions for future investigation.
Author(s)
Kong, Lingxiao
Fraunhofer-Institut für Angewandte Informationstechnik FIT  
Ramadan, Qusai
University of Southern Denmark -SDU-  
Zoubia, Oussama
Fraunhofer-Institut für Angewandte Informationstechnik FIT  
Polash, Jahid Hasan
Fraunhofer-Institut für Angewandte Informationstechnik FIT  
Elwes, Mayra
University of Cologne
Akbari Gurabi, Mehdi
Fraunhofer-Institut für Angewandte Informationstechnik FIT  
Jin, Lu
University of Siegen
Kutafina, Ekaterina
University of Cologne
Matzutt, Roman
Fraunhofer-Institut für Angewandte Informationstechnik FIT  
Wang, Yuanbin
University of Cologne
Xu, Junqi
Soochow University
Beyan, Oya Deniz
Fraunhofer-Institut für Angewandte Informationstechnik FIT  
Yang, Cong
Soochow University
Boukhers, Zeyd  
Fraunhofer-Institut für Angewandte Informationstechnik FIT  
Project(s)
NFDI für Datenwissenschaften und Künstliche Intelligenz (NFDI for Data Science and Artificial Intelligence)
Funder
Deutsche Forschungsgemeinschaft -DFG-, Bonn
Open Access
File(s)
Download (1.37 MB)
Rights
CC BY-SA 4.0: Creative Commons Attribution-ShareAlike
DOI
10.13140/RG.2.2.22917.41442/1
10.24406/publica-6550
Language
English
Fraunhofer-Institut für Angewandte Informationstechnik FIT  
Keyword(s)
  • Reinforcement Learning
  • Large Language Models
  • Fine-tuning Techniques
  • Training Framework
  • Reward Modeling
