License: CC BY-SA 4.0
Authors: Kong, Lingxiao; Ramadan, Qusai; Zoubia, Oussama; Polash, Jahid Hasan; Elwes, Mayra; Akbari Gurabi, Mehdi; Jin, Lu; Kutafina, Ekaterina; Matzutt, Roman; Wang, Yuanbin; Xu, Junqi; Beyan, Oya Deniz; Yang, Cong; Boukhers, Zeyd
Dates: 2025-11-25; 2025-11-21
URI: https://publica.fraunhofer.de/handle/publica/499712
DOI: https://doi.org/10.24406/publica-6550; 10.13140/RG.2.2.22917.41442/1
Abstract: Large Language Models (LLMs) have been developed for a wide range of language-based tasks, while Reinforcement Learning (RL) has primarily been applied to decision-making problems such as robotics, game theory, and control systems. Increasingly, these two paradigms are being integrated through a variety of synergies. In this literature review, we focus on RL4LLM fine-tuning, where RL techniques are systematically leveraged to fine-tune LLMs and align them with various preferences. Our review provides a comprehensive analysis of 230 recent publications, presenting a methodological taxonomy that organizes current research into three primary method domains: Optimization Algorithm, concerning innovation in core RL update rules; Training Framework, regarding innovation in the orchestration of the training process; and Reward Modeling, addressing how LLMs learn and represent preferences and feedback. Within these primary domains, we further analyze methods and innovations through more granular categories to provide an in-depth summary of RL4LLM fine-tuning research. We address three research questions: 1) an overview of recent methods, 2) their methodological innovations, and 3) limitations and future work. Our analysis demonstrates the breadth and impact of recent RL4LLM fine-tuning research while highlighting valuable directions for future investigation.
Language: en
Keywords: Reinforcement Learning; Large Language Models; Fine-tuning Techniques; Training Framework; Reward Modeling
Title: Reinforcement Learning for Large Language Model Fine-Tuning: A Systematic Literature Review
Type: paper
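Illustrative note (not part of the record): the "core RL update rules" covered by the Optimization Algorithm domain include objectives such as PPO's clipped surrogate, which underlies much RLHF-style LLM fine-tuning. A minimal sketch, assuming the standard PPO formulation, where $\pi_\theta$ is the LLM policy, $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)$ is the token-level probability ratio, $\hat{A}_t$ is an advantage estimate derived from a learned reward model, and $\epsilon$ is the clipping range:

$$
L^{\mathrm{CLIP}}(\theta) \;=\; \hat{\mathbb{E}}_t\!\left[ \min\!\big( r_t(\theta)\,\hat{A}_t,\; \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \big) \right]
$$

Clipping the ratio keeps each policy update close to the previous policy; variants of this rule are a common starting point for the optimization-algorithm innovations the review surveys.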