August 2025
Conference Paper
Title

EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning

Abstract
Recent advances in reinforcement learning (RL) for large language model (LLM) fine-tuning show promise in addressing multi-objective tasks but still face significant challenges, including balancing competing objectives, low training efficiency, poor scalability, and limited explainability. Leveraging ensemble learning principles, we introduce an Ensemble Multi-Objective RL (EMORL) framework that fine-tunes multiple models with individual objectives while optimizing their aggregation after fine-tuning to improve efficiency and flexibility. Our method is the first to aggregate the hidden states of individual models, incorporating contextual information from multiple objectives. This approach is supported by a hierarchical grid search algorithm that identifies optimal weighted combinations. We evaluate EMORL on counselor reflection generation tasks, using text classification models to score the generations and provide rewards during RL fine-tuning. Through comprehensive experiments on the PAIR and Psych8k datasets, we demonstrate the advantages of EMORL over existing baselines: significantly lower and more stable training consumption (17,529 ± 1,650 data points and 6,573 ± 147.43 seconds), improved scalability and explainability, and comparable performance across multiple objectives.
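The abstract names two ingredients, a weighted aggregation of per-model hidden states and a hierarchical grid search over the weights, without giving their details. The following is only a minimal sketch of how such a pair could look, not the paper's implementation. The function names `aggregate` and `hierarchical_grid_search`, the coarse-to-fine refinement rule, the weight range [0, 1], and the toy `score` function (standing in for reward-model scores) are all assumptions made for illustration.

```python
from __future__ import annotations

from itertools import product
from typing import Callable, Sequence


def aggregate(hidden_states: Sequence[Sequence[float]],
              weights: Sequence[float]) -> list[float]:
    """Weighted sum of hidden-state vectors, one vector per single-objective model."""
    dim = len(hidden_states[0])
    return [sum(w * h[i] for w, h in zip(weights, hidden_states)) for i in range(dim)]


def hierarchical_grid_search(score: Callable[[Sequence[float]], float],
                             n_models: int,
                             levels: int = 3,
                             points: int = 3,
                             low: float = 0.0,
                             high: float = 1.0) -> tuple[tuple[float, ...], float]:
    """Coarse-to-fine search (assumed scheme): score a small grid of weight
    vectors, then shrink the search box around the best one and repeat."""
    bounds = [(low, high)] * n_models
    best_w: tuple[float, ...] = tuple([low] * n_models)
    best_s = float("-inf")
    for _ in range(levels):
        # Evenly spaced candidate weights along each model's axis.
        axes = [[lo + (hi - lo) * k / (points - 1) for k in range(points)]
                for lo, hi in bounds]
        for w in product(*axes):
            s = score(w)
            if s > best_s:
                best_w, best_s = w, s
        # Halve each interval and re-centre it on the best weight so far.
        bounds = [(max(low, c - (hi - lo) / 4), min(high, c + (hi - lo) / 4))
                  for (lo, hi), c in zip(bounds, best_w)]
    return best_w, best_s


# Toy objective (hypothetical): the best weights are assumed to be (0.25, 0.75).
def toy_score(w: Sequence[float]) -> float:
    target = (0.25, 0.75)
    return -sum((a - b) ** 2 for a, b in zip(w, target))


best_weights, best_score = hierarchical_grid_search(toy_score, n_models=2)
combined = aggregate([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5])
```

In a real setting the `score` callable would generate text from the aggregated hidden states and evaluate it with the objective-specific classifiers; here it is a closed-form stand-in so the search behaviour can be checked in isolation.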
Author(s)
Kong, Lingxiao
Fraunhofer-Institut für Angewandte Informationstechnik FIT  
Yang, Cong
Neufang, Susanne
Beyan, Oya Deniz
Fraunhofer-Institut für Angewandte Informationstechnik FIT  
Boukhers, Zeyd  
Fraunhofer-Institut für Angewandte Informationstechnik FIT  
Mainwork
SIGDIAL 2025, 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Proceedings  
Project(s)
Non-destructive, scalable, smart monitoring of remote cultural treasures  
Funder
European Commission  
Conference
Special Interest Group on Discourse and Dialogue (SIGdial Annual Meeting) 2025  
Open Access
File(s)
Download (4.79 MB)
Rights
CC BY-SA 4.0: Creative Commons Attribution-ShareAlike
DOI
10.18653/v1/2025.sigdial-1.33
10.24406/publica-5721
Language
English
Fraunhofer-Institut für Angewandte Informationstechnik FIT  
Keyword(s)
  • Multi-Objective Optimization
  • Reinforcement Learning
  • Large Language Model
