
Dynamic scheduling in modern processing systems using expert-guided distributed reinforcement learning

Qu, Shuhui; Wang, Jie; Jasperneite, Jürgen


Institute of Electrical and Electronics Engineers -IEEE-; IEEE Industrial Electronics Society -IES-:
24th IEEE International Conference on Emerging Technologies and Factory Automation, ETFA 2019. Proceedings : Zaragoza, Spain, 10 - 13 September 2019
Piscataway, NJ: IEEE, 2019
ISBN: 978-1-7281-0303-7
ISBN: 978-1-7281-0302-0
ISBN: 978-1-7281-0304-4
International Conference on Emerging Technologies and Factory Automation (ETFA) <24, 2019, Zaragoza>
Conference Paper
Fraunhofer IOSB
queueing system; reinforcement learning; expert knowledge; intelligent automation

In the environment of modern processing systems, one topic of great interest is how to optimally schedule (i.e., allocate) jobs with different requirements so that the systems meet various objectives. Methods using distributed reinforcement learning (DRL) have recently achieved great success on large-scale dynamic scheduling problems. However, most DRL methods require a huge amount of computational time and a large amount of data for the DRL agents to learn a control policy. Meanwhile, various scheduling experts have already developed scheduling policies (i.e., dispatching rules) that fulfill different objectives with acceptable performance, based on their understanding of a processing system's characteristics. In this paper, we propose to learn from experts in order to reduce the cost of learning and searching for a good policy in large-scale dynamic scheduling problems. In the learning process, our DRL agents select the experts that perform better in the scheduling environment, observe those experts' actions, and learn a scheduling policy guided by their demonstrations. Our realistic simulation results demonstrate that this expert-guided DRL (EGDRL) approach outperforms DRL methods without expert guidance, as well as several other reinforcement learning from demonstration (RLfD) methods, across multiple systems. To the best of our knowledge, our research is one of the first works to incorporate existing expert policies to guide the learning of optimal policies for large-scale dynamic scheduling problems.
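The learning loop sketched in the abstract — evaluate the available expert dispatching rules, keep the best performer, collect its demonstrations, and imitate them — can be illustrated with a minimal, self-contained toy (this is not the authors' EGDRL implementation; the windowed single-machine queue, the SPT/LPT rules, and the nearest-neighbour imitator are all simplifying assumptions for illustration):

```python
import random

def spt(queue):
    # Shortest Processing Time dispatching rule: pick the shortest job.
    return min(range(len(queue)), key=lambda i: queue[i])

def lpt(queue):
    # Longest Processing Time rule, used here as a weaker expert.
    return max(range(len(queue)), key=lambda i: queue[i])

def simulate(policy, jobs, k=3):
    """Dispatch jobs from a sliding window of size k on one machine.

    Returns total flow time (lower is better) and the (state, action)
    trace, which serves as the expert's demonstration data."""
    pending, queue = list(jobs), []
    t = total_flow = 0.0
    trace = []
    while pending or queue:
        while pending and len(queue) < k:
            queue.append(pending.pop(0))
        a = policy(queue)
        trace.append((tuple(queue), a))
        t += queue[a]          # job finishes at time t
        total_flow += t
        queue.pop(a)
    return total_flow, trace

def imitation_policy(demos):
    """Imitate demonstrations via 1-nearest-neighbour state lookup
    (a stand-in for the learned policy in the paper)."""
    def policy(queue):
        state = tuple(queue)
        best_state, best_action = min(
            demos,
            key=lambda d: sum(abs(a - b) for a, b in zip(d[0], state))
                          + abs(len(d[0]) - len(state)))
        return min(best_action, len(queue) - 1)  # clamp to valid index
    return policy

random.seed(0)
jobs = [random.uniform(1, 10) for _ in range(50)]

# 1. Evaluate the expert rules in the scheduling environment.
experts = {"SPT": spt, "LPT": lpt}
scores = {name: simulate(rule, jobs)[0] for name, rule in experts.items()}

# 2. Select the better-performing expert and collect its demonstrations.
best_name = min(scores, key=scores.get)
_, demos = simulate(experts[best_name], jobs)

# 3. Learn (here: memorize) a policy from the demonstrations and check
#    that it matches the guiding expert's performance.
clone_flow, _ = simulate(imitation_policy(demos), jobs)
```

In this toy, SPT is the stronger expert for total flow time, so step 1 selects it, and the imitator reproduces its schedule on the demonstrated instance. The paper's approach replaces the lookup with a trained DRL policy that can then improve beyond the experts.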