Dynamic scheduling in modern processing systems using expert-guided distributed reinforcement learning
In modern processing systems, a topic of great interest is how to optimally schedule (i.e., allocate) jobs with different requirements so that the system meets various objectives. Methods using distributed reinforcement learning (DRL) have recently achieved great success on large-scale dynamic scheduling problems. However, most DRL methods require a huge amount of computational time and data for the DRL agents to learn a control policy. Meanwhile, scheduling experts have already developed various scheduling policies (i.e., dispatching rules) that fulfill different objectives with acceptable performance, based on their understanding of the processing systems' characteristics. In this paper, we propose to learn from experts in order to reduce the cost of learning and searching for a good policy in large-scale dynamic scheduling problems. In the learning process, our DRL agents select the experts that perform better in the scheduling environment, observe those experts' actions, and learn a scheduling policy guided by the experts' demonstrations. Our realistic simulation results demonstrate that this expert-guided DRL (EGDRL) approach outperforms DRL methods without expert guidance, as well as several other reinforcement learning from demonstration (RLfD) methods, across multiple systems. To the best of our knowledge, our research is one of the first works that incorporates existing expert policies to guide the learning of optimal policies for large-scale dynamic scheduling problems.
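To illustrate the overall idea (not the paper's actual algorithm), the following minimal sketch sets up a hypothetical single-machine scheduling problem, evaluates two classic dispatching-rule "experts" (SPT, shortest processing time first, and LPT, longest first), selects the better-performing one, and then clones its behavior into a simple learnable policy via softmax behavior cloning on the collected demonstrations. All names, features, and the linear policy form are illustrative assumptions, not from the paper.

```python
import math
import random

# Illustrative toy problem (assumed, not from the paper): jobs with random
# processing times on one machine; the objective is to minimize total flow time.

def make_jobs(n, rng):
    return [rng.uniform(1.0, 10.0) for _ in range(n)]  # processing times

def episode_cost(jobs, pick):
    """Run one schedule; `pick(remaining)` returns an index into `remaining`."""
    remaining, t, total_flow = list(jobs), 0.0, 0.0
    while remaining:
        t += remaining.pop(pick(remaining))
        total_flow += t
    return total_flow

# Two expert dispatching rules.
spt = lambda rem: min(range(len(rem)), key=lambda i: rem[i])  # shortest first
lpt = lambda rem: max(range(len(rem)), key=lambda i: rem[i])  # longest first

def evaluate(pick, episodes, rng):
    return sum(episode_cost(make_jobs(8, rng), pick) for _ in range(episodes)) / episodes

rng = random.Random(0)
experts = {"SPT": spt, "LPT": lpt}
costs = {name: evaluate(p, 50, rng) for name, p in experts.items()}
best_name = min(costs, key=costs.get)  # agent selects the better expert
best = experts[best_name]

# Collect (state, action) demonstrations from the selected expert.
demos = []
for _ in range(100):
    remaining = make_jobs(8, rng)
    while remaining:
        a = best(remaining)
        demos.append((list(remaining), a))
        remaining.pop(a)

# Behavior cloning: linear score s_i = w * p_i for each remaining job i,
# softmax over jobs, gradient step on cross-entropy toward the expert's choice.
w, lr = 0.0, 0.01
for _ in range(5):
    for state, a in demos:
        scores = [w * p for p in state]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        probs = [e / sum(exps) for e in exps]
        grad = sum(probs[i] * state[i] for i in range(len(state))) - state[a]
        w -= lr * grad

# Learned policy: pick the job with the highest score.
learned = lambda rem: max(range(len(rem)), key=lambda i: w * rem[i])
```

With SPT selected as the better expert, the cloned weight becomes negative, so the learned policy also prefers shorter jobs; a full EGDRL-style method would additionally keep exploring with an RL objective rather than imitating alone.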