Integration of the A2C Algorithm for Production Scheduling in a Two-Stage Hybrid Flow Shop Environment
This paper introduces an approach for applying reinforcement learning (RL) to production scheduling in a two-stage hybrid flow shop (THFS) production system. The Advantage Actor-Critic (A2C) method is used to train multiple agents to minimize the total tardiness and makespan of a production program. The two-stage hybrid flow shop scheduling problem is an NP-hard combinatorial optimization problem that describes a production system with two stages, each consisting of a set of parallel machines. Our concept combines a discrete-event simulation with the pre-implemented A2C algorithm provided by Stable Baselines3. Since similar research often lacks concrete implementation information, we present the configuration of the OpenAI Gym interface and the agent-environment interaction.
Otto von Guericke University Magdeburg
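To make the agent-environment interaction described in the abstract more concrete, the following is a minimal sketch of a two-stage hybrid flow shop exposed through a Gym-style `reset`/`step` interface. It is not the implementation from the paper. The names `Job`, `ToyTHFSEnv`, `run_edd`, and `EXAMPLE_JOBS` are hypothetical, and the sketch rests on several simplifying assumptions: a single agent instead of multiple agents, a simple earliest-free-machine assignment instead of a discrete-event simulation, and a reward equal to the negative tardiness of the dispatched job. It uses only the standard library and mimics the classic four-value `step` return rather than importing Gym.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    p1: int   # processing time at stage 1
    p2: int   # processing time at stage 2
    due: int  # due date


class ToyTHFSEnv:
    """Hypothetical toy THFS environment with a Gym-style reset/step interface.

    Assumption: one agent picks the next waiting job; the job is placed on the
    earliest-free machine of stage 1 and then of stage 2.
    """

    def __init__(self, jobs, machines_stage1=2, machines_stage2=2):
        self.jobs = list(jobs)
        self.m1 = machines_stage1
        self.m2 = machines_stage2
        self.reset()

    def reset(self):
        self.free1 = [0] * self.m1  # time at which each stage-1 machine is free
        self.free2 = [0] * self.m2  # time at which each stage-2 machine is free
        self.waiting = list(range(len(self.jobs)))  # indices of unscheduled jobs
        self.total_tardiness = 0
        self.makespan = 0
        return self._observe()

    def _observe(self):
        return (tuple(self.free1), tuple(self.free2), tuple(self.waiting))

    def step(self, action):
        # The action is a position in the list of waiting jobs.
        job = self.jobs[self.waiting.pop(action)]

        # Stage 1: earliest-free machine processes the job.
        k1 = min(range(self.m1), key=self.free1.__getitem__)
        end1 = self.free1[k1] + job.p1
        self.free1[k1] = end1

        # Stage 2: the job starts once it has left stage 1 and a machine is free.
        k2 = min(range(self.m2), key=self.free2.__getitem__)
        end2 = max(end1, self.free2[k2]) + job.p2
        self.free2[k2] = end2

        tardiness = max(0, end2 - job.due)
        self.total_tardiness += tardiness
        self.makespan = max(self.makespan, end2)

        done = not self.waiting
        info = {"total_tardiness": self.total_tardiness, "makespan": self.makespan}
        return self._observe(), -tardiness, done, info  # reward is an assumption


def run_edd(env):
    """Baseline policy: always dispatch the waiting job with the earliest due date."""
    env.reset()
    done, total_reward, info = False, 0, {}
    while not done:
        action = min(range(len(env.waiting)),
                     key=lambda i: env.jobs[env.waiting[i]].due)
        _, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward, info


EXAMPLE_JOBS = [Job(3, 2, 6), Job(2, 4, 7), Job(4, 1, 5), Job(1, 3, 12)]

if __name__ == "__main__":
    print(run_edd(ToyTHFSEnv(EXAMPLE_JOBS)))
```

To train with Stable Baselines3, such a class would additionally have to subclass the Gym `Env` base class and declare `observation_space` and `action_space` with fixed shapes. It could then be passed to `A2C("MlpPolicy", env)` and trained with `learn(total_timesteps=...)`. The dispatching rule in `run_edd` serves only as a simple baseline against which a learned policy could be compared.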