Quantum Policy Gradient Algorithm with Optimized Action Decoding

Meyer, Nico; Scherer, Daniel David; Plinge, Axel; Mutschler, Christopher; Hartmann, Michael J.

Conference paper, 2023 (repository record dated 2024-06-13), language: en
https://publica.fraunhofer.de/handle/publica/469654
Scopus ID: 2-s2.0-85172221319

Quantum machine learning implemented by variational quantum circuits (VQCs) is considered a promising concept for the noisy intermediate-scale quantum computing era. Focusing on applications in quantum reinforcement learning, we propose an action decoding procedure for a quantum policy gradient approach. We introduce a quality measure that enables us to optimize the classical post-processing required for action selection, inspired by local and global quantum measurements. The resulting algorithm demonstrates a significant performance improvement in several benchmark environments. With this technique, we successfully execute a full training routine on a 5-qubit hardware device. Our method introduces only negligible classical overhead and has the potential to improve VQC-based algorithms beyond the field of quantum reinforcement learning.
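
As a rough illustration of what "classical post-processing required for action selection" can mean, the sketch below turns a distribution over measured bitstrings into a policy over actions by summing the probabilities of all bitstrings assigned to each action. It is a minimal sketch under stated assumptions, not the procedure or the quality measure from the paper: the function names, the 3-qubit size, the two-action setting, and both decoding rules (a "local" rule that reads one qubit and a "global" rule that uses the parity of all qubits) are hypothetical choices made for illustration only.

```python
import numpy as np

def policy_from_measurements(probs, decode, n_actions):
    """Sum bitstring probabilities into action probabilities.

    probs[b] is the probability of measuring the bitstring whose integer
    value is b; decode(b) returns the action assigned to that bitstring.
    """
    policy = np.zeros(n_actions)
    for b, p in enumerate(probs):
        policy[decode(b)] += p
    return policy

def decode_local(b, n_qubits=3):
    # Hypothetical "local" rule: the action depends on a single qubit
    # (here the most significant bit of the measured bitstring).
    return (b >> (n_qubits - 1)) & 1

def decode_global(b):
    # Hypothetical "global" rule: the action depends on all qubits
    # through the parity of the measured bitstring.
    return bin(b).count("1") % 2

# Stand-in for the measurement distribution of a 3-qubit circuit
# (random numbers here; a real VQC would produce these values).
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(8))

pi_local = policy_from_measurements(probs, decode_local, n_actions=2)
pi_global = policy_from_measurements(probs, decode_global, n_actions=2)

# Sample an action from the resulting policy.
action = rng.choice(2, p=pi_global)
```

Both rules cost only a lookup per bitstring, which is in line with the abstract's remark that the classical overhead is negligible; which assignment of bitstrings to actions works best is the question the paper's quality measure is meant to address.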