Fraunhofer-Gesellschaft
2021
Conference Paper
Title

Distributed online service coordination using deep reinforcement learning

Abstract
Services often consist of multiple chained components, such as microservices in a service mesh or machine learning functions in a pipeline. Providing these services requires online coordination, including scaling the service, placing instances of all components in the network, scheduling traffic to these instances, and routing traffic through the network. Optimized service coordination remains a hard problem because of many influencing factors, such as rapidly arriving user demands and limited node and link capacity. Existing approaches are often built on rigid models and assumptions tailored to specific scenarios. If the scenario changes and the assumptions no longer hold, they easily break and require manual adjustments by experts. Novel self-learning approaches using deep reinforcement learning (DRL) are promising but still have limitations: they address only simplified versions of the problem and are typically centralized, so they do not scale to practical large-scale networks. To address these issues, we propose a distributed self-learning service coordination approach using DRL. After centralized training, we deploy a distributed DRL agent at each node in the network, making fast coordination decisions locally in parallel with the other nodes. Each agent only observes its direct neighbors and does not need global knowledge. Hence, our approach scales independently of the size of the network. In our extensive evaluation using real-world network topologies and traffic traces, we show that our proposed approach outperforms a state-of-the-art conventional heuristic as well as a centralized DRL approach (60% higher throughput on average) while requiring less time per online decision (1 ms).
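The idea of per-node decisions based only on neighbor observations can be illustrated with a minimal sketch. This is not the authors' implementation: the observation fields, the action labels, the `hop_penalty` parameter, and the greedy scoring rule standing in for a trained DRL policy are all assumptions made for illustration. The abstract does not specify the actual observation or action space.

```python
from dataclasses import dataclass
from typing import Dict, List

PROCESS_LOCALLY = "local"  # hypothetical action label, not from the paper


@dataclass
class LocalObservation:
    """What a single node sees: its own load and that of its direct neighbors."""
    own_utilization: float                 # assumed to lie in [0, 1]
    neighbor_utilization: Dict[str, float]  # neighbor id -> utilization


def choose_action(obs: LocalObservation, hop_penalty: float = 0.1) -> str:
    """Stand-in for a trained policy network (assumption).

    Scores processing locally against forwarding to each neighbor and
    returns PROCESS_LOCALLY or the id of the neighbor to forward to.
    """
    scores = {PROCESS_LOCALLY: -obs.own_utilization}
    for neighbor, util in obs.neighbor_utilization.items():
        # Forwarding costs an extra hop, modeled here as a fixed penalty.
        scores[neighbor] = -util - hop_penalty
    return max(scores, key=scores.get)


def coordinate(topology: Dict[str, List[str]],
               utilization: Dict[str, float]) -> Dict[str, str]:
    """Let every node decide independently, using only local information."""
    decisions = {}
    for node, neighbors in topology.items():
        obs = LocalObservation(
            own_utilization=utilization[node],
            neighbor_utilization={n: utilization[n] for n in neighbors},
        )
        decisions[node] = choose_action(obs)
    return decisions


# Toy line topology a - b - c with made-up utilization values.
topology = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
utilization = {"a": 0.9, "b": 0.2, "c": 0.5}
decisions = coordinate(topology, utilization)
print(decisions)
```

In this sketch, the size of each node's observation depends on its number of neighbors rather than on the total number of nodes, which is the property the abstract attributes to the distributed approach. Each iteration of the loop in `coordinate` is independent of the others and could run in parallel on its own node.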
Author(s)
Schneider, S.
Qarawlus, H.
Karl, H.
Mainwork
IEEE 41st International Conference on Distributed Computing Systems, ICDCS 2021  
Conference
International Conference on Distributed Computing Systems (ICDCS) 2021  
DOI
10.1109/ICDCS51616.2021.00058
Language
English
Fraunhofer-Institut für Software- und Systemtechnik ISST  