Architectural Proposal for Reproducible, Standardized Deep Learning Research

Lübbering, Max; Shah, Vijul; Chatterjee, Moinam; Priya, Priya; Soliman, Osama Mohamed Abdullah Nasr; Sifa, Rafet

doi:10.1109/ICSA-C65153.2025.00021

2025

Conference Paper

Abstract

The lack of reproducibility of research results is one of the most frequently recurring issues in deep learning (DL) research, and many researchers associate DL research with a reproducibility crisis. We identify technical obstacles, rooted in the architectural design of state-of-the-art (meta) DL frameworks, that impede reproducibility. Most frameworks only provide a set of high-level functions, similar in structure to libraries, so researchers who want high reproducibility are forced to implement boilerplate code for training and evaluation pipelines from scratch. We argue that a well-thought-out architectural design, leveraging established design paradigms such as inversion of control, dependency injection, and the strategy pattern, already allows reproducibility to be maximized without imposing this implementation overhead on the user. Our analysis of existing DL frameworks reveals that their lack of reproducibility is often caused by conflicting design decisions that favor code flexibility and hackability. Based on our proposed architectural design and its use of dedicated design patterns, we propose MLgym, a prototypical PyTorch-based open-source DL framework that maintains full control over the training and evaluation pipeline and allows the reproducibility requirements demanded in various research papers to be implemented.
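
As a rough illustration of how the three design paradigms named in the abstract can combine, the following minimal Python sketch shows a framework-owned training loop (inversion of control) that receives its components from a declarative configuration (dependency injection) and swaps update rules behind a common interface (strategy pattern). It is not MLgym's actual API: every name here (Optimizer, SGD, SignSGD, Trainer, REGISTRY, build_trainer) and the toy regression task are assumptions made for illustration, and plain Python stands in for PyTorch to keep the example self-contained.

```python
import random
from dataclasses import dataclass
from typing import Dict, Protocol


class Optimizer(Protocol):
    """Strategy interface (hypothetical): how one weight is updated."""

    def step(self, weight: float, grad: float) -> float: ...


@dataclass
class SGD:
    """Plain gradient descent update."""
    lr: float

    def step(self, weight: float, grad: float) -> float:
        return weight - self.lr * grad


@dataclass
class SignSGD:
    """Alternative strategy: move by a fixed step in the gradient's direction."""
    lr: float

    def step(self, weight: float, grad: float) -> float:
        sign = (grad > 0) - (grad < 0)
        return weight - self.lr * sign


class Trainer:
    """Framework-owned loop: the user supplies components, never the loop."""

    def __init__(self, optimizer: Optimizer, seed: int, epochs: int) -> None:
        self.optimizer = optimizer  # injected dependency
        self.seed = seed
        self.epochs = epochs

    def run(self) -> float:
        # A local, seeded generator keeps data creation deterministic.
        rng = random.Random(self.seed)
        xs = [rng.uniform(-1.0, 1.0) for _ in range(32)]
        data = [(x, 3.0 * x) for x in xs]  # toy task: recover the slope 3.0
        weight = 0.0
        for _ in range(self.epochs):
            for x, y in data:
                grad = 2.0 * (weight * x - y) * x  # d/dw of squared error
                weight = self.optimizer.step(weight, grad)
        return weight


# Hypothetical registry mapping configuration names to strategy classes.
REGISTRY: Dict[str, type] = {"sgd": SGD, "sign_sgd": SignSGD}


def build_trainer(config: dict) -> Trainer:
    """Assemble the whole pipeline from a declarative configuration."""
    opt_cfg = config["optimizer"]
    optimizer = REGISTRY[opt_cfg["name"]](lr=opt_cfg["lr"])
    return Trainer(optimizer=optimizer, seed=config["seed"], epochs=config["epochs"])


config = {"seed": 7, "epochs": 20, "optimizer": {"name": "sgd", "lr": 0.1}}
first_run = build_trainer(config).run()
second_run = build_trainer(config).run()
```

Because the configuration fully determines the seed, the components, and the loop, two runs built from the same configuration are expected to produce identical results in this sketch, and switching the update rule only requires changing the "name" entry rather than editing any training code.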