2025
Conference Paper
Title
A Modular AI Testing Framework for Trustworthy AI: Proof-of-Concept Implementation
Abstract
While independent and reproducible software testing is well established in safety-critical systems and supported by dedicated development and testing infrastructures, there is still no adequate counterpart for testing AI systems. In contrast, current AI tests tend to be tightly integrated into the development framework and are not modular in the sense that the testing code and the system under test (SUT) are strictly separable in terms of their software environments. In this paper, we present an AI testing framework for trustworthy AI that aims to support independent, reproducible, and auditable AI testing by providing a design pattern for computational testing workflows, which promotes individual tests that are modular, reproducible, and automatable while maintaining a high degree of auditability. To demonstrate the viability and usefulness of this framework, we use it to create a workflow template for metric-based testing of AI models on test datasets and implement a proof of concept (PoC) for the specific case of performance tests of visual object detectors. This PoC is publicly available on the AI on demand platform (demo and code accessible from https://bit.ly/4meYnNo).
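The abstract's PoC concerns metric-based performance tests of visual object detectors against a test dataset. The sketch below is a minimal, hypothetical illustration of that kind of test, not the paper's actual implementation: the function names, the greedy matching strategy, and the fixed IoU threshold of 0.5 are all assumptions chosen for clarity. In a modular framework as described, such metric code would live in a software environment strictly separated from the SUT and receive only the detector's outputs and the ground-truth annotations.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def precision_recall(detections, ground_truth, thr=0.5):
    """Simple single-class precision/recall at a fixed IoU threshold.

    Greedy matching: each ground-truth box can be matched at most once;
    a detection counts as a true positive if it overlaps an unmatched
    ground-truth box with IoU >= thr. (Illustrative only; evaluation
    suites such as COCO use confidence-ranked matching instead.)
    """
    matched = set()
    tp = 0
    for det in detections:
        best, best_iou = None, thr
        for i, gt in enumerate(ground_truth):
            if i in matched:
                continue
            v = iou(det, gt)
            if v >= best_iou:
                best, best_iou = i, v
        if best is not None:
            matched.add(best)
            tp += 1
    precision = tp / len(detections) if detections else 1.0
    recall = tp / len(ground_truth) if ground_truth else 1.0
    return precision, recall
```

Because the test consumes only plain box coordinates, the same metric module can be reused unchanged across detectors, which is the kind of separation between testing code and SUT that the framework promotes.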
Conference