Title: Comprehensive Analysis of Neural Network Inference on Embedded Systems: Response Time, Calibration, and Model Optimisation
Authors: Patrick Huber; Ulrich Göhner; Mario Trapp; Jonathan Zender; Rabea Lichtenberg
Type: journal article
Language: en
License: CC BY 4.0
Date issued: 2025-08-26 (publication year: 2025)
Handle: https://publica.fraunhofer.de/handle/publica/494573
DOI: https://doi.org/10.24406/publica-5174
Journal DOI: 10.3390/s25154769
Scopus ID: 2-s2.0-105013311847
Keywords: Artificial Neural Network; ANN; TensorFlow Lite; TFLite; embedded systems; benchmarking; model calibration; response time; ANN inference

Abstract: The response time of Artificial Neural Network (ANN) inference is critical in embedded systems that process sensor data close to the source. This is particularly important in applications such as predictive maintenance, which rely on timely predictions of state changes. This study enables estimation of model response times based on the underlying platform, highlighting the importance of benchmarking generic ANN applications on edge devices. We analyse the impact of network parameters, activation functions, and single- versus multi-threading on response times, and discuss potential hardware-related influences such as clock-rate variance. The results underline the complexity of task partitioning and scheduling strategies, stressing the need for precise parameter coordination to optimise performance across platforms. This study shows that cutting-edge frameworks do not necessarily perform the required operations automatically for all configurations, which may negatively impact performance. The paper further investigates the influence of network structure on model calibration, quantified using the Expected Calibration Error (ECE), and the limits of potential optimisation opportunities. It also examines the effects of model conversion to TensorFlow Lite (TFLite), highlighting the necessity of considering both performance and calibration when deploying models on embedded systems.
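The abstract quantifies calibration with the Expected Calibration Error (ECE). As a rough illustration (not the paper's implementation; the bin count and inputs here are illustrative), ECE partitions predictions into confidence bins and sums the gap between average confidence and accuracy per bin, weighted by bin size:

```python
# Illustrative sketch of the standard Expected Calibration Error (ECE):
# ECE = sum_m (|B_m| / n) * |accuracy(B_m) - confidence(B_m)|
# over M equal-width confidence bins B_m. Bin count is an assumption.

def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: predicted probabilities in [0, 1];
    correct: booleans, whether each prediction was right."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map confidence to an equal-width bin; clamp conf == 1.0 into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece
```

A perfectly calibrated model (e.g. 80% confidence with 80% of those predictions correct) yields an ECE of 0, while confident but wrong predictions drive it toward 1.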