Title: GhosTEE: An Approach to Solving the GPU-Privacy Trade-off for Machine Learning Inference
Authors: Aprodu, Andrei-Cosmin; Meyer zum Felde, Hendrik; Kowatsch, Daniel; Böttinger, Konstantin
Type: conference paper
Year: 2025
Date deposited: 2026-02-09
License: CC BY 4.0
Language: English
Handle: https://publica.fraunhofer.de/handle/publica/505937
DOI: 10.1145/3733799.3762962; 10.24406/publica-7432 (https://doi.org/10.24406/publica-7432)
Scopus ID: 2-s2.0-105027203674
Keywords: GPU; Machine Learning; Matrix Multiplication Computation; Process-Based TEE; Trusted Inference

Abstract:
As Machine Learning (ML) undergoes rapid evolution, it has become the backbone of numerous application domains, making the security of ML models critical. The challenge of ML security is increasingly complex given the inherent vulnerabilities of the models, which an adversary can exploit by targeting either the models themselves or the infrastructure hosting them; the latter is the focus of this work. A few solutions have tried incorporating the untrusted GPU into the execution pipeline of a Trusted Execution Environment (TEE), but often at the cost of considerable precision loss or runtime slowdown. Here, we introduce GhosTEE, a framework for GPU-enabled trusted model inference, designed to isolate groups of layers and their weights. As the developer is tasked with choosing which layers to protect, we study the definition of potential layer-protection guidelines in the context of Convolutional Neural Networks. To facilitate GPU support for intra-enclave operations, we define a novel memory-efficient matrix-masking algorithm for matrix multiplications. After formally proving its numerical stability, we compare it to related approaches, showing that our method is the most numerically stable solution that does not require the materialization of additional matrices. Finally, we benchmark the performance of our framework with and without the algorithm, observing faster execution speeds for deep 2D convolutions and fully connected layers with large input batches.
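The abstract does not spell out GhosTEE's memory-efficient masking algorithm, so the sketch below illustrates only the general idea it builds on: additive blinding, where the enclave hides a private operand before handing the multiplication to the untrusted GPU and removes a correction term afterwards. Note that this baseline scheme materializes a full mask matrix `R` and its correction `R @ B`, which is exactly the extra-matrix overhead the paper claims its own algorithm avoids. The function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_matmul(A, B, R, RB):
    """Compute A @ B while hiding A from the untrusted device.

    Generic additive-blinding baseline (NOT GhosTEE's algorithm):
    the enclave sends only the blinded matrix A + R outward, then
    subtracts the precomputed correction term R @ B.
    """
    # Untrusted side: sees only A + R, which reveals nothing about A
    # if R is drawn freshly and kept secret inside the enclave.
    untrusted_result = (A + R) @ B  # stands in for the GPU call
    # Trusted side: unblind using the precomputed correction.
    return untrusted_result - RB

# Inference-style setting: the weights B are fixed, so R @ B can be
# precomputed once inside the enclave ahead of time.
A = rng.standard_normal((4, 3))   # private activations
B = rng.standard_normal((3, 2))   # model weights
R = rng.standard_normal(A.shape)  # one-time additive mask
RB = R @ B                        # precomputed correction term

assert np.allclose(masked_matmul(A, B, R, RB), A @ B)
```

In floating point the subtraction is not exact, which is why masking schemes are compared on numerical stability, as the abstract's formal analysis does for the paper's own algorithm.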