Title: OODformer: Out-Of-Distribution Detection Transformer
Authors: Koner, Rajat; Sinhamahapatra, Poulami; Roscher, Karsten; Günnemann, Stephan; Tresp, Volker
Type: presentation
Rights: Under Copyright
Language: en
Year: 2021
Dates: 2022-03-15; 23.12.2021
Handle: https://publica.fraunhofer.de/handle/publica/413341
DOI: 10.24406/publica-fhg-413341
Keywords: out of distribution; OOD; detection architecture; visual attention; contextualised embedding

Abstract:
A serious problem in image classification is that a trained model might perform well for input data that originates from the same distribution as the data available for model training, but perform much worse for out-of-distribution (OOD) samples. In real-world safety-critical applications, in particular, it is important to be aware if a new data point is OOD. To date, OOD detection is typically addressed using either confidence scores, auto-encoder based reconstruction, or contrastive learning. However, the global image context has not yet been explored to discriminate the non-local objectness between in-distribution and OOD samples. This paper proposes a first-of-its-kind OOD detection architecture named OODformer that leverages the contextualization capabilities of the transformer. Incorporating the transformer as the principal feature extractor allows us to exploit the object concepts and their discriminatory attributes along with their co-occurrence via visual attention. Based on contextualised embedding, we demonstrate OOD detection using both class-conditioned latent space similarity and a network confidence score. Our approach shows improved generalizability across various datasets. We have achieved a new state-of-the-art result on CIFAR-10/-100 and ImageNet30.
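To illustrate the class-conditioned latent space similarity idea mentioned in the abstract, below is a minimal sketch, not the paper's actual implementation. It assumes the transformer encoder has already produced embedding vectors (e.g. [CLS]-token features) for in-distribution training samples and for test samples; per-class mean embeddings are fitted on the training set, and a test sample is scored by its distance to the nearest class mean, with larger distances suggesting OOD. All function names and the use of Euclidean distance here are illustrative assumptions.

```python
import numpy as np

def fit_class_means(embeddings, labels, num_classes):
    """Compute per-class mean embeddings from in-distribution training data."""
    return np.stack(
        [embeddings[labels == c].mean(axis=0) for c in range(num_classes)]
    )

def ood_score(test_embeddings, class_means):
    """Score each sample by its distance to the nearest class mean.

    Larger scores mean the sample lies farther from every in-distribution
    class cluster, i.e. it is more likely to be out-of-distribution.
    """
    # pairwise Euclidean distances, shape (N, num_classes)
    dists = np.linalg.norm(
        test_embeddings[:, None, :] - class_means[None, :, :], axis=-1
    )
    return dists.min(axis=1)

# Usage with random stand-in embeddings; in practice these would be
# features extracted by the transformer encoder.
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(1000, 768))
train_lbl = rng.integers(0, 10, size=1000)
test_emb = rng.normal(size=(50, 768))

means = fit_class_means(train_emb, train_lbl, num_classes=10)
scores = ood_score(test_emb, means)  # threshold these to flag OOD samples
```

A distance-based score of this kind can be combined with the network's softmax confidence, as the abstract indicates both signals are used; the specific combination and distance metric in the paper may differ from this sketch.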