2024
Conference Paper
Title
YawnNet: A Visual-Centric Approach for Yawning Detection
Abstract
Yawning detection is widely used in multimedia applications such as driver fatigue assessment and status monitoring. However, the accuracy and robustness of existing yawning detectors are limited by variations in environments (especially lighting), facial expressions, and confusing behaviours (e.g., talking and eating). This paper introduces a transformer-based method, YawnNet, for accurate yawning detection by leveraging spatial-temporal encoding and local cues. In particular, YawnNet contains a data-processing stage with temporal downsampling and cube embedding of the input sequence. Moreover, it includes a Swin-Transformer block that operates on fine-grained patches to uncover short-range local cues. Through comprehensive experiments, we demonstrate the advantages of YawnNet: (1) significantly higher accuracy than the state-of-the-art Dense-LSTM (precision and recall improved by 2.3% and 4.2%, respectively) on the FatigueView dataset; (2) near-real-time inference (30 FPS on an RTX 3090); and (3) markedly improved robustness to confusing behaviours, to resolution and orientation changes, and to complex scenarios (occlusion, over- and underexposure).
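The data-processing stage mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the stride, cube sizes, and clip dimensions below are illustrative assumptions, showing only the general idea of temporal downsampling followed by cube embedding (partitioning a clip into non-overlapping spatio-temporal cubes, each flattened into one token).

```python
import numpy as np

# Hypothetical sketch of temporal downsampling + cube embedding.
# All shapes and hyperparameters here are assumptions for illustration,
# not the configuration used by YawnNet.

def temporal_downsample(frames: np.ndarray, stride: int) -> np.ndarray:
    """Keep every `stride`-th frame of a (T, H, W, C) clip."""
    return frames[::stride]

def cube_embed(frames: np.ndarray, t: int, p: int) -> np.ndarray:
    """Partition a (T, H, W, C) clip into non-overlapping t x p x p
    cubes and flatten each cube into one token vector."""
    T, H, W, C = frames.shape
    assert T % t == 0 and H % p == 0 and W % p == 0
    x = frames.reshape(T // t, t, H // p, p, W // p, p, C)
    # Reorder so each cube's pixels are contiguous, then flatten:
    # one token per (temporal, vertical, horizontal) cube position.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, t * p * p * C)

clip = np.random.rand(32, 64, 64, 3)        # 32-frame RGB clip
clip = temporal_downsample(clip, stride=2)  # -> 16 frames
tokens = cube_embed(clip, t=2, p=16)        # -> (8*4*4, 2*16*16*3)
print(tokens.shape)                         # (128, 1536)
```

In practice the flattened cubes would be linearly projected to the transformer's embedding dimension before entering the Swin-Transformer block.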
Author(s)