3D geometric features based real-time American sign language recognition using PointNet and MLP with MediaPipe hand skeleton detection

Zhang, Yan; Notni, Gunther

doi:10.1016/j.measen.2024.101697

2025

Journal Article

Abstract

This paper presents a robust and fast hand gesture recognition method. Using the 21 hand key points predicted by Google's MediaPipe, we calculate additional 3D geometric features, including the angles of each finger joint. These geometric features then undergo deeper feature extraction by a Multilayer Perceptron (MLP). In parallel, the hand key points are normalized in 3D space, and the finger and palm regions are interpolated to generate a point cloud representing the hand skeleton. PointNet is then used to extract features from this point cloud. The features extracted by the MLP and PointNet are fused, and a classifier outputs the probability distribution over hand gestures for each frame of a 3D video stream. Furthermore, a probability machine is designed to analyze the resulting probability sequence and provide the overall gesture prediction. In the experimental phase, we trained a model to classify American Sign Language (English letters and the numbers 0 to 9). The results demonstrate an overall accuracy of 99.81%. The entire method runs in less than 10 milliseconds (on an NVIDIA 2080Ti graphics card), demonstrating its real-time detection capability.
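As a rough illustration of the two geometric preprocessing steps described in the abstract, the following minimal sketch computes joint angles from 21 hand key points and builds a normalized skeleton point cloud by linear interpolation along the bones. It is not the authors' implementation. The landmark ordering follows MediaPipe Hands (0 is the wrist, followed by four points per finger), but the function names, the choice of three angles per finger, the wrist-centered unit-scale normalization, and the `points_per_bone` parameter are all assumptions made for this example. The palm region is approximated here only by the wrist-to-knuckle segments.

```python
import math

# MediaPipe Hands landmark indices: 0 is the wrist, then four points per finger.
FINGER_CHAINS = [
    [0, 1, 2, 3, 4],      # thumb
    [0, 5, 6, 7, 8],      # index finger
    [0, 9, 10, 11, 12],   # middle finger
    [0, 13, 14, 15, 16],  # ring finger
    [0, 17, 18, 19, 20],  # pinky
]


def joint_angle(a, b, c):
    """Angle in radians at point b, between the segments b->a and b->c."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # degenerate case: coincident key points
    cos_theta = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def finger_joint_angles(landmarks):
    """Return 15 angles: three interior joints for each of the five fingers.

    Assumption: the paper's exact feature set is not given in the abstract.
    """
    angles = []
    for chain in FINGER_CHAINS:
        for i in range(1, len(chain) - 1):
            angles.append(
                joint_angle(
                    landmarks[chain[i - 1]],
                    landmarks[chain[i]],
                    landmarks[chain[i + 1]],
                )
            )
    return angles


def skeleton_point_cloud(landmarks, points_per_bone=4):
    """Normalize key points and interpolate along each bone.

    Assumed normalization: translate the wrist to the origin and scale so
    the farthest key point lies at distance 1.
    """
    wrist = landmarks[0]
    centered = [[p[i] - wrist[i] for i in range(3)] for p in landmarks]
    scale = max(math.sqrt(sum(x * x for x in p)) for p in centered) or 1.0
    normalized = [[x / scale for x in p] for p in centered]

    cloud = [normalized[0]]  # start with the wrist itself
    for chain in FINGER_CHAINS:
        for start, end in zip(chain, chain[1:]):
            p, q = normalized[start], normalized[end]
            # Sample points_per_bone points, ending exactly on the far joint.
            for k in range(1, points_per_bone + 1):
                t = k / points_per_bone
                cloud.append([p[i] + t * (q[i] - p[i]) for i in range(3)])
    return cloud
```

In a pipeline like the one the abstract outlines, the angle vector would feed the MLP branch and the point cloud would feed the PointNet branch. With the default `points_per_bone=4`, the cloud contains 81 points (the wrist plus 4 points on each of 20 bones).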

Author(s)

Zhang, Yan

Technische Universität Ilmenau

Notni, Gunther

Fraunhofer-Institut für Angewandte Optik und Feinmechanik IOF

Journal

Measurement: Sensors
