2025
Journal Article
Title
3D geometric features based real-time American sign language recognition using PointNet and MLP with MediaPipe hand skeleton detection
Abstract
This paper presents a robust and fast hand gesture recognition method. Using the 21 hand key points predicted by Google's MediaPipe, we calculate additional 3D geometric features, including the angles of each finger joint. These geometric features then undergo deeper feature extraction by a Multilayer Perceptron (MLP). In parallel, the hand key points are normalized in 3D space, and the finger and palm regions are interpolated to generate a point cloud representing the hand skeleton. PointNet is then used to extract features from this point cloud. The features extracted by the MLP and PointNet are fused, and a classifier outputs the probability distribution over hand gestures for each frame of a 3D video stream. Furthermore, a probability machine is designed to analyze the probability sequence and provide the overall gesture prediction. In the experimental phase, we trained a model to classify American Sign Language (the English letters and the digits 0 to 9). The results demonstrate an overall accuracy of 99.81%. The entire method runs in less than 10 milliseconds on an NVIDIA 2080Ti graphics card, demonstrating its real-time detection capability.
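The geometric step described in the abstract can be illustrated with a minimal sketch. It assumes the standard MediaPipe hand landmark ordering (index 0 is the wrist, followed by four points per finger) and an input array of shape (21, 3). The abstract does not specify the exact feature set or normalization scheme, so the choice of three angles per finger and the wrist-centered unit scaling below are assumptions, and the function names `joint_angles` and `normalize_landmarks` are hypothetical.

```python
import numpy as np

# MediaPipe hand landmark indices: 0 is the wrist, then four points per finger.
FINGER_CHAINS = [
    [0, 1, 2, 3, 4],      # thumb
    [0, 5, 6, 7, 8],      # index
    [0, 9, 10, 11, 12],   # middle
    [0, 13, 14, 15, 16],  # ring
    [0, 17, 18, 19, 20],  # pinky
]


def joint_angles(landmarks):
    """Return 15 joint angles in radians (3 per finger) from a (21, 3) array.

    Assumption: each angle is measured at an interior joint, between the
    segments pointing to its two neighbours along the finger chain.
    """
    pts = np.asarray(landmarks, dtype=float)
    angles = []
    for chain in FINGER_CHAINS:
        for prev, mid, nxt in zip(chain, chain[1:], chain[2:]):
            u = pts[prev] - pts[mid]
            v = pts[nxt] - pts[mid]
            # Small epsilon guards against division by zero for degenerate input.
            cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
            angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.array(angles)


def normalize_landmarks(landmarks):
    """Assumed normalization: wrist at the origin, farthest point at unit distance."""
    pts = np.asarray(landmarks, dtype=float)
    centered = pts - pts[0]
    scale = np.linalg.norm(centered, axis=1).max()
    return centered / (scale + 1e-12)
```

In a pipeline like the one described, the angle vector would feed the MLP branch and the normalized key points would be interpolated into the point cloud for PointNet. Both downstream steps are omitted here because the abstract gives no detail on them.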
Author(s)