Q13 : Sign Language Recognition using ToF Depth Cameras
Thesis > Central Library of Shahrood University > Computer Engineering > MSc > 2012
Authors:
Ali Reza Manashty [Author], Morteza Zahedi [Supervisor], Prof. Hamid Hassanpour [Advisor]
Abstract: Sign language recognition remains an open challenge in many of its application areas. One of the most important problems in sign language recognition is segmenting the hand from a scene with a complex background. With the introduction of Microsoft Kinect, a depth image is now available alongside the RGB image, which, if used correctly, eases the segmentation process. In this research, a real-time finger-spelled alphabet recognition scheme is proposed based on geometric features extracted from depth-segmented RGB images. A total of 26 geometric features, comprising 19 moment-based and 7 contour-based features, are extracted from the depth-segmented images after several preprocessing steps: thresholding the depth image to create a hand segmentation mask, smoothing the mask, and translating and resizing it. A Bayes classifier is then applied to the feature vector after it has been analyzed with linear discriminant analysis (LDA). The main contributions of this thesis are Kinect depth-based segmentation of RGB and depth images and a real-time system that accurately classifies the resulting data. The proposed method is tested on a large Kinect-captured database of more than 54,000 samples of static finger-spelled American Sign Language (ASL) letters, covering different hand-gesture variations and complex backgrounds. Several cross-validated recognition results are reported for different subsets of the database, ranging from 78.29% on the whole database (5 signers, each providing 450 samples for each of 24 letters) to 99.75% on a single signer with 100 samples per letter. Performance tests show real-time processing at 38 fps (frames per second) with a single thread and 245 fps with a multi-threaded architecture, both on a laptop with a 2 GHz Core i7 CPU.
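The preprocessing and moment-feature steps described above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation: the depth frame is assumed to be a 2-D list of depth values, the hand is assumed to be the only object inside a fixed depth band `[near, far]`, smoothing is assumed to be a 3x3 majority filter, and resizing is nearest-neighbour. All function names are hypothetical, and the normalized central moments shown are one common moment family, not necessarily the 19 moment-based features used in the thesis.

```python
# Hypothetical sketch: depth thresholding -> smoothing -> translate/resize -> moments.
# Assumptions (not from the thesis): fixed depth band, 3x3 majority smoothing,
# nearest-neighbour resize, normalized central moments as example features.

def depth_mask(depth, near, far):
    """Binary hand mask: 1 where near <= depth <= far, else 0."""
    return [[1 if near <= d <= far else 0 for d in row] for row in depth]

def smooth(mask):
    """3x3 majority filter: a pixel stays on only if most of its neighbourhood is on."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += mask[yy][xx]
                        count += 1
            out[y][x] = 1 if 2 * total > count else 0
    return out

def crop_and_resize(mask, size):
    """Translate the hand to the origin (bounding-box crop), then resize to size x size."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not ys:
        return [[0] * size for _ in range(size)]
    y0, y1, x0, x1 = min(ys), max(ys), min(xs), max(xs)
    bh, bw = y1 - y0 + 1, x1 - x0 + 1
    # Nearest-neighbour sampling of the bounding box.
    return [[mask[y0 + (r * bh) // size][x0 + (c * bw) // size]
             for c in range(size)] for r in range(size)]

def normalized_central_moments(mask, max_order=3):
    """eta_pq for 2 <= p + q <= max_order; invariant to translation and scale."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    m00 = len(pts)
    if m00 == 0:
        return {}
    cx = sum(x for x, _ in pts) / m00
    cy = sum(y for _, y in pts) / m00
    feats = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            if p + q >= 2:
                mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pts)
                feats[(p, q)] = mu / m00 ** (1 + (p + q) / 2)
    return feats
```

A full pass over one frame would chain the steps, for example `normalized_central_moments(crop_and_resize(smooth(depth_mask(depth, 500, 1000)), 32))`, where the band limits and output size are illustrative values only.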
The results analysis and validation section also shows that using LDA, combining RGB and depth features, and randomizing the samples yield the best recognition results; the benefit of randomization confirms that new users need a training phase just before using the system.
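The classification stage and the role of sample randomization can be illustrated with a small Gaussian naive Bayes classifier and a shuffled train/test split. This is a hedged sketch, not the thesis's classifier: the LDA projection step is omitted, features are assumed to be conditionally independent with Gaussian class-conditional densities (naive Bayes, which may differ from the Bayes classifier the thesis uses), and all function names are hypothetical. Shuffling before splitting places samples from every signer in both the training and test sets, which is one reading of the "samples randomization" above.

```python
import math
import random
from collections import defaultdict

def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Estimate per-class log-prior and per-feature mean/variance (naive Bayes)."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    model = {}
    n = len(X)
    for label, rows in groups.items():
        d = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
        # Variance floor avoids division by zero for constant features.
        variances = [max(sum((r[j] - means[j]) ** 2 for r in rows) / len(rows), var_floor)
                     for j in range(d)]
        model[label] = (math.log(len(rows) / n), means, variances)
    return model

def predict(model, x):
    """Return the label with the highest log-posterior (up to a shared constant)."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
            for xj, m, v in zip(x, means, variances))
    return max(model, key=lambda label: log_posterior(model[label]))

def randomized_split(X, y, train_frac=0.8, seed=0):
    """Shuffle sample indices before splitting into train and test sets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * len(idx))
    tr, te = idx[:cut], idx[cut:]
    return ([X[i] for i in tr], [y[i] for i in tr],
            [X[i] for i in te], [y[i] for i in te])

def accuracy(model, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(model, xi) == yi for xi, yi in zip(X, y)) / len(y)
```

Working in log-probabilities avoids numerical underflow when many per-feature densities are multiplied, which matters once the feature vector reaches a few dozen dimensions such as the 26 features described above.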
Keywords:
#Sign Language #Depth #Kinect #Hand Segmentation #Finger spelled
Holding location: Central Library of Shahrood University