TK778 : Robust Features Extraction for Speech Recognition in Noisy Environments based on Fractional Fourier Transform
Thesis > Central Library of Shahrood University > Electrical Engineering > PhD > 2020
Authors:
Mohsen Sadeghi [Author], Hosein Marvi [Supervisor], Alireza Ahmadifard [Advisor], Maaruf Ali [Advisor]
Abstract: Speech is the primary means of communication between people, used to convey ideas, feelings, and thoughts. Speech recognition technology allows a computer to receive, interpret, and respond appropriately to spoken commands. Because of noise in real environments, real-world applications face a mismatch between training and test conditions. Noise robustness in automatic speech recognition (ASR) systems is a broad subject that has been studied by many researchers for decades. In this thesis, the robustness of the formants of voiced speech frames is first investigated under different noise sources by measuring the displacement of the formants of noisy voiced frames relative to those of clean voiced frames. The results show that white noise has the greatest impact on the formants of voiced speech at all signal-to-noise ratios. An algorithm is then proposed for extracting robust features for speech recognition. The proposed structure, named FrRC, is based on the fractional Fourier transform and a root function. To justify the method theoretically, a mathematical relationship is derived between the FrRC features of clean speech, noise, and noisy speech, and this relationship is compared with the corresponding relationship for MFCC feature extraction in different cases. A speech recognition system implemented with FrRC features achieves higher recognition accuracy than systems using other feature extraction methods. For example, with babble noise at a signal-to-noise ratio of -10 dB, recognition accuracy increases by 24.6% and 25.3% compared to the LPC and MFCC methods, respectively.
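The signal-to-noise levels quoted above (for example, -10 dB) describe how strongly noise is mixed into clean speech. As a minimal sketch of that idea, and not the thesis's own data preparation, the following Python scales a noise recording so that the mixture reaches a requested SNR. The function names and the toy signals (a sine tone standing in for speech, Gaussian samples standing in for white noise) are assumptions made for illustration.

```python
import math
import random


def mean_power(signal):
    """Average power (mean squared amplitude) of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Return clean + g * noise, with g chosen so the mix has the requested SNR.

    SNR_dB = 10 * log10(P_clean / (g**2 * P_noise)), solved for g.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = mean_power(clean)
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal, given the clean reference."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


# Toy stand-ins (assumptions): a 200 Hz tone as "speech", Gaussian samples as "white noise".
rng = random.Random(0)
sample_rate = 8000
clean = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
noisy = mix_at_snr(clean, noise, snr_db=-10.0)
```

At -10 dB the scaled noise carries ten times the power of the clean signal, which is why this condition is a demanding test for any feature extraction method.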
To further increase recognition accuracy in noisy environments, a second algorithm, called Adaptive Fractional Power Normalized Cepstral Coefficient (AFPNCC), is introduced, analyzed, and implemented. It is based on the fractional Fourier transform and the Power Normalized Cepstral Coefficient (PNCC) method. In the proposed AFPNCC algorithm, the alpha coefficient of the fractional Fourier transform is selected according to the type and intensity of the noise by a differential evolution optimizer embedded in the algorithm structure. Implementation results show improved recognition accuracy in both noisy and clean environments. In simulation, a speech recognition system based on AFPNCC features shows 16% and 92% increases in recognition accuracy compared to the PNCC and MFCC algorithms, respectively, with pink noise at a signal-to-noise ratio of 5 dB.
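To illustrate how a differential evolution optimizer can select a single scalar such as the alpha coefficient, the following is a minimal Python sketch of the common DE/rand/1/bin scheme. It is not the thesis's implementation: the objective `toy_error`, its optimum at 0.8, the search range [0, 2], and all parameter values are hypothetical placeholders. In a real system the objective would presumably be an error measure computed from features extracted with the candidate alpha.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.7, cr=0.9,
                           generations=100, seed=0):
    """Minimise `objective` over the box `bounds` with the DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct population members other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][d]
                trial.append(v)
            # Greedy selection: keep the trial only if it is no worse.
            trial_score = objective(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=lambda j: scores[j])
    return pop[best], scores[best]


def toy_error(x):
    """Hypothetical stand-in for recognition error as a function of alpha."""
    alpha = x[0]
    return (alpha - 0.8) ** 2


best_x, best_score = differential_evolution(toy_error, bounds=[(0.0, 2.0)])
best_alpha = best_x[0]
```

Differential evolution needs only objective values, not gradients, which makes it a natural fit when the quantity being optimized is the output of a full recognition pipeline.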
Keywords:
#Robust speech recognition #Robust feature extraction #Cepstral features #Fractional Fourier Transform #Differential Evolution optimizer
Keeping place: Central Library of Shahrood University