TK1080 : I-vector Extraction in Noisy Environments for Speaker Verification Using Time–Frequency Distribution
Thesis > Central Library of Shahrood University > Electrical Engineering > PhD > 2025
Authors:
[Author], Hosein Marvi [Supervisor]
Abstract: Speech signals carry many kinds of information, and speech recognition has become one of the main areas of research in recent years. Speaker recognition and speaker verification are two important branches of speech processing. Verifying a speaker's identity, however, faces several challenges, such as background noise, channel variations, environmental factors, and short-duration speech segments, all of which can degrade performance. In particular, the performance of speaker verification systems drops significantly under noisy conditions. A widely used approach in speaker verification is the identity vector (i-vector). Previous studies have shown that i-vectors often contain non-speaker-related information, which can reduce system accuracy. Time–frequency transformations can help reduce the influence of this unwanted information and improve overall performance. The objective of this dissertation is to present methods for i-vector extraction based on time–frequency transformations, in order to reduce the influence of non-identity information, particularly noise and channel characteristics, and to enable speaker verification in noisy environments. In this study, a speaker recognition system robust to hardware impairments and noise was implemented. To improve recognition accuracy, speech features were extracted using several techniques, including the Hilbert transform, the Wigner–Ville distribution, the discrete wavelet transform, and the Gabor transform. A Gaussian mixture model (GMM) was then used to model the speaker features. Normalization and i-vector extraction stages were also incorporated into the system to enhance accuracy and robustness.
The final model was evaluated using statistical approaches such as model parameter estimation and a final comparison stage. The speaker verification system was evaluated based on the error rates in the tests and by plotting the false acceptance rate (FAR), false rejection rate (FRR), and equal error rate (EER) curves. Experiments conducted on the TIMIT database indicate that the proposed methods outperform conventional approaches in the presence of noise and hardware impairments.
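The FAR, FRR, and EER measures named in the abstract can be illustrated with a minimal sketch. It is not the evaluation code of the thesis: the function names, the rule that a trial is accepted when its score is at or above the threshold, and the example scores are all assumptions made for illustration.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR), assuming scores >= threshold are accepted."""
    # FAR: fraction of impostor trials that are wrongly accepted.
    far = sum(s >= threshold for s in impostor) / len(impostor)
    # FRR: fraction of genuine trials that are wrongly rejected.
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by trying every observed score as a threshold."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        # Keep the threshold where FAR and FRR are closest to each other.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    _, eer, threshold = best
    return eer, threshold


# Made-up scores for illustration only; these are not results from the thesis.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
```

Sweeping only the observed scores gives a coarse estimate. With larger trial lists, the point where the FAR and FRR curves cross is usually located more finely, for example by interpolating between neighbouring thresholds.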
Keywords:
#Speaker verification #non-identity information #identity vector #time-frequency transformations
Keeping place: Central Library of Shahrood University