Q177: Speech Emotion Recognition by Mapping Feature Set to Multi-Dimensional Emotional Space
Thesis > Central Library of Shahrood University > Computer Engineering > PhD > 2020
Authors:
Shadi Langari [Author], Hosein Marvi [Supervisor]
Abstract: One of the most important issues in human-computer interaction is creating a system that can hear and react like a human. In addition to the linguistic content, the speech signal carries important information about the speaker, including age, gender, accent, dialect, health, emotions, and stress. One way to move toward this goal is to establish spoken communication between humans and machines in which the machine understands human emotions and responds appropriately. This has motivated the design of automatic speech emotion recognition systems. Most research in this area classifies samples either into discrete emotion classes or into a dimensional emotional space. This dissertation therefore proposes a mapping between these two spaces using a suitable feature set. The aim of the proposed approach is to obtain a representation of the feature space that better captures the essence of the emotional dimensions. The proposed speech emotion recognition system is designed and implemented in two phases. In the first phase, an improved emotion recognition system is proposed using a new adaptive feature extraction method based on time-frequency coefficients and an evolutionary hybrid feature selection method. In the second phase, a mapping is created between the feature set from the first phase and the three-dimensional space of emotions. The proposed model has been evaluated and validated on the Berlin EMO-DB emotional speech database, the SAVEE audio-visual database, the PDREC Persian radio-drama emotional database, and the VAM German continuous emotional speech database. Experimental results show that the proposed model effectively detects the different emotional classes, reaching 97.57% accuracy on EMO-DB, 80% on SAVEE, and 91.64% on PDREC.
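To make the two-phase design concrete, below is a minimal, illustrative Python sketch of the kind of pipeline the abstract describes: a stand-in for time-frequency feature extraction, a toy genetic-algorithm feature selection (one simple instance of an evolutionary method), and a linear mapping from the selected features into a three-dimensional valence/arousal/dominance space, from which discrete classes are recovered by nearest prototype. All names, parameters, prototype coordinates, and the synthetic data are assumptions for illustration only; they are not the thesis's actual method.

import numpy as np

rng = np.random.default_rng(0)

# Phase 1a (stand-in): real systems would compute time-frequency coefficients
# (e.g., wavelet or cepstral features) per utterance; here synthetic vectors
# stand in for the EMO-DB/SAVEE/PDREC features.
n_utterances, n_features = 200, 40
X = rng.normal(size=(n_utterances, n_features))
# Hypothetical continuous labels in the 3-D valence/arousal/dominance space.
Y_vad = rng.uniform(-1.0, 1.0, size=(n_utterances, 3))

# Phase 1b: toy genetic-algorithm feature selection.
def fitness(mask, X, Y):
    """Score a binary feature mask by how well a linear map predicts V/A/D."""
    if mask.sum() == 0:
        return -np.inf
    Xs = X[:, mask.astype(bool)]
    W, *_ = np.linalg.lstsq(Xs, Y, rcond=None)
    return -np.mean((Xs @ W - Y) ** 2)  # negative MSE: higher is better

pop_size, n_generations = 30, 25
population = rng.integers(0, 2, size=(pop_size, n_features))
for _ in range(n_generations):
    scores = np.array([fitness(m, X, Y_vad) for m in population])
    # Keep the better half, refill by bit-flip mutation of the survivors.
    survivors = population[np.argsort(scores)[pop_size // 2:]]
    mutants = survivors ^ (rng.random(survivors.shape) < 0.05)
    population = np.vstack([survivors, mutants])

best_mask = population[np.argmax([fitness(m, X, Y_vad) for m in population])]

# Phase 2: map the selected features to the 3-D emotional space.
Xs = X[:, best_mask.astype(bool)]
W, *_ = np.linalg.lstsq(Xs, Y_vad, rcond=None)

def to_vad(feature_vec):
    """Project one selected-feature vector to (valence, arousal, dominance)."""
    return feature_vec @ W

# Discrete classes can then be recovered by nearest prototype in V/A/D space;
# these prototype coordinates are illustrative, not taken from the thesis.
prototypes = {"anger": (-0.6, 0.8, 0.6), "sadness": (-0.7, -0.6, -0.5),
              "happiness": (0.8, 0.6, 0.4), "neutral": (0.0, 0.0, 0.0)}

def classify(feature_vec):
    v = to_vad(feature_vec)
    return min(prototypes,
               key=lambda k: np.linalg.norm(v - np.array(prototypes[k])))

print(classify(Xs[0]))

In this sketch the mapping is a least-squares linear projection; the dissertation's point is that choosing the feature set and the mapping jointly yields a representation in which the dimensional coordinates, and hence the discrete classes, are easier to recover.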
Keywords:
#Emotion Recognition #Speech Processing #Feature Extraction #Feature Selection #Classification #Computer-Human Interaction
Keeping place: Central Library of Shahrood University