TK429 : Diphone extraction of various modes of speech to improve speech system synthesis
Thesis > Central Library of Shahrood University > Electrical Engineering > MSc > 2015
Authors:
Mohammadreza Haghighat [Author], Hadi Grailu [Supervisor]
Abstract: In this thesis we propose a new speech synthesis method based on diphones. We focus on diphones because of their desirable characteristics: they are limited in number, and they make transitions between sounds easy and clear. Our aim is for the speech synthesizer to produce different types of sentences at its output. Most existing approaches for Farsi rely on complex techniques such as hidden Markov models (HMMs), decision trees, and neural networks (NNs). We aim to show that diphones can be used to produce different types of sentences with greater naturalness. We consider seven different types of diphones, each with unique characteristics. In our simulations, we first combined diphones extracted from recorded sentences to generate the target sentences. We then played sample audio to a group of listeners and asked them to score it for naturalness and clarity, from which we computed the mean opinion score (MOS). A comparison of our results with those of several conventional approaches, some of them based on decision trees and triphones, shows that the proposed method received a higher score for clarity. We also note that there are possible ways to improve the naturalness of the generated sentences; these are discussed in detail in the thesis.
Keywords:
#Farsi text to speech conversion #speech synthesis #diphone #naturalness #MOS score
Keeping place: Central Library of Shahrood University