Human-in-the-loop speech-design system and its evaluation

Daichi Kondo, Masanori Morise

2019

We propose a human-in-the-loop (HITL) speech-design system with an interface. General text-to-speech (TTS) systems generate the speech waveform from the input text without the need for manual modification. In particular, end-to-end TTS systems can synthesize speech that sounds as natural as human speech. However, it is difficult for users to modify the speech parameters without degrading sound quality. The purpose of this study was to enable collaboration between the user and a deep neural network (DNN), yielding a system with which a user can control the speech parameters without sound-quality degradation. The main problem to be solved is improving the quality of the speech parameters generated from those designed by the user. We developed several DNN-based acoustic models for this purpose and carried out a subjective evaluation to determine the effectiveness of the proposed system. The subjective score for muffledness improved with the proposed system compared with speech processed by a TTS system that uses signal processing without a DNN.

