Speech Perception of Text-To-Speech English Sounds by Japanese High School Students

2018 
The purpose of this study was to investigate, via a listening test, whether 21 Japanese high school students could notice any difference between Text-To-Speech (TTS) voices and human voices, that is, whether there was any “perception gap.” The TTS voice used was generated by concatenating phonemes taken from human voices. Some people view TTS voices as “artificial voices,” while others view them as a kind of human voice, even though they are recorded rather than uttered directly by a person. This conceptual matter remains a source of disagreement. The results of the listening test revealed no statistically significant difference among three types of English speech, and the students did not notice that the TTS English sounds were artificially synthesized speech produced by a personal computer. These findings indicate that TTS voices are of high enough quality for both Japanese high school English as a Foreign Language (EFL) students and their teachers to use them as English listening materials in high school English education in Japan.