A dual alignment scheme for improved speech-to-singing voice conversion

2017 
In speech-to-singing (STS) voice conversion, source speech signals from a speaker are used to generate that speaker's singing voice. This process requires accurate detection of the boundaries between phonemes and words in the speech signal. The computation and modification of the speech analysis parameters with respect to the target musical score or singing template depend largely on the estimation of phoneme durations. In this paper, an improved dual alignment scheme for speech and singing voices in template-based STS (TSTS) systems is proposed. In the first pass of the dual alignment, subsequence dynamic time warping (subDTW) is employed to match the source speech to the singer's speech. We assume that an accurate correspondence between the singer's speech and the target singing vocals has been established as part of the singing template development. Therefore, once the source speech is aligned with the singer's speech, it is automatically aligned with the singing template, which we call the second pass of the dual alignment. The proposed scheme delivers a relative reduction of 95.8% in word alignment error over the baseline dynamic time warping (DTW) approach. It also provides a relative improvement of 38.7% in the mean opinion scores of synthesized singing voices in subjective studies, over the same baseline. We demonstrate that the proposed dual alignment with subDTW is effective in STS conversion applications.
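The two passes described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a generic textbook formulation of subsequence DTW with Euclidean frame distances and unit step sizes, and the names `subsequence_dtw`, `map_to_template`, and `speech_to_singing` are hypothetical, introduced only for illustration. The feature type, step pattern, and template format used in the paper are not stated in the abstract and are assumed here.

```python
import numpy as np


def subsequence_dtw(query, reference):
    """Align all of `query` to the best-matching stretch of `reference`.

    Generic subsequence DTW (assumed formulation, not taken from the paper):
    the query may start and end anywhere in the reference at no cost.
    Returns (start, end, cost, path) with path as (query_idx, reference_idx).
    """
    query = np.asarray(query, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if query.ndim == 1:
        query = query[:, None]
    if reference.ndim == 1:
        reference = reference[:, None]
    n, m = len(query), len(reference)

    # Pairwise Euclidean distance between every query and reference frame.
    cost = np.linalg.norm(query[:, None, :] - reference[None, :, :], axis=2)

    # Accumulated cost. The first row is NOT accumulated, which is what
    # lets the match begin at any reference frame.
    acc = np.empty((n, m))
    acc[0, :] = cost[0, :]
    acc[:, 0] = np.cumsum(cost[:, 0])
    for i in range(1, n):
        for j in range(1, m):
            acc[i, j] = cost[i, j] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )

    # Free end point: cheapest cell in the last query row.
    end = int(np.argmin(acc[-1, :]))

    # Backtrack until the first query frame is reached.
    i, j = n - 1, end
    path = [(i, j)]
    while i > 0:
        if j == 0:
            i -= 1
        else:
            step = int(np.argmin((acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])))
            if step == 0:
                i, j = i - 1, j - 1
            elif step == 1:
                i -= 1
            else:
                j -= 1
        path.append((i, j))
    path.reverse()
    return path[0][1], end, float(acc[-1, end]), path


def map_to_template(path, speech_to_singing):
    """Second pass: reuse a precomputed singer-speech -> singing frame map.

    `speech_to_singing[j]` is a hypothetical lookup giving the singing
    template frame that corresponds to singer-speech frame j.
    """
    return [(i, speech_to_singing[j]) for i, j in path]
```

As a usage sketch, `subsequence_dtw([1, 2, 3, 4], [0, 5, 9, 1, 2, 3, 4, 9, 5, 0])` locates the query inside the longer sequence, and passing the resulting path to `map_to_template` together with a stored frame map carries the alignment over to the singing template without a second DTW run. In practice the inputs would be frame-level feature vectors rather than scalars.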