Analysis of Speech and Singing Signals for Temporal Alignment

2018 
Accurate frame-level alignment between a singing signal and its spoken lyrics is essential to several applications in singing signal processing. Because the acoustic characteristics of speech and singing differ significantly, finding the temporal alignment between them is difficult. In this paper, we study the characteristics of speech and singing signals to identify the properties they share and thereby facilitate temporal alignment. We observe that (i) the characteristics of the excitation source in the human voice production mechanism vary considerably between speaking and singing, and (ii) for the same linguistic content, speaking and singing signals present very different formant patterns. Based on these observations, we formulate a set of tandem features that represent only those characteristics that are consistent between speech and singing signals. These tandem features are used in dynamic time warping for temporal alignment and in a speech-to-singing conversion experiment. In both objective and subjective evaluations, we show that the proposed tandem features are significantly superior to the baseline features for temporal alignment.
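The dynamic time warping step mentioned above can be illustrated with a minimal sketch. It assumes each signal has already been converted to a sequence of per-frame feature vectors; the abstract does not specify how the tandem features are computed, so the inputs here are placeholders, and the function name `dtw_align`, the Euclidean frame distance, and the unconstrained step pattern are illustrative assumptions rather than the authors' configuration.

```python
import math


def dtw_align(speech, singing):
    """Align two sequences of feature vectors with classic DTW.

    Hypothetical helper for illustration only: `speech` and `singing`
    are lists of equal-dimension frames (lists of floats).
    Returns (total_cost, path), where path is a list of
    (speech_frame_index, singing_frame_index) pairs.
    """
    n, m = len(speech), len(singing)
    # cost[i][j]: minimal accumulated distance aligning
    # speech[:i] with singing[:j]; row/column 0 are boundary cells.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed local distance: Euclidean between frames.
            d = math.dist(speech[i - 1], singing[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # speech advances, singing frame repeats
                cost[i][j - 1],      # singing advances, speech frame repeats
                cost[i - 1][j - 1],  # both advance
            )

    # Backtrack from the end to recover the frame-level warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Toy one-dimensional "features": the sung version holds the middle frame longer.
speech_frames = [[0.0], [1.0], [2.0]]
singing_frames = [[0.0], [1.0], [1.0], [2.0]]
total_cost, warping_path = dtw_align(speech_frames, singing_frames)
```

In this toy case the path maps the single spoken middle frame to both sustained sung frames, which is the kind of stretching a speech-to-singing alignment has to capture. The quality of the alignment depends on how well the frame features match across the two signals, which is the problem the tandem features are meant to address.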