Analysis and modeling for modification of speaking to karaoke-style singing

Soheil Khorram, John H. L. Hansen


2019

This study analyzes how the voice is modified between speaking and singing the same text content, leading to the first attempt at converting speaking to karaoke-style singing. To develop this system, we collected a new dataset, the UT-Sing dataset, containing more than 23 h of speech from 81 participants across four languages: English, Farsi, Hindi, and Mandarin. Each participant produced the same text content by first reading and then singing the lyrics of 5 popular songs while listening to the accompanying instrumental music through open-air headphones. This yields a parallel dataset suitable for voice modification systems. We first use this dataset to compare the prosodic and spectral characteristics of karaoke-style singing with those of speaking the same text content. We then use the insights from this comparison to develop a speaking-to-karaoke-style singing modification system. In the training phase, we extract acoustic features with the WORLD vocoder and align the speaking and singing features with the dynamic time warping (DTW) algorithm. Finally, we train a residual network to model the mapping between the aligned acoustic features. We evaluate the performance of the developed system with subjective assessments.
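
The alignment step of the training phase can be illustrated with a minimal dynamic time warping (DTW) sketch. It assumes each utterance has already been converted into a sequence of per-frame feature vectors (for example, spectral features from the WORLD vocoder), and it uses a plain Euclidean frame distance, the standard three-step pattern, and no slope or band constraints. These choices, the function names `dtw_align` and `paired_frames`, and the toy one-dimensional frames are hypothetical illustrations, not the authors' configuration, which is not specified here.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: plain DTW with Euclidean distance, not the authors' exact setup.
# A frame is one feature vector (e.g. spectral coefficients from a vocoder).
Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance between two equal-length feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(src: List[Frame], tgt: List[Frame]) -> Tuple[float, List[Tuple[int, int]]]:
    """DTW with steps (1,0), (0,1), (1,1).

    Returns the total alignment cost and the warping path as
    (src_index, tgt_index) pairs from the first frames to the last.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(src[i - 1], tgt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the last frame pair to the first along the cheapest predecessors.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        if i == 1 and j == 1:
            break
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal step
            (cost[i - 1][j], i - 1, j),          # advance in src only
            (cost[i][j - 1], i, j - 1),          # advance in tgt only
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def paired_frames(
    src: List[Frame], tgt: List[Frame], path: List[Tuple[int, int]]
) -> List[Tuple[Frame, Frame]]:
    """Turn the warping path into (speaking_frame, singing_frame) training pairs."""
    return [(src[i], tgt[j]) for i, j in path]


if __name__ == "__main__":
    # Toy 1-D "features": the sung version holds each unit for longer.
    speech = [[0.0], [1.0], [2.0]]
    singing = [[0.0], [0.0], [1.0], [1.0], [2.0]]
    total, path = dtw_align(speech, singing)
    print(total)  # 0.0
    print(path)   # [(0, 0), (0, 1), (1, 2), (1, 3), (2, 4)]
```

In this toy example the singing sequence is longer, so some speaking frames are matched to several singing frames. The frame pairs returned by `paired_frames` are the kind of parallel input/target data on which a mapping model, such as the residual network described in the abstract, could be trained.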
