Automatic data enhancement for language identification using voice generation

2008 
Approaches to language identification (LID) require very large sets of training data (often more than six hours per language) for accurate results. This study looks at ways of automatically reducing the amount of training data required to train a LID model while maintaining or increasing accuracy. Initial experiments found that speaker density, i.e., the number of speakers per time unit, had a dramatic influence on model accuracy (an absolute increase of 15%). To increase the number of speakers available for LID training without collecting additional audio, the STRAIGHT algorithm was used to synthesize "novel" speakers for use in training language models for a LID system. The mean pitch and vocal tract length of the speaker in each LID training file were scaled to generate four additional voices per original speaker, artificially augmenting the training data. The resulting models yielded an absolute improvement of 10% over the baseline system (from 80% to 90%). This study shows that automatica...
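The augmentation step described above (scaling mean pitch and vocal tract length to obtain four extra voices per speaker) might be sketched roughly as follows. This is an illustrative sketch only, not the STRAIGHT implementation: it assumes a vocoder has already produced a frame-wise F0 contour and a spectral envelope, it omits resynthesis, and the scale factors `PITCH_SCALES` and `VTL_SCALES` as well as the helper names are hypothetical, since the abstract does not state the values used.

```python
import itertools

import numpy as np

# Hypothetical scale factors; the abstract does not give the actual values.
PITCH_SCALES = (0.8, 1.25)
VTL_SCALES = (0.9, 1.1)


def scale_f0(f0, pitch_scale):
    """Scale a frame-wise F0 contour in Hz; unvoiced frames (0) stay 0."""
    return np.asarray(f0, dtype=float) * pitch_scale


def warp_envelope(envelope, vtl_scale):
    """Linearly warp the frequency axis of a (frames, bins) spectral envelope.

    A longer vocal tract (vtl_scale > 1) moves formants down in frequency,
    so output bin k reads the input envelope at position k * vtl_scale.
    """
    envelope = np.asarray(envelope, dtype=float)
    bins = np.arange(envelope.shape[1])
    src = np.clip(bins * vtl_scale, 0, envelope.shape[1] - 1)
    return np.stack([np.interp(src, bins, frame) for frame in envelope])


def synth_voices(f0, envelope):
    """Return parameters for four modified voices (2 pitch x 2 VTL scales)."""
    voices = []
    for p, v in itertools.product(PITCH_SCALES, VTL_SCALES):
        voices.append({
            "pitch_scale": p,
            "vtl_scale": v,
            "f0": scale_f0(f0, p),
            "envelope": warp_envelope(envelope, v),
        })
    return voices
```

Each returned parameter set would then be passed to a vocoder's synthesis stage to produce one additional training utterance per original file and voice.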