A comparative study of state-of-the-art speech recognition models for English and Dutch

2020 
The advent of deep learning methods has led to significant improvements in speech recognition. As a result, companies are concentrating more on taking advantage of these achievements and are trying to use speech recognition in their business lines. However, data preprocessing, dataset size, and the architecture of the deep learning model can strongly affect the accuracy of speech recognition, and the best combination of these factors in different contexts is still unknown. Therefore, in this study, we aimed to determine whether it is feasible and beneficial for companies to invest effort in applying these methods to relatively small datasets and to a language other than English (i.e. Dutch). To answer this question, we present a comparative study of two state-of-the-art speech recognition architectures on small datasets, examining the practicality and scalability of these academically well-received architectures in real-life settings with all their limitations. We conducted a series of experiments, training different network architectures on different datasets. We found that, with the same data preprocessing and without any language model, the Listen, Attend and Spell (LAS) model outperforms the CNN-BLSTM model on both the English and the Dutch datasets. Comparing our results with previously reported results for the LAS model suggests that dataset size influences the accuracy of speech recognition systems.