A comparative study of state-of-the-art speech recognition models for English and Dutch
2020
The advent of deep learning methods has led to significant improvements in speech recognition.
As a result, companies are increasingly looking to take advantage of these advances and to use speech recognition in their lines of business.
However, data preprocessing, dataset size, and the architecture of the deep learning model can each have a large impact on recognition accuracy, and the best combination of these choices is still unknown in many contexts. Therefore, in this study, we aimed
to determine whether it is feasible and beneficial for companies to invest effort in applying these methods to relatively small datasets and to a language other than English (i.e., Dutch).
To answer this question, we present a comparative study of two state-of-the-art speech recognition architectures on small datasets, examining the practicality and scalability of these academically well-received architectures under real-life limitations. We conducted a series of experiments to train different network architectures on
different datasets. We found that, with the same data preprocessing and without any language model, the Listen, Attend and Spell (LAS) model outperforms the CNN-BLSTM model on both the English and Dutch datasets. Comparing
the results obtained in this research with previously reported results for the LAS model, we infer that dataset size influences the accuracy of speech recognition systems.