Leveraging transfer learning techniques for classifying infant vocalizations

2019 
Infant vocalizations serve various communicative functions and are related to several developmental factors. Different types of vocalizations depict distinct spectro-temporal patterns, which can be recovered and learned using emerging end-to-end machine learning systems. A common problem in such systems is the limited availability of labelled data, which prevents reliable training. Transfer learning can mitigate this problem by taking advantage of additional data resources relevant to the problem of interest. We propose a transfer learning framework which relies on neural network fine-tuning, and explore various architectures, such as a convolutional neural network (CNN) and long short-term memory (LSTM) recurrent neural networks with and without an attention mechanism. Our target data come from the Cry Recognition In Early Development (CRIED) corpus, while the source data come from three publicly available resources: the Oxford Vocal (OxVoc) Sounds database, the Google AudioSet, and the Freesound repository. Our results indicate that the neural network architectures trained with the proposed transfer learning approach outperform the corresponding networks trained solely on the target data, as well as neural networks pre-trained on large-scale image datasets and adapted to the target data (e.g., VGG16). These findings suggest the effectiveness of adaptation techniques, combined with appropriate publicly available datasets, for mitigating the limited availability of labelled data in human-related applications.
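The fine-tuning idea described above can be sketched as follows: pre-train a network on source-domain audio, then reuse its feature extractor on the target task with a new classification head. This is a minimal PyTorch sketch under assumed details; the architecture, layer sizes, and class counts are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    """Hypothetical small CNN over log-mel spectrograms (1 x freq x time)."""
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling handles variable lengths
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Step 1: pre-train on source-domain labels (e.g., OxVoc / AudioSet / Freesound).
source_model = SpectrogramCNN(n_classes=10)  # 10 source classes is an assumption
# ... training loop on the source data would go here ...

# Step 2: fine-tune on the target task (CRIED vocalization categories):
# copy the learned feature extractor, attach a fresh head for the target labels.
target_model = SpectrogramCNN(n_classes=5)  # 5 target classes is an assumption
target_model.features.load_state_dict(source_model.features.state_dict())

# Optionally freeze the transferred layers and train only the new head first.
for p in target_model.features.parameters():
    p.requires_grad = False

x = torch.randn(4, 1, 64, 100)  # a batch of 4 dummy spectrograms
logits = target_model(x)        # shape: (4, 5)
```

In practice, one would fine-tune with a smaller learning rate on the transferred layers than on the new head, so the pre-trained features are adapted gently rather than overwritten.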