Bird Audio Diarization with Faster R-CNN

2021 
Birds possess distinctive vocal and visual traits that set each of the roughly 10,000 bird species worldwide apart from the others. Birds are also regarded as indicators of biodiversity because they respond readily to changes in their environment. An effective automatic wildlife monitoring system based on bird bioacoustics, one that can support manual classification, can be pivotal for protecting the environment and endangered species. Real-world bird audio classification remains a difficult problem in modern machine learning, owing to the complex patterns in bird song and the complications that arise when many species vocalize in the same setting. Existing avian bioacoustic monitoring systems struggle when multiple bird species are present in one audio segment. To overcome these challenges, we propose a novel bird audio diarization system based on a Faster Region-Based Convolutional Neural Network (Faster R-CNN), which performs object detection in the spectral domain and diarizes 50 bird species to tackle the "which bird spoke when?" problem. Benchmark results on the Bird Songs from Europe dataset show a Diarization Error Rate (DER) of 21.81, a Jaccard Error Rate (JER) of 20.94, and F1, precision and recall values of 0.85, 0.83 and 0.87, respectively.
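The abstract reports results as DER and F1. The sketch below shows the standard frame-level form of these metrics, under stated assumptions: each frame is represented by the set of species active in it, and because the detector outputs named species classes rather than anonymous speaker clusters, no speaker-mapping step is applied. It uses no forgiveness collar and does not cover JER. The function names `frame_der` and `f1_score` are hypothetical illustrations, not the authors' scoring code.

```python
from typing import List, Set


def frame_der(ref: List[Set[str]], hyp: List[Set[str]]) -> float:
    """Frame-level Diarization Error Rate (in percent) for labelled classes.

    ref[t] and hyp[t] are the sets of species active in frame t.
    Assumption: species labels are shared between reference and hypothesis,
    so no optimal label mapping is needed (unlike speaker diarization).
    """
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must have the same number of frames")
    missed = false_alarm = confusion = total = 0
    for r, h in zip(ref, hyp):
        n_ref, n_hyp = len(r), len(h)
        n_correct = len(r & h)
        missed += max(0, n_ref - n_hyp)          # reference species left unmatched
        false_alarm += max(0, n_hyp - n_ref)     # extra hypothesised species
        confusion += min(n_ref, n_hyp) - n_correct  # matched slots with the wrong species
        total += n_ref                           # total reference species-frames
    if total == 0:
        raise ValueError("reference contains no active species")
    return 100.0 * (missed + false_alarm + confusion) / total


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy example with hypothetical species labels "A", "B", "C".
    ref = [{"A"}, {"A", "B"}, {"B"}, set()]
    hyp = [{"A"}, {"A"}, {"C"}, {"C"}]
    print(frame_der(ref, hyp))              # 1 miss + 1 confusion + 1 false alarm over 4
    print(round(f1_score(0.83, 0.87), 2))   # consistency check on the reported values
```

In the toy example, frame 2 misses species B, frame 3 confuses B with C, and frame 4 contains a false alarm, giving 3 errors over 4 reference species-frames (DER = 75.0). The F1 check shows that the reported precision of 0.83 and recall of 0.87 yield an F1 of about 0.85, consistent with the abstract.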