Audio-based Depression Screening using Sliding Window Sub-clip Pooling

2020 
Due to the proliferation of voice-enabled smart devices, voice has become a prevalent mode for humans to interact with machines. The resulting availability of voice clips creates an opportunity to leverage machine learning to automate screening for mental health disorders such as depression. However, because the human voice carries a variety of signals, depression indicators may not be sustained throughout a conversation. Moreover, due to privacy concerns, depression-labeled voice datasets often have a limited number of participants. To tackle depression screening from voice, we propose Sliding Window Sub-clip Pooling (SWUP), an audio classification method suited to small datasets. SWUP combines elements of deep learning with traditional machine learning to achieve higher classification accuracy on smaller datasets while retaining the interpretability of traditional models. SWUP focuses on voice sub-clips, detecting depression signals as they occur during speech: parent voice clips are divided into sub-clips, and a pooling methodology over the sub-clip predictions increases the accuracy of the parent-clip classification. We apply SWUP to benchmark and crowd-sourced datasets, achieving an F1 of 0.735 when predicting depression screening survey scores using only the audio modality. SWUP demonstrates the feasibility of using machine learning to automate depression screening from voice.
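The sub-clip-and-pool idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the window and hop sizes, the sub-clip classifier (`subclip_model`), and mean pooling are all assumptions, since the abstract does not specify them.

```python
import numpy as np

def sliding_subclips(signal, window, hop):
    """Split a parent clip into fixed-length, possibly overlapping sub-clips."""
    starts = range(0, max(1, len(signal) - window + 1), hop)
    return [signal[s:s + window] for s in starts]

def classify_parent(signal, subclip_model, window, hop, threshold=0.5):
    """Score each sub-clip, then pool the scores to label the parent clip.

    subclip_model: callable mapping a sub-clip to a depression probability
    (a hypothetical stand-in for whatever classifier SWUP uses).
    """
    subs = sliding_subclips(signal, window, hop)
    scores = [subclip_model(s) for s in subs]  # per-sub-clip probabilities
    pooled = float(np.mean(scores))            # mean pooling (one possible choice)
    return int(pooled >= threshold), pooled
```

Pooling over sub-clips lets transient depression cues influence the parent-clip label even when they appear in only part of the recording, which matches the abstract's motivation that indicators may not be sustained throughout a conversation.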