Speaker Agnostic Foreground Speech Detection from Audio Recordings in Workplace Settings from Wearable Recorders

Audio-signal acquisition as part of wearable sensing adds an important dimension for applications such as understanding human behaviors. As part of a large study on workplace behaviors, we collected audio data from individual hospital staff using custom wearable recorders. To preserve the privacy of interactions in the hospital, only a limited set of audio features was collected. A first step in processing this audio is to identify the foreground speech of the person wearing the audio badge. This task is challenging because of the multi-party nature of possible ambulatory interactions, the lack of access to speaker information, and varying channel and ambient conditions. In this paper, we present a speaker-agnostic approach to foreground detection. We propose a convolutional neural network model to predict foreground regions using a limited set of audio features. We show that these models generalize across the proxy corpora that we collected in-house to approximately match the deployment environment. The proxy corpora contained full audio and were used as a test bed to analyze our models in greater detail. We also evaluated the models in the workplace setting to measure speech activity. Our experimental results show a promising direction for analyzing workplace behaviors with privacy-protected sensing.