The Role of Annotation Fusion Methods in the Study of Human-Reported Emotion Experience During Music Listening

2020 
Music is a universally enjoyed art form, but listeners often respond to it in strikingly different ways. The same song can bring one person great joy and another deep sorrow. This paper focuses on modeling human music experience at the group level. In this setting, human annotations play an important role in computational modeling, especially when the target constructs under study are hidden, such as dimensions of emotion or enjoyment during music listening. In this work, we investigate several ways to represent aggregate human annotations of the complex, subjective emotional experience of listening to music. We demonstrate the utility of several methods for fusing self-reported emotion and enjoyment ratings by predicting these responses from auditory features. Using traditional methods such as time alignment with simple averaging and Dynamic Time Warping, as well as state-of-the-art methods based on Expectation Maximization and Triplet Embeddings, we show that it is possible to accurately represent hidden constructs over time under noisy sampling conditions, as evidenced by improved performance on behavioral response prediction. The fact that subjective responses to complex musical stimuli can be accurately captured with these methods suggests broader applications in areas such as affective computing and music perception research.
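Although the paper's exact pipeline is not specified in this abstract, a minimal sketch of the two "traditional" fusion baselines it names, simple averaging and Dynamic Time Warping, may help make the idea concrete. The sketch below is illustrative only: the choice of the naive frame-wise mean as the DTW reference signal, the toy lagged-sine data, and the function names (`dtw_path`, `fuse_annotations`) are assumptions for this example, not the authors' implementation.

```python
import numpy as np

def dtw_path(x, y):
    """Classic dynamic time warping between two 1-D series.
    Returns the (i, j) index pairs on the optimal warping path."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1],  # match
                                 cost[i - 1, j],      # insertion
                                 cost[i, j - 1])      # deletion
    # Backtrack from (n, m) to recover the alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1],
                              cost[i - 1, j],
                              cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def fuse_annotations(annotations):
    """Fuse equal-length annotator traces: DTW-align every trace to the
    naive frame-wise mean (an assumed choice of reference), then average
    the aligned traces frame by frame."""
    annotations = np.asarray(annotations, dtype=float)  # (raters, frames)
    reference = annotations.mean(axis=0)                # simple-average fusion
    aligned = np.zeros_like(annotations)
    for r, trace in enumerate(annotations):
        acc = np.zeros_like(reference)
        cnt = np.zeros_like(reference)
        for i, j in dtw_path(trace, reference):
            acc[j] += trace[i]   # pool rater samples mapped to frame j
            cnt[j] += 1
        aligned[r] = acc / np.maximum(cnt, 1)
    return aligned.mean(axis=0)

# Toy example: three noisy, lagged ratings of one hidden signal.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
truth = np.sin(t)
raters = [np.roll(truth, lag) + 0.1 * rng.standard_normal(len(t))
          for lag in (0, 5, 12)]
print("RMSE, naive mean:", np.sqrt(np.mean((np.mean(raters, 0) - truth) ** 2)))
print("RMSE, DTW-fused :", np.sqrt(np.mean((fuse_annotations(raters) - truth) ** 2)))
```

In this toy setting, aligning each trace before averaging typically reduces the error introduced by annotator reaction-time lags, which is the intuition behind alignment-aware fusion rather than a plain frame-wise mean.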