Learning and clean-up in a large scale music database

2007 
We have collected a database of musical features from radio broadcasts and CD collections (N > 10^5). The database poses a number of hard modelling challenges, including segmentation problems and missing or incorrect meta-data. We describe our efforts towards cleaning the data using probability density estimation. We train conditional densities for checking the relation between meta-data and music features, and unconditional densities for spotting unlikely music features. We show that the rejected samples indeed represent various types of problems in the music data. The models may in some cases assist reconstruction of meta-data.
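
The two density checks described in the abstract can be illustrated with a minimal sketch. The abstract does not say which density model or rejection rule is used, so everything here is an assumption for illustration: a single full-covariance Gaussian stands in for the density estimator, the function names (`fit_gaussian`, `flag_unlikely`, `flag_label_mismatch`) are hypothetical, and the 1% rejection quantile is arbitrary.

```python
import numpy as np

def fit_gaussian(X):
    """Return mean and lightly regularised covariance of the rows of X.
    Assumption: one Gaussian per density; the paper may use another model."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return mu, cov

def log_density(X, mu, cov):
    """Log of the multivariate Gaussian density at each row of X."""
    d = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of every row to the mean.
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def flag_unlikely(X, quantile=0.01):
    """Unconditional check: flag samples whose log density falls below
    the given quantile of all scores (threshold is an assumption)."""
    scores = log_density(X, *fit_gaussian(X))
    return scores < np.quantile(scores, quantile)

def flag_label_mismatch(X, labels):
    """Conditional check: fit one density per meta-data label and flag
    samples whose own label is not the most likely one."""
    classes = np.unique(labels)
    scores = np.column_stack(
        [log_density(X, *fit_gaussian(X[labels == c])) for c in classes]
    )
    best = classes[np.argmax(scores, axis=1)]
    return best != labels
```

In this sketch, the label returned as most likely for a flagged sample is also a candidate replacement for its meta-data, which is one simple way the reconstruction mentioned in the abstract could be approached.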