SPOKEN DOCUMENT CLASSIFICATION BASED ON LSH

2013 
We present a novel scheme for spoken document classification based on locality-sensitive hashing (LSH), chosen for its ability to solve approximate near-neighbor search in high-dimensional spaces. In the speech-to-text conversion stage, a lattice can provide multiple hypotheses during speech recognition, but it is too complex for extracting proper word information. A confusion network is therefore adopted to improve the word recognition rate while keeping the corresponding posterior probabilities. In the vector space model, a modified TF-IDF based on posterior probability is proposed to handle the negative effects of words with very low posterior probability. After the indexing structure is generated with LSH, 1-nearest and N-nearest schemes are adopted in the classifier. To reduce execution time, fast locality-sensitive hashing is used. Experiments on data from four kinds of video programs show the effectiveness of the proposed scheme.
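
The pipeline described above (posterior-weighted term vectors, an LSH index, and a nearest-neighbor vote) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the posterior threshold `min_posterior`, the smoothed IDF formula, the random-hyperplane (cosine) hash family, the class name `HyperplaneLSH`, and the toy documents and labels. The abstract does not state which hash family or TF-IDF modification the authors use, and the fast LSH variant is not shown.

```python
import math
import random
from collections import Counter, defaultdict


def expected_tf(doc, min_posterior=0.1):
    """doc is a list of (word, posterior) pairs, e.g. read off a confusion network.
    Term weight = sum of posteriors; hypotheses below the threshold are dropped
    (an assumed way to suppress very unlikely words)."""
    tf = defaultdict(float)
    for word, p in doc:
        if p >= min_posterior:
            tf[word] += p
    return dict(tf)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class HyperplaneLSH:
    """Random-hyperplane LSH with several hash tables (hypothetical helper)."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        self.planes = [[[rng.gauss(0, 1) for _ in range(dim)]
                        for _ in range(n_bits)] for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.items = []  # (vector, label) pairs

    def _key(self, planes, vec):
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(sum(a * b for a, b in zip(p, vec)) >= 0 for p in planes)

    def add(self, vec, label):
        idx = len(self.items)
        self.items.append((vec, label))
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vec)].append(idx)

    def classify(self, vec, n=1):
        # Candidates are the items sharing a bucket with the query in any table.
        cand = {i for planes, table in zip(self.planes, self.tables)
                for i in table.get(self._key(planes, vec), [])}
        if not cand:  # empty buckets: fall back to a linear scan
            cand = range(len(self.items))
        ranked = sorted(cand, key=lambda i: -cosine(vec, self.items[i][0]))[:n]
        votes = Counter(self.items[i][1] for i in ranked)
        return votes.most_common(1)[0][0]


# Toy training data (invented): (confusion-network words, class label).
train = [
    ([("goal", 0.9), ("match", 0.8), ("team", 0.7), ("stock", 0.05)], "sports"),
    ([("stock", 0.9), ("market", 0.85), ("bank", 0.6)], "finance"),
    ([("team", 0.8), ("coach", 0.9), ("goal", 0.6)], "sports"),
    ([("bank", 0.9), ("rate", 0.8), ("market", 0.7)], "finance"),
]

tfs = [expected_tf(doc) for doc, _ in train]
vocab = sorted({w for tf in tfs for w in tf})
df = Counter(w for tf in tfs for w in tf)
# Smoothed IDF (an assumed formula, not the paper's).
idf = {w: math.log((1 + len(tfs)) / (1 + df[w])) + 1.0 for w in vocab}


def vectorize(tf):
    return [tf.get(w, 0.0) * idf[w] for w in vocab]


index = HyperplaneLSH(dim=len(vocab))
for tf, (_, label) in zip(tfs, train):
    index.add(vectorize(tf), label)

query_vec = vectorize(expected_tf([("goal", 0.9), ("match", 0.8), ("team", 0.7)]))
print(index.classify(query_vec, n=1))  # 1-nearest scheme
```

Passing `n > 1` to `classify` gives an N-nearest majority vote over whatever candidates the buckets return. Using several tables raises the chance that a true neighbor shares at least one bucket with the query, at the cost of more candidates to rank.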