Rhetorical Structure Modeling for Lecture Speech Summarization

2009 
We propose an extractive summarization system built on a novel non-generative probabilistic framework for speech summarization. One of the most under-utilized features in extractive summarization is rhetorical information: the semantically cohesive units hidden in spoken documents. We propose Rhetorical-State Hidden Markov Models (RSHMMs) to automatically decode this underlying structure in speech. We show that RSHMMs give a 68.67% ROUGE-L F-measure, a 6.44% absolute increase in lecture speech summarization performance over the baseline system without RSHMMs. We further propose an enhanced Rhetorical-State Hidden Markov Model (RSHMM++) for extracting hierarchical structural summaries from lecture speech. We show that RSHMM++ gives a 72.01% ROUGE-L F-measure, a 3.34% absolute increase in lecture speech summarization performance compared to the baseline system without using rhetorical information. We also propose Relaxed DTW for compiling reference summaries.
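For readers unfamiliar with the metric reported above, the following is a minimal sketch of a sentence-level ROUGE-L F-measure based on the longest common subsequence (LCS) of tokens. It is an illustration under stated assumptions, not the evaluation code used in this work: the function names `lcs_length` and `rouge_l_f`, the whitespace tokenization, and the default `beta = 1.0` are all choices made here for the example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-measure on whitespace tokens (assumed tokenization)."""
    cand_tokens = candidate.split()
    ref_tokens = reference.split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(cand_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    # Weighted harmonic mean; beta = 1 weights precision and recall equally.
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
```

With `beta = 1.0` the score is the plain harmonic mean of LCS-based precision and recall, so an extracted summary identical to its reference scores 1.0.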