Latent Tree Models for Hierarchical Topic Detection

2016 
We propose a novel method for hierarchical topic detection where topics are obtained by clustering documents in multiple ways. Specifically, we model document collections using a class of graphical models called hierarchical latent tree models (HLTMs). The variables at the bottom level of an HLTM are observed binary variables that represent the presence/absence of words in a document. The variables at other levels are discrete latent variables, with those at the second level representing word co-occurrence patterns and those at higher levels representing co-occurrence of patterns at the level below. Each latent variable gives a soft partition of the documents, and document clusters in the partitions are interpreted as topics. Latent variables at high levels of the hierarchy capture long-range word co-occurrence patterns and hence give thematically more general topics, while those at low levels of the hierarchy capture short-range word co-occurrence patterns and give thematically more specific topics. Compared with LDA-based topic models, a key advantage of HLTMs is that they, as graphical models, explicitly model the dependence and independence structure among topics and words, which is conducive to the discovery of meaningful topics and topic hierarchies.
    • Correction
    • Cite
    • Save
    • Machine Reading By IdeaReader
    45
    References
    0
    Citations
    NaN
    KQI
    []