Evaluating Hierarchical Clustering Methods for Corpora with Chronological Order

2021 
Hierarchical clustering is traditionally represented by a dendrogram: a rooted tree whose leaves are documents, where the length of the path between two leaves represents the stylistic/linguistic distance between the corresponding documents. Clusters correspond to branching nodes: the shorter the distance between two nodes, the more stylistic and linguistic features they are expected to share. We ask to what extent the resulting dendrogram is consistent with the chronological order in which the documents were written. Indeed, this would provide a method for evaluating the result of the clustering. More precisely, the question we want to answer is: can the branching nodes of the dendrogram be re-ordered so that its leaves follow chronological order as closely as possible, while of course preserving the structure of the dendrogram?
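The abstract does not say how "as closely as possible" is measured. As one illustration, assume a binary dendrogram and count chronological inversions, i.e. pairs of documents where the later one appears to the left of the earlier one (the Kendall tau distance to the chronological order). Under that assumption the re-ordering problem has a simple exact solution. The relative order of two leaves is fixed only by their lowest common ancestor, so swapping the children of one branching node changes only the pairs split by that node. Each node can therefore be decided independently by keeping whichever child order produces fewer cross-inversions. The sketch below is a hypothetical example, not the authors' method. The tree encoding and the names `leaves`, `cross_inversions`, and `reorder` are illustrative, and the dates are made up.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical representation: a leaf is a document name (str),
# an internal (branching) node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(tree: Tree) -> List[str]:
    """Leaf labels read from left to right."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def cross_inversions(first: List[str], second: List[str], year: Dict[str, int]) -> int:
    """Pairs (a, b) with a placed before b although a was written after b."""
    return sum(1 for a in first for b in second if year[a] > year[b])


def reorder(tree: Tree, year: Dict[str, int]) -> Tuple[Tree, int]:
    """Swap children where it helps; return the new tree and its inversion count.

    Only the order of children is changed, never the tree structure.
    Assumes the inversion count is the quality measure (an assumption).
    """
    if isinstance(tree, str):
        return tree, 0
    left, left_inv = reorder(tree[0], year)
    right, right_inv = reorder(tree[1], year)
    left_leaves, right_leaves = leaves(left), leaves(right)
    # Inversions between the two subtrees depend only on which child goes first.
    keep = cross_inversions(left_leaves, right_leaves, year)
    swap = cross_inversions(right_leaves, left_leaves, year)
    if swap < keep:
        return (right, left), left_inv + right_inv + swap
    return (left, right), left_inv + right_inv + keep


# Usage with made-up documents and dates.
year = {"D1": 1850, "D2": 1860, "D3": 1870, "D4": 1880}
tree = (("D3", "D1"), ("D4", "D2"))
best, inversions = reorder(tree, year)
print(leaves(best), inversions)  # ['D1', 'D3', 'D2', 'D4'] 1
```

In this toy example the one remaining inversion (D3 before D2) cannot be removed, because {D1, D3} and {D2, D4} are clusters of the dendrogram and re-ordering must preserve them. The residual inversion count is one possible score of how well a clustering agrees with chronology. Because every pair of leaves is compared once, at its lowest common ancestor, the pairwise counting is quadratic in the number of documents.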