A Multiview Clustering Approach for Mining Authorial Affinities in Literary Texts

2019 
In this work, we investigate the use of multiview learning for the task of authorship attribution, whose main goal is to assign authors to texts whose authorship is unknown or disputed. The task has recently gained substantial attention because of related applications such as plagiarism detection and forensic investigation. Although the problem is traditionally framed as a supervised learning task, recent works have advocated unsupervised methods as an alternative. The main argument for such an approach is that, by placing a text of disputed or unknown authorship in a cluster of works by another author or group of authors, the method reveals authorial affinities arising from stylistic similarities, which domain experts may be able to exploit more effectively. Nonetheless, there is no consensus in the literature on which set of features should be used to capture these stylistic similarities. Since the nature of the features may vary drastically, e.g. word frequencies (lexical) versus part-of-speech tags (syntactic), we take an agnostic view on which set is best and instead hold that each provides a relevant, possibly complementary, perspective on the authors' writing styles. We therefore investigate multiview unsupervised learning for authorship attribution, treating each feature set as a separate view. We assess the performance of our approach on a real-world corpus traditionally used in authorship attribution research, containing plays by different authors of the Shakespearean era. Our experiments indicate a significant improvement over the ordinary single-view clustering approach.
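The abstract does not say which multiview clustering algorithm is used. Purely as an illustration of the idea of combining feature views, the sketch below swaps in a simple late-fusion heuristic that is not necessarily the authors' method: build one cosine-similarity matrix per view (word frequencies as the lexical view, part-of-speech tag frequencies as the syntactic view), average the matrices, and run average-linkage agglomerative clustering on the fused similarities. The function names, the toy texts, and the hand-assigned tags are all hypothetical.

```python
# Hypothetical illustration of multiview clustering by late fusion:
# one similarity matrix per feature view, averaged, then clustered.
# This is NOT necessarily the algorithm used in the paper.
import math
from collections import Counter


def profile(tokens):
    """Relative-frequency profile of one text under one view (words or POS tags)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def fused_similarity(views):
    """Average the per-view cosine similarity matrices.

    `views` is a list of views; each view is a list of per-text profiles,
    with texts in the same order in every view.
    """
    n = len(views[0])
    sim = [[0.0] * n for _ in range(n)]
    for view in views:
        for i in range(n):
            for j in range(n):
                sim[i][j] += cosine(view[i], view[j]) / len(views)
    return sim


def average_linkage(sim, k):
    """Agglomerative clustering: merge the pair of clusters with the highest
    mean pairwise similarity until only k clusters remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [sim[i][j] for i in clusters[a] for j in clusters[b]]
                score = sum(pairs) / len(pairs)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


# Toy corpus: four short "texts" with hand-assigned POS tags (illustrative only).
texts = [
    "thou art the king and thou shalt reign",
    "thou art a knave and thou shalt hang",
    "you are the queen of the realm you are",
    "you are a fool of the court you are",
]
tags = [
    "PRON VERB DET NOUN CCONJ PRON AUX VERB",
    "PRON VERB DET NOUN CCONJ PRON AUX VERB",
    "PRON VERB DET NOUN ADP DET NOUN PRON VERB",
    "PRON VERB DET NOUN ADP DET NOUN PRON VERB",
]
lexical_view = [profile(t.split()) for t in texts]    # word frequencies
syntactic_view = [profile(t.split()) for t in tags]   # POS-tag frequencies

clusters = average_linkage(fused_similarity([lexical_view, syntactic_view]), k=2)
print(clusters)  # [[0, 1], [2, 3]]
```

Averaging gives every view equal weight; weighting the views differently, or replacing the fusion step with a dedicated multiview clustering algorithm, are natural variants of this sketch.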