Semantic Linkages of Obsessions: Clustering and Frequencies of Obsessional Symptoms from a Large International Obsessive-Compulsive Disorder Mobile Application Dataset (Preprint)

2020 
BACKGROUND Obsessive-compulsive disorder (OCD) is characterized by recurrent intrusive thoughts, urges, or images (obsessions) and repetitive physical or mental behaviors (compulsions). While specific obsessions and compulsions can manifest in vastly different ways, previous factor analytic and clustering studies suggest the presence of three or four "subtypes" of OCD symptoms. Yet these studies have relied on predefined symptom checklists, which are limited in breadth and may be biased towards researchers' prior conceptualizations of OCD.

OBJECTIVE As an alternative approach to uncovering potential OCD subtypes, we examined a large data set of freely reported obsession symptoms obtained from an OCD mobile app. From this data set, we examined data-driven clusters of obsessions based on their latent semantic relationships in the English language, using word embedding, a type of natural language processing.

METHODS We extracted free-text entry words describing obsessions in a large sample of users of the mobile application "NOCD" who self-identified as having OCD. Semantic vector space modeling was applied using the Global Vectors for Word Representation (GloVe) algorithm, an unsupervised learning algorithm for obtaining vector representations for words based on word-word co-occurrence statistics from a 6-billion-word corpus. A domain-specific extension, "Mittens," was also applied to enhance the corpus with OCD-specific words. After cleaning the obsession words, we created a word co-occurrence matrix. The resulting representations provided linear substructures of the word vector space in 100 dimensions. We applied principal components analysis to the 100-dimensional vector representations of the most frequent words, followed by k-means clustering to obtain clusters of related words.

RESULTS We obtained 7,001 unique words representing obsessions from 25,369 individuals. Heuristics for determining the optimal number of clusters pointed to a three-cluster solution, with themes relating to doubt/checking, contamination/somatic/physical harm/sexual harm, and relationship/just-right. All three clusters showed relatively close semantic relationships to each other in a central area of convergence, with themes relating to harm. An equal-sized split-sample analysis across individuals and a split-sample analysis over time both showed overall stable cluster solutions. Words in the contamination/somatic/physical harm/sexual harm cluster were the most frequently occurring, followed by words in the relationship/just-right cluster.

CONCLUSIONS Clustering of naturalistically acquired obsessional words resulted in three major groupings of semantic themes, which partially overlap with previous studies' results using predefined checklists. Further, the closeness of the overall embedded relationships across clusters and their central convergence on harm suggest that, at least at the level of self-reported obsessional thoughts, the majority of obsessions have close semantic relationships. Harm to self or others may be an underlying organizing theme across many obsessions. Notably, "relationship"-themed words, not previously included in factor analytic studies, clustered with "just-right" words. These novel insights have potential implications for understanding how an apparent multitude of obsessional symptoms are connected by underlying themes. This could aid in exposure-based treatment approaches and could be used as a conceptual framework for future research.
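
The final analysis step described in the methods (principal components analysis on 100-dimensional word vectors, followed by k-means clustering) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the vectors are synthetic stand-ins rather than GloVe or Mittens embeddings, the number of retained components (2) is arbitrary, and the deterministic farthest-point initialization is our choice, since the abstract does not state how the clustering was initialized. The helper names `pca_reduce`, `farthest_point_init`, and `kmeans` are hypothetical.

```python
import numpy as np


def pca_reduce(vectors, n_components):
    """Project the rows of `vectors` onto their top principal components."""
    centered = vectors - vectors.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def farthest_point_init(points, k):
    """Deterministic initialization (an assumption, not the authors' method):
    start at the first point, then repeatedly add the point farthest from
    all centroids chosen so far."""
    centroids = [points[0]]
    for _ in range(k - 1):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(points[d.argmax()])
    return np.array(centroids)


def assign(points, centroids):
    """Index of the nearest centroid (squared Euclidean) for every point."""
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(points, centroids), centroids


# Synthetic stand-in for word embeddings: 3 groups of 20 vectors in 100-D,
# each group scattered tightly around its own well-separated center.
rng = np.random.default_rng(0)
centers = np.zeros((3, 100))
centers[0, 0] = centers[1, 1] = centers[2, 2] = 10.0
toy_vectors = np.vstack([c + 0.1 * rng.normal(size=(20, 100)) for c in centers])

reduced = pca_reduce(toy_vectors, n_components=2)
labels, centroids = kmeans(reduced, k=3)
```

With real embeddings, each row of `toy_vectors` would be replaced by the vector of one frequent obsession word, and the value of `k` would be chosen with a cluster-number heuristic rather than fixed in advance.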