Clustering Mixed-Type Data with Correlation-Preserving Embedding

Luan Tran, Liyue Fan, Cyrus Shahabi

2021

Mixed-type data that contains both categorical and numerical features is prevalent in many real-world applications. Clustering mixed-type data is challenging, especially because of the complex relationship between categorical and numerical features. Unfortunately, widely adopted encoding methods and existing representation learning algorithms fail to capture these complex relationships. In this paper, we propose a new correlation-preserving embedding framework, COPE, to learn the representation of categorical features in mixed-type data while preserving the correlation between numerical and categorical features. Our extensive experiments with real-world datasets show that COPE generates high-quality representations and outperforms the state-of-the-art clustering algorithms by a wide margin.
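The claim that common encodings miss the relationship between categorical and numerical features can be illustrated with a toy example. The sketch below is not COPE. The "conditional-mean embedding" is a deliberately simple, hypothetical stand-in (close in spirit to mean/target encoding) that represents each category by the average of the numerical features over the rows taking that category. The column names and values are made up. The example shows that one-hot encoding places every pair of distinct categories at the same distance (sqrt(2)), whereas an embedding informed by the numerical features can place related categories closer together.

```python
import numpy as np


def one_hot(values):
    """One-hot encode a list of categorical values; returns (matrix, categories)."""
    cats = sorted(set(values))
    index = {c: i for i, c in enumerate(cats)}
    mat = np.zeros((len(values), len(cats)))
    for row, v in enumerate(values):
        mat[row, index[v]] = 1.0
    return mat, cats


def conditional_mean_embedding(values, numeric):
    """Hypothetical illustration only (NOT the COPE method): embed each
    category as the mean of the numerical features over its rows."""
    numeric = np.asarray(numeric, dtype=float)
    emb = {}
    for c in sorted(set(values)):
        mask = np.array([v == c for v in values])
        emb[c] = numeric[mask].mean(axis=0)
    return emb


# Made-up mixed-type data: a categorical "job" column and a numerical "income" column.
jobs = ["nurse", "doctor", "surgeon", "nurse", "doctor", "surgeon"]
income = [[50.0], [150.0], [200.0], [54.0], [146.0], [210.0]]

# One-hot: rows 0, 1, 2 are nurse, doctor, surgeon; all pairs are equidistant.
oh, _ = one_hot(jobs)
d_oh_nurse_doctor = np.linalg.norm(oh[0] - oh[1])
d_oh_doctor_surgeon = np.linalg.norm(oh[1] - oh[2])

# Conditional means: nurse = 52, doctor = 148, surgeon = 205.
emb = conditional_mean_embedding(jobs, income)
d_emb_nurse_doctor = np.linalg.norm(emb["nurse"] - emb["doctor"])
d_emb_doctor_surgeon = np.linalg.norm(emb["doctor"] - emb["surgeon"])

print(f"{d_oh_nurse_doctor:.3f} {d_oh_doctor_surgeon:.3f}")    # 1.414 1.414
print(f"{d_emb_nurse_doctor:.1f} {d_emb_doctor_surgeon:.1f}")  # 96.0 57.0
```

Under one-hot encoding a clustering algorithm sees "doctor" as equally far from "nurse" and "surgeon"; the numerically informed embedding instead reflects that doctors and surgeons have similar incomes in this toy data. COPE learns its representation differently, so this is only a motivation for the problem, not a description of the framework.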

Keywords:

Mixed type
Correlation
Embedding
Mathematics
Pattern recognition
Cluster analysis
Artificial intelligence
