CPCA: A Feature Semantics Based Crowd Dimension Reduction Framework

2018 
Dimension reduction plays an important role in practical big data analysis and data mining applications. However, popular dimension reduction techniques, such as principal component analysis (PCA), are computation-intensive and are often a bottleneck in data processing and mining. In this paper, we propose to reduce the computation of PCA via crowdsourcing, a paradigm that tackles hard-to-compute problems by leveraging collective intelligence. We design CPCA (crowd principal component analysis), a novel crowd-based dimension reduction framework. CPCA designs tasks for crowd workers to obtain the relations among features based on their semantics, and formulates a weighted graph from the collected answers to derive the covariance matrix and the principal components. We prove the correctness of CPCA and conduct extensive evaluations on real datasets. Experimental results show that CPCA achieves a significant reduction in computational cost, in terms of both time and memory, which lowers the bar for learning.
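The pipeline described in the abstract (crowd answers, then a weighted feature graph, then a covariance matrix, then principal components) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify the task design or the aggregation rule, so the sketch assumes that worker answers have already been aggregated into one weight in [-1, 1] per feature pair, and that this weighted graph is used directly as a surrogate correlation matrix. The names `surrogate_covariance`, `principal_components`, and `crowd_edges` are hypothetical.

```python
import numpy as np


def surrogate_covariance(n_features, crowd_edges):
    """Build a symmetric matrix from weighted feature-pair edges.

    crowd_edges is an iterable of (i, j, weight) tuples. Each weight is an
    ASSUMED aggregate of worker answers in [-1, 1] describing how strongly
    features i and j are related; the paper's actual rule is not given here.
    """
    cov = np.eye(n_features)  # unit diagonal: each feature fully relates to itself
    for i, j, w in crowd_edges:
        cov[i, j] = cov[j, i] = w  # keep the matrix symmetric
    return cov


def principal_components(cov, k):
    """Return the top-k eigenvalues and eigenvectors of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest
    return eigvals[order], eigvecs[:, order]


# Hypothetical aggregated answers for three features.
edges = [(0, 1, 0.9), (0, 2, 0.1), (1, 2, 0.2)]
cov = surrogate_covariance(3, edges)
eigvals, components = principal_components(cov, k=2)

# Project centered data onto the derived components.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
X_reduced = (X - X.mean(axis=0)) @ components
```

In this sketch the matrix is built from the feature graph alone, so its cost depends on the number of features rather than the number of data rows, which is one plausible reading of where the time and memory savings come from.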