High-Performance Tucker Factorization on Heterogeneous Platforms

2019 
Given large-scale multi-dimensional data (e.g., (user, movie, time; rating) tuples for movie recommendations), how can we extract the latent concepts and relations in such data? Tensor factorization has been widely used for such problems, in which multi-dimensional data are modeled as tensors. However, most tensor factorization algorithms exhibit limited scalability and speed, since updating the factor matrices requires huge memory and heavy computation. In this paper, we propose GTA, a general framework for Tucker factorization on heterogeneous platforms. GTA performs alternating least squares with a row-wise update rule in a fully parallel way, which significantly reduces the memory requirement for updating factor matrices. Furthermore, GTA provides two algorithms: GTA-PART for partially observable tensors and GTA-FULL for fully observable tensors, both of which accelerate the update process using GPUs and CPUs. Experimental results show that GTA achieves $5.6\sim44.6\times$ speed-ups over the state-of-the-art on large-scale tensors. In addition, GTA scales near-linearly with the number of GPUs and computing nodes used in the experiments.
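
To make the row-wise ALS update concrete, the following is a minimal NumPy sketch for a 3-way partially observable tensor: with all other factors fixed, each row of a factor matrix can be updated independently by solving a small ridge-regularized least-squares problem over the observed entries in its slice, which is what enables the fully parallel update. This is an illustrative reconstruction under stated assumptions, not GTA's actual GPU/CPU implementation; the function and parameter names (`row_wise_als_update`, `entries`, `vals`, `lam`) are hypothetical.

```python
import numpy as np

def row_wise_als_update(entries, vals, factors, core, mode, lam=1e-3):
    """One sweep over one factor matrix, row by row (illustrative sketch).

    entries : (nnz, 3) int array of observed indices (i1, i2, i3)
    vals    : (nnz,) observed tensor values
    factors : list of factor matrices A^(1), A^(2), A^(3), shapes (I_n, J_n)
    core    : core tensor G, shape (J_1, J_2, J_3)
    mode    : which factor matrix to update (0, 1, or 2)
    """
    Jn = factors[mode].shape[1]
    for i in range(factors[mode].shape[0]):   # rows are independent: parallelizable
        mask = entries[:, mode] == i          # observed entries in slice i
        if not mask.any():
            continue
        idx, x = entries[mask], vals[mask]
        # For each observed entry, contract the core with the fixed factor
        # rows of the other two modes; the model prediction is then linear
        # in row i of the target factor: x ~ B @ factors[mode][i].
        B = np.empty((len(x), Jn))
        for k, (i1, i2, i3) in enumerate(idx):
            rows = [factors[0][i1], factors[1][i2], factors[2][i3]]
            t = core
            # Contract the non-target modes, highest axis first so the
            # remaining axis numbers stay valid; leaves a (Jn,) vector.
            for m in (2, 1, 0):
                if m != mode:
                    t = np.tensordot(t, rows[m], axes=([m], [0]))
            B[k] = t
        # Ridge-regularized normal equations for this single row.
        G = B.T @ B + lam * np.eye(Jn)
        factors[mode][i] = np.linalg.solve(G, B.T @ x)

# One full ALS iteration cycles through all three modes:
#   for n in range(3):
#       row_wise_als_update(entries, vals, factors, core, n)
```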