Billion-Scale Matrix Compression and Multiplication with Implications in Data Mining

2019 
Billion-scale Boolean matrices in the era of big data occupy storage that is measured in 100's of petabytes to zetabytes. The fundamental operation on these matrices for data mining involves multiplication which suffers a significant slow-down as the required data cannot fit in most main memories. In this paper, we propose new algorithms to perform Matrix-Vector and Matrix-Matrix operations directly on compressed Boolean matrices using innovative techniques extended from our previous work on compression. Our extension involves the development of a row-by-row differential compression technique which reduces the overall space requirement and the number of matrix operations. We have provided extensive empirical results on billion-scale Boolean matrices that are Boolean adjacency matrices of web graphs. Our work has significant implications on key problems such as page-ranking and itemset mining that use matrix multiplication.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    20
    References
    0
    Citations
    NaN
    KQI
    []