TETRIS: TilE-matching the TRemendous Irregular Sparsity

Authors:
Yu Ji (Tsinghua University)
Ling Liang (UCSB)
Lei Deng (UCSB)
Youyang Zhang (Tsinghua University)
Youhui Zhang (Tsinghua University)
Yuan Xie (UCSB)

Abstract:

Compressing neural networks by pruning weights with small magnitudes can significantly reduce computation and storage costs. Although pruning makes the model smaller, the resulting irregularity makes it difficult to obtain practical speedups on modern computing platforms such as CPUs and GPUs. Structural pruning has attracted considerable research interest as a way to make sparsity hardware-friendly: increasing the sparsity granularity leads to better hardware utilization, but it compromises the sparsity that can be achieved while maintaining accuracy.
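To make the granularity trade-off concrete, below is a minimal NumPy sketch contrasting fine-grained magnitude pruning with coarse-grained (tile-wise) pruning. This is a generic illustration of the trade-off the abstract describes, not the paper's TETRIS tile-matching method; the function names `magnitude_prune` and `block_prune` and the tile-scoring rule (mean absolute value per tile) are assumptions made for the example.

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Fine-grained pruning: zero out the smallest-magnitude weights.
    Flexible, but the surviving nonzeros are scattered irregularly."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= threshold, w, 0.0)

def block_prune(w: np.ndarray, sparsity: float, block: int = 4) -> np.ndarray:
    """Coarse-grained pruning: zero out whole block x block tiles with
    small average magnitude. The surviving tiles are dense and thus
    hardware-friendly, but whole tiles must be kept or dropped together,
    which typically costs accuracy at the same sparsity level."""
    rows, cols = w.shape
    assert rows % block == 0 and cols % block == 0
    # Score each tile by its mean absolute weight (an illustrative choice).
    tiles = np.abs(w).reshape(rows // block, block, cols // block, block)
    scores = tiles.mean(axis=(1, 3))
    threshold = np.quantile(scores, sparsity)
    # Expand the tile-level keep/drop decision back to element level.
    mask = (scores >= threshold).repeat(block, axis=0).repeat(block, axis=1)
    return w * mask

w = np.random.randn(8, 8).astype(np.float32)
print((magnitude_prune(w, 0.75) == 0).mean())      # ~75% irregular sparsity
print((block_prune(w, 0.75, block=4) == 0).mean()) # tile-level sparsity
```

Running the two functions at the same target sparsity shows the difference: `magnitude_prune` keeps the individually largest weights wherever they fall, while `block_prune` keeps only dense tiles, which is what allows dense-matrix hardware kernels to exploit the sparsity.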
