DZip: improved general-purpose lossless compression based on novel neural network modeling

Mohit Goyal, Kedar Tatwawadi, Shubham Chandak, Idoia Ochoa

2021

We consider lossless compression based on statistical data modeling followed by prediction-based encoding, where an accurate statistical model for the input data leads to substantial improvements in compression. We propose DZip, a general-purpose compressor for sequential data that exploits the well-known modeling capabilities of neural networks (NNs) for prediction, followed by arithmetic coding. DZip uses a novel hybrid architecture based on adaptive and semi-adaptive training. Unlike most NN-based compressors, DZip does not require additional training data and is not restricted to specific data types. The proposed compressor outperforms general-purpose compressors such as Gzip (29% size reduction on average) and 7zip (12% size reduction on average) on a variety of real datasets, achieves near-optimal compression on synthetic datasets, and performs close to specialized compressors for large sequence lengths, without any human input. While the main limitation of NN-based compressors is generally the encoding/decoding speed, we empirically demonstrate that DZip achieves a compression ratio comparable to that of other NN-based compressors while being several times faster. The source code for DZip and links to the datasets are available at https://github.com/mohit1997/Dzip-torch.
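
To illustrate why a more accurate predictive model leads to better compression, the following minimal sketch computes the number of bits an ideal arithmetic coder would spend when driven by an adaptive predictor. It is not DZip's method: a simple Laplace-smoothed count model over the previous `order` bytes stands in for the neural network, and the function name `ideal_code_length_bits` and its parameters are hypothetical, chosen only for this example.

```python
import math
from collections import defaultdict


def ideal_code_length_bits(data: bytes, order: int = 2) -> float:
    """Return the ideal arithmetic-coding cost of `data` in bits.

    Assumption: an adaptive count-based model conditioned on the previous
    `order` bytes replaces the neural network predictor used by DZip.
    """
    alphabet_size = 256
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> symbols seen
    bits = 0.0
    for i, symbol in enumerate(data):
        context = data[max(0, i - order):i]
        # The probability depends only on past symbols, so a decoder could
        # rebuild the same model and recover the data losslessly.
        p = (counts[context][symbol] + 1) / (totals[context] + alphabet_size)
        # An ideal arithmetic coder spends about -log2(p) bits on this symbol.
        bits += -math.log2(p)
        # Adaptive update: the model learns from the data as it is encoded.
        counts[context][symbol] += 1
        totals[context] += 1
    return bits
```

Calling `ideal_code_length_bits(b"ab" * 5000, order=2)` gives a much smaller cost than `order=0` on the same input, because the longer context predicts the alternating pattern almost perfectly. Only the predictor differs between the two calls, which is the sense in which a better statistical model improves compression.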

