Performance comparison of sequential and parallel compression applications for DNA raw data

2016 
We present an experimental performance comparison of lossless compression programs for DNA raw data in FASTQ format files. General-purpose (PBZIP2, P7ZIP and PIGZ) and domain-specific compressors (SCALCE, QUIP, FASTQZ and DSRC) were analyzed in terms of compression ratio, execution speed, parallel scalability and memory consumption. Results showed that domain-specific tools increased the compression ratios up to 70 %, while reducing the runtime of general-purpose tools up to \(7\times \) during compression and up to \(3\times \) during decompression. Parallelism scaled performance up to \(13\times \) when using 20 threads. Our analysis indicates that QUIP, DSRC and PBZIP2 are the best tools in their respective categories, with acceptable memory requirements. Nevertheless, the end user must consider the features of available hardware and define the priorities among its optimization objectives (compression ratio, runtime during compression or decompression, scalability, etc.) to properly select the best application for each particular scenario.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    55
    References
    5
    Citations
    NaN
    KQI
    []