Lossy and lossless data compression of data from high energy physics experiments

2012 
This dissertation describes a data compression system optimized for high energy particle detectors. The aim is to reduce the data volume in the front-end electronics installed in different kinds of particle detectors, as used for example along the Large Hadron Collider at CERN. The signals collected and digitized from calorimeters, time projection chambers and, in general, from all detectors with a large number of sensors produce an extensive amount of data, which needs to be reduced. Real-time data compression algorithms applied right at the detector front-end are able to reduce the amount of data to be transmitted and stored as early as possible in the data chain. Different lossless and lossy compression methods are analyzed regarding their efficiency in compressing data from particle detectors that produce signals amplified and/or shaped to various waveforms. In addition, the complexity of the algorithms is evaluated to determine their suitability for a real-time hardware implementation. Of the analyzed methods, a newly developed lossless compression method turned out to be the most suitable for implementation in high energy physics applications. The detector data are used to search for rare particle physics phenomena, which makes it crucial that the compression method retains the important information in the data with appropriate accuracy. Considering the importance of not distorting detector data, a lossless compression method was preferred over a lossy one. To go beyond what can be achieved by conventional lossless compression schemes, which are mostly limited by the intrinsic entropy of the underlying data, the proposed compression method makes use of a new scheme in which entire vectors of samples are compressed instead of handling the data from the ADCs as individual uncorrelated samples.
Our method works by first approximating the incoming vectors, formed by the digitization of the shaped input waveforms from the detector signals, with a set of digitized reference vectors. This is generally known as vector quantization. To prevent information loss, the differences between the incoming vector and the best matching reference vector are retained. These differences are then Huffman encoded to obtain the compression. The performance of the compression method was first evaluated by modeling the algorithm in Matlab and using test data measured with the time projection chamber in the ALICE experiment at CERN. A compression ratio of 50% was achieved for this test data, which is better than the 62% corresponding to the intrinsic entropy of the test data. To demonstrate the functionality of the developed compression method in hardware, a digital IP block was realized and modeled using the hardware description language Verilog. The compression algorithm was optimized for the data from a time projection chamber and tested on an FPGA development board, applying the same test data from the ALICE time projection chamber as used previously in Matlab. The implementation achieved almost the same compression ratio. In this thesis, I show that compression of digitized detector data can be realized in the detector front-end electronics, very close to the data source, while still maintaining the accuracy of the data. The developed and realized lossless compression algorithm achieved a compression ratio of about 50%. The hardware implementation of the algorithm proved its real-time suitability by compressing 10 000 consecutive input signals without introducing dead time. The only cost is an average latency of about 30 clock cycles at 40 MHz. The designed data compression IP block is available for implementation in current and future detector front-end electronics, either inside FPGAs or inside full custom ASICs.
The compression block requires about 2 700 logic slices inside a Virtex-4 FPGA and around 12 200 gates for an ASIC implementation without taking into account the required memory of 7 kbyte.
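The vector-based scheme summarized above (match each incoming vector to a reference vector, keep the differences, Huffman encode them) can be illustrated with a minimal Python sketch. This is not the dissertation's Matlab model or Verilog IP block. The reference vectors, the sample signals, the sum-of-absolute-differences matching criterion and all function names are hypothetical choices made for illustration only.

```python
import heapq
from collections import Counter
from itertools import count

# Hypothetical reference vectors (codebook) and digitized signals; not real detector data.
CODEBOOK = [
    [0, 2, 10, 40, 25, 10, 3, 0],
    [0, 1, 5, 20, 12, 5, 1, 0],
]
SIGNALS = [
    [0, 3, 11, 41, 24, 10, 3, 1],
    [1, 1, 5, 19, 12, 6, 1, 0],
    [0, 2, 9, 40, 26, 10, 2, 0],
]


def best_reference(vector, codebook):
    """Index of the reference vector with the smallest sum of absolute differences
    (an assumed matching criterion)."""
    def distance(i):
        return sum(abs(a - b) for a, b in zip(vector, codebook[i]))
    return min(range(len(codebook)), key=distance)


def encode(vector, codebook):
    """Vector quantization step: return the codebook index and the exact residual."""
    idx = best_reference(vector, codebook)
    residual = [a - b for a, b in zip(vector, codebook[idx])]
    return idx, residual


def decode(idx, residual, codebook):
    """Rebuild the original vector exactly from index and residual (lossless)."""
    return [r + b for r, b in zip(residual, codebook[idx])]


def huffman_code_lengths(symbols):
    """Code length in bits per symbol for a Huffman code built from symbol counts."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    tie = count()  # unique tie-breaker so the heap never compares the symbol lists
    heap = [(n, next(tie), [s]) for s, n in freq.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    while len(heap) > 1:
        n1, _, group1 = heapq.heappop(heap)
        n2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:
            lengths[s] += 1  # every merge adds one bit to the symbols below it
        heapq.heappush(heap, (n1 + n2, next(tie), group1 + group2))
    return lengths


def compressed_bits(signals, codebook):
    """Bits for Huffman-coded residuals plus a fixed-width codebook index per vector.
    The cost of storing the Huffman table itself is ignored here."""
    encoded = [encode(v, codebook) for v in signals]
    residuals = [r for _, res in encoded for r in res]
    lengths = huffman_code_lengths(residuals)
    index_bits = max(1, (len(codebook) - 1).bit_length())
    return sum(lengths[r] for r in residuals) + index_bits * len(encoded)


if __name__ == "__main__":
    raw_bits = 10 * sum(len(v) for v in SIGNALS)  # assumes 10-bit ADC samples
    print("compressed/raw:", compressed_bits(SIGNALS, CODEBOOK) / raw_bits)
```

Because the residuals cluster around zero when the reference vectors match the pulse shapes well, their Huffman codes are short, which is the intuition behind coding whole vectors rather than independent samples. The ratio printed for this toy data says nothing about the performance reported for the ALICE test data.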