Simplifying Data Traffic Classification with Byte Importance Distillation

2021 
Network traffic classification (NTC) plays an important role in network measurement and management. Recently, Artificial Intelligence (AI) based NTC has been widely considered a good candidate because of its accuracy in processing both clear and encrypted data traffic. However, AI-based NTC schemes usually take full-length packets as input, e.g., by padding shorter packets. Such lengthy inputs can lead to complex NTC model designs, which can be challenging for network devices. To tackle this issue, we propose a byte importance distillation scheme that extracts the packet payload bytes contributing the most to traffic classification results. In the proposed scheme, a Bayesian Multilayer Perceptron (BMLP) is first initialized based on full-size data packets. The importance of each byte is then defined as the mean absolute value of its corresponding weights in the first layer of a BMLP model with K-integration. By keeping only the important bytes, input data packets can be reduced dramatically to the few bytes that contribute the most to the classification output. Two popular AI-based NTC models, one MLP based and one CNN based, are implemented to evaluate the proposed byte importance distillation scheme. The results demonstrate that the reduced packets produced by the proposed scheme can speed up the MLP and CNN based NTC models by one to two orders of magnitude, depending on the implementation platform, while maintaining high classification accuracy. In comparison, packets reduced to the same size through traditional dimension reduction approaches, such as principal component analysis and the convolutional block attention module, cannot maintain the high classification accuracy of the AI-based NTC models.
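
The byte-selection step described in the abstract can be illustrated with a minimal sketch. It assumes the first-layer weights are already available as a plain matrix with one row per input byte position; the Bayesian training and the K-integration step are not reproduced here. The function names (`byte_importance`, `top_k_positions`, `distill_packet`) and the toy weights are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def byte_importance(first_layer_weights: Sequence[Sequence[float]]) -> List[float]:
    """Mean absolute first-layer weight for each input byte position.

    Assumption: first_layer_weights[i][j] is the weight connecting
    input byte i to hidden unit j (a point estimate, not a posterior).
    """
    return [sum(abs(w) for w in row) / len(row) for row in first_layer_weights]


def top_k_positions(importance: Sequence[float], k: int) -> List[int]:
    """Indices of the k most important bytes, returned in packet order."""
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:k])


def distill_packet(packet: bytes, positions: Sequence[int]) -> bytes:
    """Keep only the bytes at the selected positions."""
    return bytes(packet[i] for i in positions)


# Toy example: 4 input bytes, 3 hidden units (made-up weights).
weights = [
    [0.01, -0.02, 0.00],
    [0.90, -1.10, 0.70],
    [0.05, 0.04, -0.03],
    [-0.60, 0.80, 0.50],
]
importance = byte_importance(weights)
positions = top_k_positions(importance, k=2)
reduced = distill_packet(bytes([0x45, 0x00, 0x3C, 0x11]), positions)
print(positions, reduced.hex())
```

In this toy case the second and fourth byte positions carry the largest mean absolute weights, so only those two bytes would be passed to a downstream classifier.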