Fast Retrieval of Large Entries With Incomplete Measurement Data

2022 
In network-wide monitoring, finding the large monitoring data entries is a fundamental network management function. However, the retrieval of large entries is extremely difficult and challenging as a result of incompleteness of network measurement data. Enlightened by tensor model’s strong capability of information representation and extraction, we model the network-wide monitoring data as a 3-way tensor. With tensor completion, the retrieval can be performed after recovering all missing entries. However, this not only incurs an extremely high cost when the tensor is large, but is also unnecessary. Instead, to quickly retrieve large entries at low cost, we transform the large entry retrieving problem to a cosine similarity searching problem, and propose two algorithms: 1) Quickly reordering the factor vectors based on Locality Sensitive Hashing (LSH) hash table so that vectors with small cosine distances are placed in the same hash bucket; 2) Quickly finding the similar vector of a queried one that the two together determine a large entry without incurring the high cost of recovering all entries through the dot products. In the process of LSH table building and similarity query, several novel techniques are proposed, including LSH table representation with the LSH forest, good hash table building to support the flexible search of cosine similarity, and bit-shifting-based quick similarity query. Our experimental studies on 4 real world datasets indicate that our technique is at least up to 60 times faster than the approach based on direct tensor completion.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    44
    References
    0
    Citations
    NaN
    KQI
    []