Multiset Membership Lookup in Large Datasets

2021 
Given a dataset $\cal S$ composed of g subsets with each data items belonging to one of them, $\textit{multiset membership lookup}$ takes an item e as input and outputs a binary answer whether $e\in{\cal S}$ and, in case of yes, the ID of the subset to which e belongs. Overlaid upon while more sophisticated than the canonical membership lookup, multiset membership lookup emerges as a pivotal functionality in many computing and networking paradigms. The quest to achieve high-speed, high-accuracy lookup with limited memory cost makes lookup algorithm design a challenging task, particularly when the data items arrive as a stream. In this paper, we devise compact data structures and lookup algorithms that are amendable for hardware implementation, while guaranteeing high lookup accuracy and supporting interactive query processing. We first propose $\textit{multi-hash color table}$ , a variant of Bloom filter, to encode subset IDs compactly and map the ID of an item to its subset ID. We further construct a more balanced data structure called $\textit{balanced multi-hash color table}$ to improve the compactness by integrating the state-of-the-art load balancing technique. We complete our work by addressing the case of $\textit{batch arrivals}$ and design a batched recording algorithm optimizing the memory efficiency.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []