Variance Reduction in Frequency Estimators via Control Variates Method

2021 
Generating succinct summaries or \textit{sketches} of massive data streams is becoming increasingly important. Such a task typically requires fast, accurate, and small space algorithms in order to support the downstream applications, mainly in areas such as data analysis, machine learning, and data mining. A fundamental and well-studied problem in this context is that of estimating the frequencies of the items appearing in a data stream. The Count-Min-Sketch~\cite{CMS} and Count-Sketch~\cite{count_sketch} are two known classical algorithms for this purpose. However, a limitation of these techniques is that the variance of their estimate tends to be large. In this work, we address this problem and suggest a technique that reduces the variance in their respective estimates, at the cost of little computational overhead. Our technique relies on the classical Control-Variate trick~\cite{Lavenberg} used for reducing variance in Monte-Carlo simulation. We present a theoretical analysis of our proposal by carefully choosing the control variates and complement them with experiments on synthetic as well as real-world datasets.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    16
    References
    0
    Citations
    NaN
    KQI
    []