Data Amplification: A Unified And Competitive Approach To Property Estimation

Authors:
Yi Hao University of California, San Diego
Alon Orlitsky University of California, San Diego
Ananda Theertha Suresh Google
Yihong Wu Yale University

Introduction:

Estimating properties of discrete distributions is a fundamental problem in statistical learning. The authors design the first unified, linear-time, competitive property estimator that, for a wide class of properties and for all underlying distributions, uses just 2n samples to achieve the performance attained by the empirical estimator with n\sqrt{\log n} samples.

Abstract:

Estimating properties of discrete distributions is a fundamental problem in statistical learning. We design the first unified, linear-time, competitive property estimator that, for a wide class of properties and for all underlying distributions, uses just 2n samples to achieve the performance attained by the empirical estimator with n\sqrt{\log n} samples. This provides off-the-shelf, distribution-independent "amplification" of the amount of data available relative to common-practice estimators.
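
To make the baseline concrete, here is a minimal sketch of an empirical (plug-in) estimator, using Shannon entropy as an example property. It is not the authors' estimator. The function names `empirical_entropy` and `empirical_equivalent_samples` are hypothetical, and the second function assumes a natural logarithm and ignores constants, neither of which the abstract specifies.

```python
import math
from collections import Counter


def empirical_entropy(samples):
    """Plug-in estimate of Shannon entropy, in nats.

    Replaces the unknown distribution with the observed frequencies
    and evaluates the property on them.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def empirical_equivalent_samples(n):
    """Illustrative n * sqrt(log n) from the abstract.

    Assumption: natural logarithm, no hidden constants. This is the
    sample size at which the empirical estimator is said to match the
    proposed estimator run on 2n samples.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    return n * math.sqrt(math.log(n))


if __name__ == "__main__":
    data = ["a", "b", "a", "b"]
    print(empirical_entropy(data))             # entropy of a fair coin, in nats
    print(empirical_equivalent_samples(1000))  # compare against 2n = 2000
```

Under these assumptions, the amplification factor relative to the 2n samples actually used is sqrt(log n) / 2, so the gain grows slowly with n.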

You may want to know: