Single-index models in the high signal regime

2021 
A single-index model is given by $y = g^{*}(\langle x, \theta ^{*} \rangle) + \epsilon $ : The scalar response $y$ depends on the covariate vector $x$ both through an unknown (vector) parameter ${\theta ^{*}}$ as well as an unknown, non-parametric, univariate link-function $g^{*} \in {\mathcal {G}} $ . We study the problem of recovering the parameter ${\theta ^{*}}$ from i.i.d. samples of the model when the covariates are drawn from a normal distribution. Our focus is on leveraging information about the (known) function class $\mathcal {G}$ in order to design a procedure that adapts to the noise level in the problem, thereby reducing the bias of parameter estimation. We show that when given access to a natural “labeling oracle”, our procedure recovers the underlying parameter at a rate that depends explicitly on how well we are able to estimate a suitably defined “inverse” link function. Both the procedure and its analysis framework are flexible, admitting any black-box estimator for the inverse link function. The resulting rate of parameter estimation significantly improves upon the risk of classical semi-parametric procedures whenever consistent estimates of the inverse link function can be obtained. When the function class $\mathcal {G}$ is appropriately structured and empirical risk minimization is used to estimate the inverse function, we provide quantitative upper bounds on the risk that depend on natural complexity measures of the class of inverse functions. We specialize our framework to the case where ${\mathcal {G}}$ is a sub-class of monotone single-index models, showing a computationally efficient, end-to-end algorithm that achieves very fast rates of parameter estimation in the regime in which the signal-to-noise ratio in the problem is large. We also pay particular attention to parameter identifiability in the noiseless model, deriving sharper upper bounds as well as information-theoretic lower bounds. Consequences for some unimodal SIMs and the (real) phase retrieval problem are also discussed.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    76
    References
    1
    Citations
    NaN
    KQI
    []