Method of Divide-and-Combine in Regularised Generalised Linear Models for Big Data

2016 
When a data set is too big to be analysed all at once on a single computer, divide-and-combine has become the method of choice for overcoming the computational hurdle because of its scalability. Although random data partitioning has been widely adopted, there is a lack of clear theoretical justification and practical guidelines for combining the results of separate analyses of individual sub-datasets, especially when a regularisation method such as the lasso is used for variable selection to improve numerical stability. In this paper we develop a new strategy for combining separate lasso-type estimates of regression parameters by means of confidence distributions based on bias-corrected estimators. We first establish how to construct the confidence distribution and then show that the resulting combined estimator achieves Fisher efficiency, in the sense that it attains the estimation efficiency of the maximum likelihood estimator obtained from an analysis of the full data. Furthermore, we propose an inference procedure based on the combined regularised estimator. Extensive simulation studies evaluate the performance of the proposed methodology and compare it with the classical meta-estimation method and a voting-based variable selection method.
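To make the divide, bias-correct and combine steps concrete, here is a minimal Python/NumPy sketch. It rests on simplifying assumptions that are not taken from the paper: a Gaussian linear model rather than a general GLM, more observations than predictors within each block, a ridge-stabilised inverse of the block Gram matrix as a hypothetical stand-in for the approximate precision matrix used in the bias correction, and inverse-covariance weighting as the assumed form of the normal confidence-distribution combination. The helper names (`lasso_cd`, `debiased_lasso`, `combine_normal_cds`) and the tuning values (`lam`, `delta`) are illustrative only.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso for (1/(2n))||y - Xb||^2 + lam*||b||_1 via cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n          # (1/n) * ||x_j||^2
    r = y - X @ b                              # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                # partial residual excluding coordinate j
            b[j] = soft_threshold(X[:, j] @ r / n, lam) / col_sq[j]
            r -= X[:, j] * b[j]
    return b


def debiased_lasso(X, y, lam, delta=0.01):
    """One-step bias correction b + Theta X^T (y - X b) / n with a sandwich covariance.

    Theta = (Sigma_hat + delta*I)^{-1} is a hypothetical stand-in for the
    approximate precision matrix; this assumes n > p within the block.
    """
    n, p = X.shape
    b = lasso_cd(X, y, lam)
    sigma_hat = X.T @ X / n
    theta = np.linalg.inv(sigma_hat + delta * np.eye(p))
    b_d = b + theta @ X.T @ (y - X @ b) / n
    resid = y - X @ b_d
    s2 = resid @ resid / (n - p)               # noise-variance estimate
    cov = s2 * theta @ sigma_hat @ theta.T / n
    return b_d, cov


def combine_normal_cds(estimates, covariances):
    """Inverse-covariance weighted combination of approximately normal block estimates."""
    weights = [np.linalg.inv(c) for c in covariances]
    total = sum(weights)                       # summed block precision matrices
    combined = np.linalg.solve(total, sum(w @ b for w, b in zip(weights, estimates)))
    return combined, np.linalg.inv(total)


# Simulated example: random partition of n rows into K blocks.
rng = np.random.default_rng(0)
n, p, K = 20000, 5, 10
beta = np.array([1.5, 0.0, -2.0, 0.0, 0.5])
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)

blocks = np.array_split(rng.permutation(n), K)
fits = [debiased_lasso(X[idx], y[idx], lam=0.05) for idx in blocks]
beta_c, cov_c = combine_normal_cds([f[0] for f in fits], [f[1] for f in fits])
print(np.round(beta_c, 2))
```

Each block contributes its estimated precision matrix as a weight, so blocks carrying more information pull the combined estimate harder, and the combined covariance is the inverse of the summed block precisions. Under these assumptions this is how a combined estimator can approach the efficiency of a full-data fit; the paper's own construction for GLMs and high-dimensional blocks may differ.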