RECORD: Resource Constrained Semi-Supervised Learning under Distribution Shift

2020 
Semi-supervised learning (SSL) aims to improve performance by exploiting massive amounts of unlabeled data. It typically works offline under two assumptions: i) the data distribution is static; ii) data storage is unlimited. In many online tasks, however, neither assumption holds. For example, in online image classification, the number of unlabeled images grows rapidly, which makes it difficult to store them in full; meanwhile, the content of the unlabeled images changes constantly, so a fixed distribution can no longer be assumed. We call this novel setting Resource Constrained SSL under Distribution Shift (Record for short); to the best of our knowledge, it has not yet been thoroughly studied. This paper presents a systematic solution, Record, consisting of three sub-steps: distribution tracking, sample selection, and model updating. Specifically, we propose an effective method to track distribution changes and locate distribution-shifted samples. A novel influence-based approach then selects the samples most influential for the distribution change, subject to the resource constraints. Finally, we free up memory to store the latest unlabeled data together with their pseudo-labels for the next round of distribution tracking. Extensive empirical results confirm the effectiveness of our scheme. Under diverse and unknown distribution shifts, our solution is consistently and clearly better than many baseline and SOTA methods across memory budgets, and in some cases it even approaches the performance of the oracle.
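The select-then-store step described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not define the influence measure, so `shift_score` (squared distance from a reference mean) is a placeholder assumption, and `update_memory`, `predict`, `reference_mean`, and `budget` are hypothetical names introduced only for illustration. The sketch shows just the bookkeeping: pseudo-label the incoming batch, then keep the highest-scoring samples within a fixed memory budget.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

# A stored sample: (feature vector, pseudo-label).
Sample = Tuple[Sequence[float], int]


def shift_score(x: Sequence[float], reference_mean: Sequence[float]) -> float:
    """Placeholder score: squared distance from a reference mean.

    This stands in for the paper's influence measure, which the
    abstract does not specify.
    """
    return sum((a - b) ** 2 for a, b in zip(x, reference_mean))


def update_memory(
    memory: List[Sample],
    batch: List[Sequence[float]],
    predict: Callable[[Sequence[float]], int],
    reference_mean: Sequence[float],
    budget: int,
) -> List[Sample]:
    """Pseudo-label the new batch, then keep the `budget` highest-scoring samples."""
    candidates = memory + [(x, predict(x)) for x in batch]
    return heapq.nlargest(
        budget, candidates, key=lambda s: shift_score(s[0], reference_mean)
    )


# Toy usage: a threshold "model" and a memory budget of two samples.
memory = [([0.0, 0.0], 0), ([0.1, 0.0], 0)]
batch = [[2.0, 2.0], [0.05, 0.0], [3.0, 1.0]]
memory = update_memory(
    memory, batch, lambda x: int(x[0] > 1.0), [0.0, 0.0], budget=2
)
```

In this toy run the two samples farthest from the reference mean displace the older ones, so the memory never exceeds its budget regardless of how many batches arrive.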