A Serial Sample Selection Framework for Active Learning

2014 
Active learning is a machine learning and data mining technique that selects the most informative samples for labeling and uses them as training data. It aims to obtain a high-performance classifier while labeling as few samples as possible from a large pool of unlabeled data, which makes the sampling strategy the core issue. Existing approaches either ignore the information in unlabeled data and are prone to querying outliers or noisy samples, or evaluate large numbers of non-informative samples, which incurs significant computational cost. To address these problems, this paper proposes a serial active learning framework. It first measures the uncertainty of the unlabeled samples and selects the most uncertain sample set. From this set, it then generates the most representative sample set based on a mutual information criterion. Finally, the framework selects the most informative sample from the most representative sample set using an expected error reduction strategy. Experimental results on multiple datasets show that the approach outperforms Random Sampling and a state-of-the-art adaptive active learning method.
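The three serial stages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the nearest-centroid softmax classifier, the function names, and the parameters `k_uncertain`, `k_repr`, and `gamma` are all hypothetical. Prediction entropy is assumed as the uncertainty measure, and stage 2 substitutes a simple RBF density score for the paper's mutual information criterion, whose exact form the abstract does not give. Every class is assumed to have at least one labeled sample.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    # Stand-in classifier (assumption): one mean vector per class.
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_proba(centroids, X):
    # Softmax over negative squared distances to the class centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def entropy(p):
    # Shannon entropy of each row of class probabilities.
    return -(p * np.log(p + 1e-12)).sum(axis=1)

def select_query(X_lab, y_lab, X_pool, n_classes,
                 k_uncertain=10, k_repr=3, gamma=1.0):
    centroids = fit_centroids(X_lab, y_lab, n_classes)
    proba = predict_proba(centroids, X_pool)

    # Stage 1: keep the k_uncertain pool samples with the highest entropy.
    uncertain = np.argsort(-entropy(proba))[:k_uncertain]

    # Stage 2: keep the k_repr candidates with the highest mean RBF
    # similarity to the pool (a density proxy, NOT the paper's
    # mutual information criterion).
    diff = X_pool[uncertain][:, None, :] - X_pool[None, :, :]
    density = np.exp(-gamma * (diff ** 2).sum(axis=2)).mean(axis=1)
    representative = uncertain[np.argsort(-density)[:k_repr]]

    # Stage 3: expected error reduction. For each candidate and each
    # possible label, retrain with the candidate added and average the
    # resulting pool entropy, weighted by the current label probability.
    best_idx, best_risk = -1, np.inf
    for i in representative:
        risk = 0.0
        for c in range(n_classes):
            X_new = np.vstack([X_lab, X_pool[i:i + 1]])
            y_new = np.append(y_lab, c)
            p_new = predict_proba(fit_centroids(X_new, y_new, n_classes), X_pool)
            risk += proba[i, c] * entropy(p_new).mean()
        if risk < best_risk:
            best_idx, best_risk = int(i), risk
    return best_idx  # index into X_pool of the sample to label next
```

Running the cheap uncertainty filter first means the expensive retraining loop in stage 3 only touches `k_repr` candidates rather than the whole pool, which is the computational motivation for ordering the stages serially.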