Loss-based Active Learning for Named Entity Recognition

2021 
This paper addresses a practical obstacle in building named entity recognition (NER) systems: the lack of training data. To this end, we introduce a new active learning method that reduces the number of training samples required by the underlying NER system. Unlike prior work, which focuses only on the training data, we define a new loss function that, when estimating the loss and uncertainty scores of training samples for selection, also takes into account the uncertainty of the $K$ unlabelled test instances most similar to each unlabelled training instance. Experimental results on both general-domain and clinical benchmark datasets show that the proposed active learning method allows the NER system to be trained with 5% to 7% less training data than state-of-the-art uncertainty sampling methods, while retaining high NER effectiveness.
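
The selection idea described above can be illustrated with a minimal sketch. The abstract does not say how the two uncertainty terms are computed or combined, so everything here is an assumption made for illustration: mean token entropy stands in for the paper's loss-based score, cosine similarity over sentence embeddings picks the $K$ nearest test instances, and a hypothetical weight `alpha` mixes the candidate's own uncertainty with that of its test neighbours. The function names are likewise hypothetical.

```python
import math


def token_entropy(probs):
    """Shannon entropy of one token's predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sequence_uncertainty(token_probs):
    """Mean token entropy of a sentence (assumed stand-in for the paper's score)."""
    return sum(token_entropy(p) for p in token_probs) / len(token_probs)


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_batch(pool, test, k=2, alpha=0.5, batch_size=1):
    """Rank unlabelled training candidates for annotation.

    pool, test: lists of (embedding, token_probs) pairs.
    alpha: assumed mixing weight; the abstract gives no combination rule.
    Returns the indices of the `batch_size` highest-scoring pool items.
    """
    test_unc = [sequence_uncertainty(tp) for _, tp in test]
    scores = []
    for emb, token_probs in pool:
        own = sequence_uncertainty(token_probs)
        # Indices of the K test instances most similar to this candidate.
        nearest = sorted(
            range(len(test)),
            key=lambda j: cosine(emb, test[j][0]),
            reverse=True,
        )[:k]
        neigh = sum(test_unc[j] for j in nearest) / len(nearest) if nearest else 0.0
        scores.append((1 - alpha) * own + alpha * neigh)
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]
```

In this sketch, two candidates with identical own uncertainty are separated by their neighbourhoods: the one whose nearest test instances the model is less sure about is selected first.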