Quantifying the Evaluation of Heuristic Methods for Textual Data Augmentation

Omid Kashefi, Rebecca Hwa

2020

Data augmentation has been shown to be effective in providing more training data for machine learning, resulting in more robust classifiers. However, for some problems there may be multiple augmentation heuristics, and the choice of which one to use may significantly impact the success of training. In this work, we propose a metric for evaluating augmentation heuristics; specifically, we quantify the extent to which an example is hard to distinguish by considering the difference between the distributions of the augmented samples of different classes. Experiments with multiple heuristics on two prediction tasks (positive/negative sentiment and verbosity/conciseness) validate our claims by revealing the connection between the distribution difference of different classes and classification accuracy.
