Tracing with Less Data: Active Learning for Classification-Based Traceability Link Recovery

2019 
Previous work has established both the importance and difficulty of establishing and maintaining adequate software traceability. While it has been shown to support essential maintenance and evolution tasks, recovering traceability links between related software artifacts is a time consuming and error prone task. As such, substantial research has been done to reduce this barrier to adoption by at least partially automating traceability link recovery. In particular, recent work has shown that supervised machine learning can be effectively used for automating traceability link recovery, as long as there is sufficient data (i.e., labeled traceability links) to train a classification model. Unfortunately, the amount of data required by these techniques is a serious limitation, given that most software systems rarely have traceability information to begin with. In this paper we address this limitation of previous work and propose an approach based on active learning, which substantially reduces the amount of training data needed by supervised classification approaches for traceability link recovery while maintaining similar performance.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    40
    References
    3
    Citations
    NaN
    KQI
    []