Using Past Data to Warm Start Active Machine Learning: Does Context Matter?

2021 
Despite the abundance of data generated by students' activities in virtual learning environments, the use of supervised machine learning in learning analytics is limited by the availability of labeled data, which can be difficult to collect for complex educational constructs. In a previous study, a subfield of machine learning called Active Learning (AL) was explored to improve data-labeling efficiency. AL trains a model and, in parallel, uses it to choose the next data sample to be labeled by a human expert. Because of the complexity of educational constructs and data, AL has suffered from the cold-start problem, in which the model does not yet have enough data to choose the best next sample to learn from. In this paper, we explore the use of past data to warm start the AL training process. We also critically examine the implications of differing contexts (urbanicity) in which the past data was collected. To this end, we use authentic affect labels collected through human observation in middle school mathematics classrooms to simulate the development of AL-based detectors of engaged concentration. We experiment with two AL methods (uncertainty sampling, L-MMSE) and random sampling for data selection. Our results suggest that using past data to warm start AL training can be effective for some methods, depending on the target population's urbanicity. We provide recommendations on the data selection method and the quantity of past data to use when warm starting AL training in urban and suburban schools.
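To make the warm-started AL loop concrete, the following is a minimal sketch of pool-based uncertainty sampling seeded with past labeled data. It is not the authors' implementation: the function and variable names (X_past, X_pool, y_pool_oracle) are hypothetical, scikit-learn's LogisticRegression stands in for whichever classifier underlies the affect detectors, and the "oracle" is simply the stored human-coded label, as in the simulation described in the abstract.

```python
# Minimal sketch: warm-started pool-based uncertainty sampling.
# Assumes binary labels (e.g., engaged concentration vs. not) and
# hypothetical feature/label arrays; scikit-learn is an assumption.
import numpy as np
from sklearn.linear_model import LogisticRegression


def warm_start_uncertainty_sampling(X_past, y_past, X_pool, y_pool_oracle,
                                    n_queries=50):
    """Seed the learner with past (source-context) labeled data, then
    repeatedly query the pool sample the current model is least sure about."""
    X_train, y_train = X_past.copy(), y_past.copy()   # warm-start seed
    pool_idx = list(range(len(X_pool)))                # unlabeled pool indices
    model = LogisticRegression(max_iter=1000)

    for _ in range(n_queries):
        model.fit(X_train, y_train)

        # Uncertainty = 1 - highest predicted class probability.
        probs = model.predict_proba(X_pool[pool_idx])
        uncertainty = 1.0 - probs.max(axis=1)
        query = pool_idx[int(np.argmax(uncertainty))]

        # "Ask the oracle": in the simulation, look up the human label.
        X_train = np.vstack([X_train, X_pool[query]])
        y_train = np.append(y_train, y_pool_oracle[query])
        pool_idx.remove(query)

    return model
```

Replacing the uncertainty score with a random choice over pool_idx yields the random-sampling baseline, and setting X_past/y_past to an empty seed reproduces the cold-start condition the paper aims to avoid.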