Practice of Efficient Data Collection via Crowdsourcing: Aggregation, Incremental Relabelling, and Pricing
In this tutorial, we present a portion of unique industry experience in efficient data labelling via crowdsourcing shared by both leading researchers and engineers from Yandex. We will make an introduction to data labelling via public crowdsourcing marketplaces and will present key components of efficient label collection. This will be followed by a practice session, where participants will choose one of the real label collection tasks, experiment with selecting settings for the labelling process, and launch their label collection project on Yandex.Toloka, one of the largest crowdsourcing marketplaces. The projects will be run on real crowds within the tutorial session. Finally, participants will receive a feedback about their projects and practical advice to make them more efficient. We expect that our tutorial will address an audience with a wide range of background and interests. We do not require specific prerequisite knowledge or skills. We invite beginners, advanced specialists, and researchers to learn how to efficiently collect labelled data.