Crowdsourcing Practice for Efficient Data Labeling: Aggregation, Incremental Relabeling, and Pricing

2020 
In this tutorial, we share part of the unique industry experience in efficient data labeling via crowdsourcing accumulated by leading researchers and engineers at Yandex. We will introduce data labeling via public crowdsourcing marketplaces and present the key components of efficient label collection. This will be followed by a practice session, where participants will choose one of several real label collection tasks, experiment with settings for the labeling process, and launch their label collection project on one of the largest crowdsourcing marketplaces. The projects will be run on real crowds within the tutorial session. While the crowd performers are annotating the projects set up by the attendees, we will present the major theoretical results in efficient aggregation, incremental relabeling, and dynamic pricing. We will also discuss their strengths and weaknesses as well as their applicability to real-world tasks, summarizing our five years of research and industrial expertise in crowdsourcing. Finally, participants will receive feedback on their projects and practical advice on how to make them more efficient. We invite beginners, advanced specialists, and researchers to learn how to collect high-quality labeled data and do it efficiently.