Multi-Label Inference For Crowdsourcing

Authors:
Jing Zhang, Nanjing University of Science and Technology
Xindong Wu, University of Louisiana at Lafayette

Introduction:

This paper studies multi-class multi-label annotation. The authors propose a novel probabilistic method, which includes a multi-class multi-label dependency (MCMLD) model, to address this problem.

Abstract:

When labels are acquired from crowdsourcing platforms, a task may be designed to include multiple labels, and the value of each label may be drawn from a set of distinct options. This setting is called multi-class multi-label annotation. To improve label quality, each task is completed independently by a group of heterogeneous crowdsourced workers, and the true values of the multiple labels of each task are then inferred from these repeated noisy labels. In this paper, we propose a novel probabilistic method, which includes a multi-class multi-label dependency (MCMLD) model, to address this problem. The proposed method assumes that label correlation exists in both the unknown true labels and the noisy crowdsourced labels. It therefore introduces a mixture of multiple independent multinoulli distributions to capture the correlation among the labels. Finally, the unknown true values of the multiple labels of each task, together with a set of confusion matrices modeling the reliability of the workers, are jointly inferred through an EM algorithm. Experiments on three simulated typical crowdsourcing scenarios and a real-world dataset consistently show that the proposed MCMLD method significantly outperforms several competitive alternatives. Furthermore, when the labels are strongly correlated, the advantage of MCMLD is even more pronounced.
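
To make the idea of jointly inferring true labels and per-worker confusion matrices with EM concrete, the sketch below implements a classic single-label baseline in the style of Dawid and Skene. It is not the MCMLD model: it treats one multi-class label in isolation and ignores the label correlation that MCMLD captures with its mixture of multinoulli distributions. The function name `dawid_skene`, the `-1` code for a missing answer, and the smoothing constants are assumptions chosen for illustration only.

```python
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50):
    """EM for one multi-class label (illustrative baseline, not MCMLD).

    labels: int array of shape (n_tasks, n_workers); -1 marks a missing answer.
    Returns (post, conf): post[i, k] is the posterior that task i has true
    class k; conf[w, k, l] is the estimated probability that worker w
    answers l when the true class is k.
    """
    n_tasks, n_workers = labels.shape
    observed = labels >= 0

    # Initialise posteriors from per-task vote counts (soft majority vote).
    post = np.zeros((n_tasks, n_classes))
    for k in range(n_classes):
        post[:, k] = (labels == k).sum(axis=1)
    post = (post + 1e-9) / (post + 1e-9).sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class prior and one confusion matrix per worker.
        prior = post.mean(axis=0)
        conf = np.full((n_workers, n_classes, n_classes), 1e-6)  # smoothing
        for w in range(n_workers):
            for l in range(n_classes):
                mask = labels[:, w] == l
                conf[w, :, l] += post[mask].sum(axis=0)
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: posterior over true classes, computed in log space.
        log_post = np.tile(np.log(prior + 1e-12), (n_tasks, 1))
        for w in range(n_workers):
            obs = observed[:, w]
            # Column l of conf[w] holds P(answer = l | true class) per class.
            log_post[obs] += np.log(conf[w][:, labels[obs, w]]).T
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

    return post, conf
```

Calling `post, conf = dawid_skene(labels, n_classes=3)` and taking `post.argmax(axis=1)` gives one inferred class per task. Running this baseline separately on each label of a multi-label task would discard any dependency among the labels, which is the gap the abstract says MCMLD is designed to address.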

You may want to know: