A community effort to identify and correct mislabeled samples in proteogenomic studies

Seungyeul Yoo, Zhiao Shi, Bo Wen, SoonJye Kho, Renke Pan, Hanying Feng, Hong Chen, Anders Carlsson, Patrik Edén, Weiping Ma, Michael L. Raymer, Ezekiel J. Maier, Zivana Tezak, Elaine Johanson, Denise Hinton, Henry Rodriguez, Jun Zhu, Emily S. Boja, Pei Wang, Bing Zhang

2021

Summary

Sample mislabeling or misannotation is a long-standing problem in scientific research, and it is particularly prevalent in large-scale multi-omic studies because of the complexity of their workflows. There is an urgent need for quality controls that automatically screen for and correct sample mislabels or misannotations in such studies. Here, we describe the crowdsourced precisionFDA NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge, which provides a framework for systematic benchmarking and evaluation of mislabel identification and correction methods for integrative proteogenomic studies. The challenge received a large number of submissions from domestic and international data scientists, and performance varied widely across the submitted methods. Post-challenge collaboration between the top-performing teams and the challenge organizers produced an open-source software tool, COSMO, which demonstrated high accuracy and robustness in identifying and correcting mislabeling in simulated and real multi-omic datasets.
