Quality control of gene predictions

Alinda Nagy,Hedi Hegyi,Krisztina Farkas,Hedvig Tordai,E. Kozma,László Bányai,László Patthy

2008

A recent study systematically compared the performance of various computational methods for predicting human protein-coding genes (Guigo et al. 2006). In that study, a set of well-annotated ENCODE sequences was blind-analyzed with different gene-finding programs, and the resulting predictions were compared with the annotations. Predictions were analyzed at the nucleotide, exon, transcript and gene levels to evaluate how well they reproduced the annotation. The study revealed that none of the strategies produced perfect predictions, but that methods relying on mRNA and protein sequences, and those using combined information (including expressed sequence information), were generally the most accurate. The dual- or multiple-genome methods were less accurate, although they performed better than the single-genome ab initio prediction methods. Importantly, at the nucleotide level no prediction method correctly identified more than ∼90% of nucleotides, and at the transcript level (the most stringent criterion) no prediction method correctly identified more than 45% of the coding transcripts.
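To illustrate what comparison at the nucleotide, exon and transcript levels can mean in practice, the following minimal Python sketch scores one predicted transcript against one annotated transcript. It is not the evaluation code of the cited study: the function names, the 1-based inclusive coordinate convention and the toy coordinates are all assumptions made for illustration, and the term "precision" is used for the fraction of predicted features that are correct (often reported as specificity in the gene-finding literature).

```python
# Illustrative sketch only; names, coordinates and conventions are assumptions,
# not the evaluation procedure of the cited study.

def coding_positions(exons):
    """Expand (start, end) exon intervals (1-based, inclusive) into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_level(annotated, predicted):
    """Return (sensitivity, precision) over individual coding nucleotides."""
    ann = coding_positions(annotated)
    pred = coding_positions(predicted)
    shared = len(ann & pred)
    sensitivity = shared / len(ann) if ann else 0.0
    precision = shared / len(pred) if pred else 0.0
    return sensitivity, precision


def exon_level(annotated, predicted):
    """Return (sensitivity, precision), counting only exons whose two boundaries both match."""
    ann, pred = set(annotated), set(predicted)
    exact = len(ann & pred)
    sensitivity = exact / len(ann) if ann else 0.0
    precision = exact / len(pred) if pred else 0.0
    return sensitivity, precision


def transcript_correct(annotated, predicted):
    """A transcript counts as correct only if every exon is reproduced exactly."""
    return sorted(annotated) == sorted(predicted)


# Hypothetical two-exon transcript; the prediction misplaces one exon start by 10 nt.
annotated = [(100, 199), (300, 399)]
predicted = [(100, 199), (310, 399)]

nt_sens, nt_prec = nucleotide_level(annotated, predicted)
ex_sens, ex_prec = exon_level(annotated, predicted)
print(f"nucleotide: sensitivity={nt_sens:.2f} precision={nt_prec:.2f}")
print(f"exon:       sensitivity={ex_sens:.2f} precision={ex_prec:.2f}")
print(f"transcript correct: {transcript_correct(annotated, predicted)}")
```

In this toy case a single misplaced exon boundary leaves most nucleotides correctly identified, yet halves the exon-level score and makes the transcript incorrect, which is why the transcript level is the most stringent criterion.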
