Protein Annotation as Term Categorization in the Gene Ontology

Karin Verspoor, Judith D. Cohn, Cliff Joslyn, Sue Mniszewski, Andreas Rechtsteiner, Luis Mateus Rocha, Tiago Simas

2004

We addressed BioCreAtIvE Task 2, the problem of annotating a protein with a node in the Gene Ontology (GO). We approached the task as one of term categorization: terms derived from the neighborhood of the given protein in the given document are categorized into GO nodes based on their lexical overlap with the terms on those nodes and with terms identified as related to them. The system incorporates NLP components such as a morphological normalizer, a named entity recognizer, a statistical term frequency analyzer, and an unsupervised method for expanding the words associated with GO ids, based on a probability measure that captures word proximity (Rocha, 2002). The categorization step uses our novel Gene Ontology Categorizer (GOC) (Joslyn et al., 2004), which exploits the structure of the GO to select GO nodes as cluster heads for the terms in the input set.
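The general idea of categorizing terms into GO nodes and then choosing a cluster head from the GO structure can be illustrated with a minimal Python sketch. It is not the GOC algorithm of Joslyn et al. (2004), and it omits the named entity recognizer, the term frequency analyzer, and the proximity-based word expansion; the toy GO fragment, the Jaccard overlap score, the "deepest node covering enough of the evidence" rule, and the names `normalize`, `lexical_overlap`, `cluster_head`, and `min_share` are all hypothetical choices made for illustration. Each node is scored by the overlap between the document terms and its label words, each matched node passes its score up to all of its ancestors, and the most specific node whose accumulated score reaches a fraction `min_share` of the total is returned.

```python
import re
from collections import defaultdict

# Hypothetical toy fragment of the GO: node name -> (label words, parent names).
# It is a small DAG for illustration, not real GO content or identifiers.
GO_TOY = {
    "process": ({"process"}, []),
    "metabolic process": ({"metabolic", "process"}, ["process"]),
    "lipid metabolic process": ({"lipid", "metabolic", "process"}, ["metabolic process"]),
    "protein metabolic process": ({"protein", "metabolic", "process"}, ["metabolic process"]),
    "transport": ({"transport"}, ["process"]),
}


def normalize(text):
    """Crude stand-in for morphological normalization: a lowercase word set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def lexical_overlap(doc_terms, node_words):
    """Jaccard overlap between document terms and a node's label words."""
    union = doc_terms | node_words
    return len(doc_terms & node_words) / len(union) if union else 0.0


def ancestors(node, nodes):
    """Return the node itself plus every ancestor reachable via parent links."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(nodes[current][1])
    return seen


def depth(node, nodes, memo):
    """Length of the longest path from the node up to a root (roots have depth 0)."""
    if node not in memo:
        parents = nodes[node][1]
        memo[node] = 1 + max(depth(p, nodes, memo) for p in parents) if parents else 0
    return memo[node]


def cluster_head(doc_terms, nodes, min_share=0.8):
    """Pick the deepest node whose subtree holds >= min_share of the total overlap."""
    hits = {}
    for name, (words, _) in nodes.items():
        score = lexical_overlap(doc_terms, words)
        if score > 0:
            hits[name] = score
    total = sum(hits.values())
    if total == 0:
        return None  # no lexical evidence for any node
    # Each matched node adds its score to itself and to all of its ancestors.
    coverage = defaultdict(float)
    for name, score in hits.items():
        for anc in ancestors(name, nodes):
            coverage[anc] += score
    candidates = [n for n, c in coverage.items() if c >= min_share * total]
    if not candidates:
        return None
    memo = {}
    # Prefer specificity (depth), then coverage, then name for a stable tie-break.
    return max(candidates, key=lambda n: (depth(n, nodes, memo), coverage[n], n))


if __name__ == "__main__":
    terms = normalize("Lipid metabolic oxidation")
    print(cluster_head(terms, GO_TOY))                 # metabolic process
    print(cluster_head(terms, GO_TOY, min_share=0.5))  # lipid metabolic process
```

The `min_share` parameter trades coverage for specificity: at 0.8 the head is the shared parent `metabolic process`, which summarizes both the lipid and the protein matches, while at 0.5 the single strongest match, `lipid metabolic process`, is specific enough to be chosen on its own.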
