Results: We have developed a method for automatically extracting the biological process functions of genes/proteins/families, based on Gene Ontology (GO), from text using a shallow parser and sentence structure analysis techniques. When gene/protein/family names and their functions are described in ACTOR (doer of action) and OBJECT (receiver of action) relationships, the corresponding GO-IDs are assigned to the genes/proteins/families. The gene/protein/family names are recognized using the gene/protein/family name dictionaries developed by our group. To achieve wide recognition of gene/protein/family functions, we semi-automatically gather functional terms based on GO using co-occurrence, collocation similarities and rule-based techniques. A preliminary experiment demonstrated that our method has an estimated recall of 54.4% with a precision of 91.4% for functions actually described in abstracts. When applied to PubMed, it extracted over 190 000 gene–GO relationships and 150 000 family–GO relationships for major eukaryotes.
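The ACTOR/OBJECT assignment rule described above can be illustrated with a minimal sketch. It assumes the sentence has already been parsed into (actor, verb, object) triples, and it uses tiny hypothetical dictionaries in place of the name dictionaries and GO-based functional term collection mentioned in the text. The names `Relation`, `GENE_NAMES`, `FUNCTION_TERMS` and `assign_go` are illustrative assumptions, not part of the described system.

```python
from typing import NamedTuple, Optional, Tuple


class Relation(NamedTuple):
    """One parsed relationship; assumed to come from an upstream shallow parser."""
    actor: str  # doer of the action
    verb: str
    obj: str    # receiver of the action


# Hypothetical toy dictionaries (assumptions for illustration only).
GENE_NAMES = {"p53", "bax"}
FUNCTION_TERMS = {"apoptosis": "GO:0006915"}  # functional term -> GO-ID


def assign_go(rel: Relation) -> Optional[Tuple[str, str]]:
    """Return (gene name, GO-ID) when a known gene is the ACTOR and a
    known functional term is the OBJECT; otherwise return None."""
    actor = rel.actor.lower()
    obj = rel.obj.lower()
    if actor in GENE_NAMES and obj in FUNCTION_TERMS:
        return (rel.actor, FUNCTION_TERMS[obj])
    return None


relations = [
    Relation("p53", "induces", "apoptosis"),     # gene as ACTOR: assigned
    Relation("Stress", "induces", "apoptosis"),  # ACTOR is not a gene: skipped
]
assignments = [a for a in map(assign_go, relations) if a is not None]
print(assignments)
```

Requiring the gene name to fill the ACTOR slot, instead of accepting any co-occurrence within the sentence, is the kind of constraint that would favor precision over recall.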
Availability: The extracted gene functions are available at http://prime.ontology.ims.u-tokyo.ac.jp