A national consortium of scientists led by BIATECH, a Seattle-based non-profit research center, and Pacific Northwest National Laboratory (PNNL), a Department of Energy research institution in Richland, Wash., now suggests a way to put this house in order. They offer a powerful new method that integrates experimental and computational analyses to ascribe function to genes that had been termed "hypothetical" -- sequences that appear in the genome but whose biological purposes were previously unknown.
The method portends a way not only to fill in the blanks in any organism's genome but also to compare the genomes of different organisms and their evolutionary relationships.
The new tools and approaches offer the most-comprehensive-to-date "functional annotation," a way of assigning the mystery sequences biological function and ranking them based on their similarity to genes known to encode proteins. Proteins are the workhorses of the cell, playing a role in everything from energy transport and metabolism to cellular communication.
This new ability to rank hypothetical sequences according to their likelihood to encode proteins "will be vital for any further experimentation and, eventually, for predicting biological function," said Eugene Kolker, president and director of BIATECH, an affiliate scientist at PNNL and lead author of a study in the Feb. 8 Proceedings of the National Academy of Sciences that applies the new annotation method to a strain of the metal-detoxifying bacterium Shewanella oneidensis.
"In a lot of cases," said James K. Fredrickson, a co-author and PNNL chief scientist, "it was not known from the gene sequence if a protein was even expressed. Now that we have high confidence that many of these hypothetical genes are expressing proteins, we can look for what role these proteins play."
Before this study, nearly 40 percent of the genetic sequences in Shewanella oneidensis--of key interest to DOE for its potential in nuclear and heavy metal waste remediation--were considered hypothetical. This work identified 538 of these genes that expressed functional proteins and messenger RNA, accounting for a third of the hypothetical genes. The researchers enlisted analytic software to scour public databases and applied expression data to improve gene annotation, identifying similarities to known proteins for 97 percent of these hypothetical proteins. All told, computational and experimental evidence provided functional information for 256 more genes, or 48 percent, but the researchers could confidently assign exact biochemical functions for only 16 proteins, or 3 percent. Finally, they introduced a seven-category system for annotating genomic proteins, ranked according to a functional assignment's precision and confidence.
Kolker said that "a big part of this was the proteomics" -- a systematic screening and identification of proteins, in this case those that were expressed in the microbe when subjected to stress. The proteomic analyses were done by four teams led by Kolker; Carol S. Giometti, Argonne National Laboratory (ANL); John R. Yates III, The Scripps Research Institute (TSRI); and Richard D. Smith, W.R. Wiley Environmental Molecular Sciences Laboratory, based at PNNL. BIATECH's analysis of these data included dealing with more than 2 million files.