The major differences are found not in how the organisms are grouped, but in the relative position of these groups in the organism trees, he said.
Finally, Kim and his colleagues analyzed the genomes of several hundred viruses, including several that could not be classified.
"Some viruses have no or few highly conserved common genes to other viruses, thus, the gene alignment-based method cannot find relationship among such groups, but we think we can," he said.
Because of the vast amount of whole genome sequence data, all of Kim's analyses monopolized a computer cluster of 320 CPUs (central processing units) for over a year.
Kim stressed the major difference between FFP and gene-centric comparisons of genomes: FFP takes into account all or most of the DNA or protein sequences in the genome, while gene alignment analysis chooses a small set of genes out of the large number of genes in each organism and uses that set to represent the organism.
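To make the contrast concrete, the whole-sequence idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's actual pipeline: it assumes the "features" are short overlapping substrings of fixed length k, and it compares two profiles with the Jensen-Shannon divergence. The function names, the choice of k and the toy sequences are all hypothetical.

```python
from collections import Counter
from math import log2


def profile(seq, k=3):
    """Relative frequency of every overlapping length-k substring in seq.

    Assumption: features are fixed-length substrings (k-mers).
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency profiles.

    Returns 0 for identical profiles and 1 for profiles with no shared features.
    """
    def kl(a, m):
        # Kullback-Leibler term, summed only over features present in a.
        return sum(pa * log2(pa / m[x]) for x, pa in a.items() if pa > 0)

    keys = set(p) | set(q)
    # Midpoint distribution over the union of features.
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in keys}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy sequences (hypothetical): every position contributes to the profile,
# so no subset of genes has to be chosen in advance.
a = profile("ACGTACGTACGGTACC")
b = profile("ACGTACGTTCGGTACC")
print(js_divergence(a, b))
```

A matrix of such pairwise divergences could then be handed to any standard tree-building routine. The point of the sketch is only that the whole sequence, rather than a hand-picked gene set, feeds the comparison.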
"The fallacy of the view that organisms can be represented by a small set of their genes is really due to our prejudice that genes are us," Kim said. "We know now, more and more, that this is oversimplification.
"It is likely that some of the observations we come up with will turn out to be wrong, but the method will evolve and get better and better as experts come in and tell us where we have gone wrong. The math is there, now we have to remove the human bias as much as possible."
In addition to applying the method to comparative genomics, Kim expects it will help in grouping and finding relationships among sets of other information, such as electronic information encoding text, sounds and images. It may also help in tracing human ancestry and disease demography using whole genome sequences, and in grouping of metagenomic data - the sequences of genome fragments from many organisms, mo
Contact: Robert Sanders
University of California - Berkeley