This technique assumes that organisms have genes in common, however, and that these "homologous" genes can be identified. When comparing distantly related species - such as bacteria that live in vastly different environments - this gene-centric method may not work, Kim said.
"What do you do when one gene tells you the organisms are closely related, and another gene tells you they're distantly related?" he asked. "It happens."
Kim, who in the past focused on creating three-dimensional demographic maps of all known protein structures, wanted a technique that could be used to compare genomes of all sizes, and even genomes only partially sequenced. He also wanted a method that would compare all regions of the genome, not just the exons - that is, the DNA transcribed into mRNA, the blueprint for proteins. Exons make up only 1 percent of the human genome, with the remainder being non-coding "introns," regulatory DNA, duplicate or redundant DNA and transposons - genes that have jumped from other places in the genome.
Kim thought that traditional text comparison - used, for example, to assess the authorship of a work of literature or to identify plagiarized text - might provide a model for whole genome comparison and a way to test comparison methods. But while text comparison involves looking at word frequency, genomes cannot be broken down into words.
"I can compare two books in two different ways. I can pick a few sentences, say a hundred that I subjectively decided are important, and compare them, but some are very similar and some very different in the two books," he explained. "So, how can I decide? I need a second method to compare some features representing one whole book to those of the other whole book."
Contact: Robert Sanders
University of California - Berkeley