Manning, a co-author of Yooseph’s paper, looked at the other side of the coin. He ran all the public sequences and GOS data against Pfam, a collection of signature profiles for all known protein families. Each profile is an average over all known members of a given protein family.
"Instead of starting with a human kinase to find a bacterial kinase, for example, you start with all of them together, which makes the search much more sensitive, but also very computationally expensive," Manning says. "We did almost 350 million comparisons, which is probably an order of magnitude or two more than anybody has ever done before."
Manning and co-author Yufeng Zhai, Ph.D., a bioinformatics programmer in the Razavi Newman Center for Bioinformatics at the Salk, could accomplish this gargantuan task only with the help of Time Logic, a company in Carlsbad, California, that specializes in hardware that accelerates genomic searches. "We only have one of their accelerators, but Time Logic stepped up and lent us eight more," says Manning. The final computation took two weeks; on a traditional computer it would have taken well over a century.
The Salk scientists were able to assign more than half of all GOS sequences to known protein families, and they discovered that some protein profiles are more common in the ocean and others on land. For example, Gram-positive bacteria are best known for their hardy spores, but this ability has been entirely lo