Now scientists at the U.S. Department of Energy's Brookhaven National Laboratory have written a computer program "to sort the informational 'wheat' from the 'chaff,'" said Brookhaven biochemist John Shanklin, who leads the research team. The program, which is described in the open access journal BMC Bioinformatics*, compares groups of related proteins and flags individual amino acid positions that are likely to control function.
Biochemists are interested in identifying "active sites" -- regions of proteins that determine their functions -- and learning how these sites differ between paralogs, proteins that arose from a common ancestor but have different functions. The new program, called CPDL for "conserved property difference locator," identifies positions where two related groups of proteins differ either in amino acid identity or in a property such as charge or polarity.
"Experience tells us that such positions are likely to be biologically important for defining the specific functions of the two protein classes," Shanklin said.
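The kind of comparison described above can be illustrated with a minimal Python sketch. This is not the CPDL code itself: the function name `find_difference_positions`, the simplified charge table, and the rule that a position counts only when each group is internally conserved are all assumptions made for illustration. The sketch takes two groups of pre-aligned, equal-length sequences and reports columns where the groups differ in residue identity or in charge class.

```python
# Illustrative sketch only; NOT the actual CPDL implementation.
# Assumption: sequences are already aligned and of equal length.
# Assumption: a deliberately simplified charge table (all other
# residues are treated as neutral).
CHARGE = {"D": "negative", "E": "negative", "K": "positive", "R": "positive"}


def charge_class(residue):
    """Return a coarse charge class for a one-letter amino acid code."""
    return CHARGE.get(residue, "neutral")


def find_difference_positions(group_a, group_b):
    """Return (column, kind) pairs where the two groups differ.

    kind is "identity" when each group has a single conserved residue
    and the residues differ, or "charge" when each group is conserved
    in charge class and the classes differ. Columns are 0-based.
    """
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("all sequences must have the same aligned length")

    hits = []
    for col in range(length):
        residues_a = {seq[col] for seq in group_a}
        residues_b = {seq[col] for seq in group_b}
        # Skip columns containing alignment gaps.
        if "-" in residues_a or "-" in residues_b:
            continue
        # Case 1: each group is conserved, but on a different residue.
        if len(residues_a) == 1 and len(residues_b) == 1:
            if residues_a != residues_b:
                hits.append((col, "identity"))
            continue
        # Case 2: residues vary, but charge is conserved within each
        # group and differs between the groups.
        charges_a = {charge_class(r) for r in residues_a}
        charges_b = {charge_class(r) for r in residues_b}
        if len(charges_a) == 1 and len(charges_b) == 1 and charges_a != charges_b:
            hits.append((col, "charge"))
    return hits


# Toy alignments invented for this example (not real enzyme sequences).
group_a = ["MKDLA", "MKELA", "MKDLG"]
group_b = ["MRKLS", "MRRLT", "MRKLS"]
print(find_difference_positions(group_a, group_b))
```

With these toy inputs, column 1 is flagged as an identity difference (K in one group, R in the other) and column 2 as a charge difference (D/E versus K/R). A real tool would presumably handle further properties such as polarity, and would tolerate imperfect conservation within a group.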
When the Brookhaven team used the program to scan three test cases, each consisting of two groups of related but functionally different enzymes, the program consistently identified positions near enzyme active sites that had been previously predicted from structural and/or biochemical studies to be important for the enzymes' specificity and/or function. "This suggests that CPDL will have broad utility for identifying amino acid residues likely to play a role in distinguishing protein classes," Shanklin said.
Scientists have already used such comparative sequence analysis to identify protein active sites, and have also used this knowledge to alter enzyme functions by swapping particular amino acid residues in one class of enzyme to convert it into the related but functionally different class. But comparing sequences "manually" is labor intensive and error prone, and it has become impractical for those who wish to take advantage of the increasing number of sequences in protein databases, Shanklin said.
"Yet this growing data resource contains a wealth of information for structure-function studies and for protein engineering," Shanklin said. "We developed CPDL as a general tool for extracting and displaying relevant functional information from such data sets."
Also, since CPDL does not require that a protein's structure be known -- just its amino acid sequence -- it can be applied to studies of proteins that reside in the cell membrane, for which it is notoriously difficult to determine a molecular structure.