Understanding the complex regulatory mechanisms that tell genes when to switch on and off is one of the toughest challenges facing researchers attempting to discover how life works. "Binding sites," the areas of DNA that interact with the proteins that help control gene expression, can lie far along the DNA strand from the genes they influence. Recent research has also shown that gene expression can be controlled by several regulatory proteins working together at a combination of different binding sites.
(Regulatory proteins are known as "transcription factors"; transcription is the first step in the process by which the genetic information in DNA is decoded by the cell to manufacture proteins, the building blocks of life.)
"It's difficult to experimentally observe how transcription factors bind to DNA at a distance from a gene, or how regulation happens," said Fidelis, a computational biologist in Livermore's Biosciences Directorate. "But you can identify their binding sites in a promoter or regulatory region - there are usually a few of these for each gene. We wanted to see if we could somehow deduce how many transcription factors at a time, or combinations of factors, are coming together physically and how these combinations regulate genes."
"To accomplish this," Komorowski said, "we used a machine learning technique called rough sets to mathematically model general rules that could associate known binding sites and gene expression in yeast, which is one of the most widely studied organisms." From the analysis of gene activity under a variety of environm