Instead of trying to see proteins directly, they looked for their shadows, or 'footprints,' on the DNA. To accomplish this, they again turned to the DNaseI enzyme, which snips the DNA backbone within regulatory DNA. Prior work had shown that DNaseI tends to snip DNA next to regulatory protein docking sites, but not within the docking site itself. Using next-generation DNA sequencing technology, the researchers analyzed hundreds of millions of DNA backbone breaks made when cells were treated with DNaseI. They then used a powerful computer to resolve millions of protein footprints. In total, they identified 8.4 million such footprints along the genome, some of which were detected in many cell types. Next, they compiled all of the short DNA sequences to which the proteins were docked and analyzed them with a software algorithm that required hundreds of microprocessors working simultaneously. This revealed that more than 90 percent of the protein docking sites were actually slight variants of 683 different DNA words: essentially a dictionary of the genome's programming language.
"These findings significantly advance the understanding of how the instructions for controlling genes are written and organized throughout the genome, and how combinations of different instruction sets function together to control genes, often at great distance along the genome," Stamatoyannopoulos said. "The broad spectrum of cell and tissue types included in these analyses provides an incredibly rich resource that can be mined immediately by researchers around the world to illuminate
Contact: Leila Gray
University of Washington