"The findings of this study compel us to reconsider how the genome is organized and regulated," said Thomas Gingeras, Vice President for Biological Research at Affymetrix and senior author of the Science manuscript. "These data point us toward two critical and exciting questions: What are the functions of these previously unannotated transcripts and what are the regulation schemes that orchestrate such complex assemblies of transcription? It seems certain that this is not the genome we learned about while in school."
In the "traditional" view of the human genome, there are about 26,000 genes used to make proteins that ultimately control the structure and function of every cell in the body. Most disease research has focused on studying these protein-coding genes, even though they make up only about two percent of the human genome sequence.
The new study by Gingeras' team made no assumptions about which parts of the genome might or might not be important to human biology. They used new Affymetrix tiling microarrays and unique biochemical tests to scan the sequence of 10 human chromosomes -- one third of the human genome sequence -- and found that roughly 15 percent of the DNA sequence analyzed was transcribed; most sites of transcription were not located in areas associated with protein-coding genes.
The team found many diverse kinds of RNAs transcribed from distinct regions of the genome, creating a complex pool of overlapping transcription. While the team validated many known protein-coding transcripts that contribute to this complex pool, they also discovered that:
Seventy-five percent of the RNAs found exclusively in the nucleus had not been previously detected.
Any single base in the genome can be transcribed into several different transcripts with different but overlapping sequences. Often, transcripts from one strand of DNA can share parts of their sequence with overlapping transcripts from the same strand or even from the opposite strand (antisense).
Transcripts missing a run of adenosine nucleotides at the tail end (non-polyA) were twice as common as the better-studied RNAs that have this sequence. Most transcripts derived from the sparsely transcribed regions between centers of dense transcription are non-polyA transcripts.
This study focused on an in-depth scan of 10 chromosomes; however, Affymetrix has developed tiling microarrays that cover all human chromosomes. GeneChip tiling microarrays have been used by Gingeras and his collaborators, as well as by the publicly funded NHGRI ENCODE project, to study the human genome in an unbiased fashion -- including regions that have historically been termed coding and non-coding.
Affymetrix is now beginning to commercialize tiling microarrays to give the research community the ability to perform these types of unbiased studies as well. By focusing research beyond the parts of the genome that have been traditionally studied, scientists hope to discover new drug targets and new biomarkers, and to gain a better understanding of disease mechanisms.
This project has been funded in whole or in part with Federal Funds from the National Cancer Institute, National Institutes of Health, under Contract number N01-CO-12400.
About Tiling Arrays:
Affymetrix "tiling" arrays mark a shift in the way microarrays are designed and interpreted. By using a neutral approach to array design, tiling arrays include all non-repetitive sequence from a given genome, not just the hand-selected regions that were previously thought to be important. With the inclusion of all genomic sequences, microarrays can now be used as a discovery tool to generate annotations and discover new transcripts. In late 2005, Affymetrix plans to launch high-resolution tiling arrays for the entire human genome and several model organisms, including Drosophila, Arabidopsis, S. cerevisiae and S. pombe.