Today, a team led by the Wellcome Trust Sanger Institute, together with colleagues in the USA and Switzerland, provides a measure of just how important variation in regulatory regions might be, in a pilot study covering some 2% of the human genome. As many as 40 of 374 genes showed changes in activity that could be related to DNA sequence differences called SNPs (single nucleotide polymorphisms).
"We were amazed at the power of this study to detect associations between SNP variations and gene activity," commented Dr Manolis Dermitzakis, Investigator, Division of Informatics at the Wellcome Trust Sanger Institute. "We were even more amazed at the number of genes affected: more than 10% of our sample, or perhaps 3000 genes across the genome, could be subject to modification of activity in human populations due to common genetic variations."
The study combined the HapMap's map of genetic variation with estimates of gene activity obtained from cell cultures derived from 60 individuals who provided samples for the HapMap. More than 630 genes were studied, of which 374 were active in the cell cultures. If a gene's activity in a cell culture was skewed from the average, that gene was investigated further.
The activity of these genes was tested for correlation with more than 750,000 SNPs, the single-letter sequence differences between individuals in the sample collection. A series of statistical tests was carried out to provide increased confidence in the associations between gene activity and sequence variation.
"Our sample size of 60 individuals is relatively small," continued Dr Dermitzakis, "and we would expect to miss rare variations. However, our pilot project gives us greater confidence to take on a genome-wide survey of gene activity."
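The release does not say which statistical tests the team used. As a minimal sketch, assuming a simple approach: for one gene and one SNP, correlate each individual's allele count at the SNP (0, 1 or 2) with that individual's activity level for the gene, then judge significance by shuffling the activity values many times to see how often chance alone produces an equally strong correlation (a permutation test). The function names `pearson` and `permutation_pvalue` and the simulated data are hypothetical illustrations, not the study's code or results.

```python
import random
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A SNP where everyone has the same genotype, or a gene with constant
        # activity, carries no association signal.
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(genotypes: Sequence[int], expression: Sequence[float],
                       n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation p-value for a genotype/activity correlation.

    genotypes:  per-individual allele counts at one SNP (0, 1 or 2)
    expression: per-individual activity level of one gene
    """
    rng = random.Random(seed)
    observed = abs(pearson(genotypes, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real genotype/activity link
        if abs(pearson(genotypes, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical simulated data for 60 individuals: activity rises with allele count.
sim = random.Random(42)
geno = [sim.choice([0, 1, 2]) for _ in range(60)]
expr = [1.0 * g + sim.gauss(0, 0.5) for g in geno]
print(permutation_pvalue(geno, expr, n_perm=2000))
```

The smallest p-value this sketch can report is 1/(n_perm + 1). When hundreds of thousands of SNPs are tested, small p-values will also arise by chance many times over, so a genome-scale analysis would need a multiple-testing correction, which this sketch does not attempt.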
A global map of sequence variation and gene activity will be an important tool for interpreting the links between variation and disease. Genome-wide association studies will be able to identify some regions of the genome with strong disease effects.
"The HapMap is proving to be useful in a wide range of applications," commented Dr Panos Deloukas, Senior Investigator, Division of Medical Genetics, Wellcome Trust Sanger Institute. "The journey for our biomedical research is from DNA sequence to individual people and individual disease. The HapMap is a bridge from sequence data to the differences in individuals."
The project focused on three regions of the human genome. The first, the ENCODE regions, comprises about 30 million base-pairs of DNA and is being intensively studied around the world as a set of 'typical' human genome regions. The second was 35 million base-pairs of chromosome 21 sequence; three copies of chromosome 21 lead to Down syndrome. The third was a 10-million-base-pair region of chromosome 20 that is known to be associated with diabetes and obesity.
Compared with gene sequences, which contain the instructions to make proteins, the regulatory regions that control genes are relatively poorly understood. Their structure varies, as does their distance from the genes they control.
New tools are needed to search our genome for the sequences that contribute to disease: tools that will harness the massive amounts of DNA information and transform it into information of real biomedical utility. The methods described here, combined with the HapMap data and the available cell cultures, will speed that transformation.