A genome milestone was reached in 2001 when sequencing of the human genome was completed. In recent years this has been followed by complete chemical read-outs of the DNA sequence of several other species, for example mouse, dog, cow and chicken. But without a code or 'grammar' to reveal the message behind the sequence, genomic DNA is merely a list of millions and millions of base pairs: A's, C's, G's and T's, one after the other.
Based on the universal code by which DNA encodes amino acids, we can make sense of the constantly increasing amount of DNA sequence data, insofar as it encodes proteins. This code was solved in 1966, and it has allowed researchers to find new genes and estimate the total number of genes in the human genome. However, coding sequence covers only about 1.2% of the human genome. New codes and grammatical rules need to be resolved in order to understand the remaining 98.8% of the genome.
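To make the idea of a protein-coding "code" concrete, the following is a minimal Python sketch of how the standard genetic code maps DNA triplets (codons) to amino acids. The names CODON_TABLE and translate are illustrative choices, not part of any software mentioned in this article, and the sketch assumes the sequence is the coding strand read from its first base.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# for each of the three positions. '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its one-letter amino acid symbol.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence into a protein string.

    Reading starts at the first base and stops at the first stop codon
    or when fewer than three bases remain. Codons containing characters
    other than A, C, G or T are reported as 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # ATG -> M (methionine), GCC -> A (alanine), TAA -> stop
    print(translate("ATGGCCTAA"))  # → MA
```

A lookup table like this only covers the protein-coding fraction of the genome; it offers no help in reading the non-coding remainder, which is the gap the article goes on to describe.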
It is evident that genes are expressed in tightly controlled spatial and temporal patterns, but we do not know the code by which this expression is regulated. In this post-genomic era, the next big goal is to decipher the code that governs the regulation of gene expression.
Researchers at the University of Helsinki have been studying sequences that regulate gene expression. The research group, led by Professor Jussi Taipale, Ph.D., has defined the binding specificities of several transcription factors. Transcription factors are DNA-binding proteins that are required to activate gene expression. Working together, biologists and computer scientists designed a software tool called EEL (Enhancer Element Locator), which searches genomic sequence for regions where many transcription
Source: University of Helsinki