Repeating sequences are the hallmark of heterochromatin, and there are several distinct kinds. Simple, short repeats, called satellite DNAs, tend to become more abundant near the centromeres, where they add up to hundreds of thousands or even millions of bases in length. In these "seas" of satellite DNA there are "islands" of moderate-length repeats, made up of transposons or fragments of transposons, that total only tens or hundreds of kilobases.
In other regions of heterochromatin, the transposons constitute the sea. Here the islands are single-copy genes, stretches of DNA that code for RNAs other than the messenger RNAs needed to make proteins, and other functional elements.
Moderately repeating fragments, such as transposons, and single-copy genes were assembled by comparing numerous copies with unique or sufficiently distinctive sequences. The assembly was checked by matching it to clones of longer sequences. With the painstaking manual assembly taken as far as practical, the researchers mapped the sequences to their physical locations on the chromosomes. The sequence and maps prepared the ground for the next stage in the process, the functional analysis of the fly's heterochromatin.
"Historically it was called junk. We set out to see if there was any information in that junk," says Chris Smith, formerly in Berkeley Lab's Life Sciences Division and now an assistant professor of bioinformatics at San Francisco State University. "We used a pipeline of computer programs to analyze the raw sequence data, in search of genes. We identified patterns of codons that might indicate a gene splice-site or a promoter, for example. We mapped experimentally derived evidence of messenger RNAs back to the matching heterochromatin sequence. And we looked for sequences similar to ones already known from protein databases." These standard approaches to finding genes, Smith says, are re
Source: DOE/Lawrence Berkeley National Laboratory