Data Miners
The class typically begins with students claiming their own "fosmids," or chunks of raw DNA sequence, from the GEP website, which acts as an organizational hub for publicly available data. A DNA sequence is a succession of letters corresponding to the primary structure of a real strand of DNA -- in this case, Drosophila (fruit fly) DNA.
Drosophila is one of the most commonly studied model organisms in biology. "Understanding how its genes are organized and function will help us to understand how human genes function," Elgin explains.
In the first part of the course, students work to improve the quality of their chunk of DNA. This process, known as "finishing," is necessary because raw sequence data often has problem areas that can only be corrected by hand. Using specialized software, students identify gaps, potential assembly errors, and low-quality regions in the sequence data for their fosmid. Students then design and order additional sequencing reactions that will generate the data needed to remedy these problem areas. Weekly orders are processed together (and hence cost-effectively) at Washington University's Genome Sequencing Center. Students use the resulting sequence data to polish their fosmids to high-quality standards.
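To give a rough sense of what identifying gaps and low-quality regions can involve, here is a minimal Python sketch. It is not the specialized software the students use. It assumes, purely for illustration, that gaps appear as runs of the letter "N" in the sequence and that each base carries a numeric quality score, with scores below 20 treated as low quality. The function names, the threshold, and the sample data are all hypothetical.

```python
from typing import List, Tuple


def find_problem_regions(flags: List[bool]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals covering each run of True flags."""
    regions = []
    start = None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i                      # a new problem run begins here
        elif not flagged and start is not None:
            regions.append((start, i))     # the run ended just before position i
            start = None
    if start is not None:
        regions.append((start, len(flags)))  # run extends to the end of the data
    return regions


def low_quality_regions(qualities: List[int], threshold: int = 20) -> List[Tuple[int, int]]:
    """Locate runs of bases whose quality score is below the (assumed) threshold."""
    return find_problem_regions([q < threshold for q in qualities])


def gap_regions(sequence: str, gap_char: str = "N") -> List[Tuple[int, int]]:
    """Locate runs of the (assumed) gap character in the sequence."""
    return find_problem_regions([base.upper() == gap_char for base in sequence])


if __name__ == "__main__":
    # Made-up example data, not real fosmid sequence.
    sequence = "ACGTNNNNACGTACGT"
    qualities = [40, 38, 35, 30, 0, 0, 0, 0, 12, 15, 33, 36, 40, 41, 39, 37]
    print(gap_regions(sequence))           # [(4, 8)]
    print(low_quality_regions(qualities))  # [(4, 10)]
```

In this made-up example, the flagged intervals mark the stretches a student would target with additional sequencing reactions.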
The second component of the course is annotation, the construction of "gene models" that distinguish coding regions of the DNA from noncoding regions. In eukaryotes such as humans and fruit flies, only a small percentage of the genome contains instructions for making proteins. Elgin explains, "It's as though someone has given you Moby Dick, but they've actually given it to you in twenty volumes because they've interspersed gibberish into the text at random places
Contact: Sarah Elgin, selgin@wustl.edu, 314-935-5348, Washington University in St. Louis
Source: Eurekalert