Searching genomic data faster
Date: 7/10/2012

CAMBRIDGE, Mass. -- In 2001, the Human Genome Project and Celera Genomics announced that after 10 years of work at a cost of some $400 million, they had completed a draft sequence of the human genome. Today, sequencing a human genome is something that a single researcher can do in a couple of weeks for less than $10,000.

Since 2002, the rate at which genomes can be sequenced has been doubling every four months or so, whereas computing power doubles only every 18 months. Without the advent of new analytic tools, biologists' ability to generate genomic data will soon outstrip their ability to do anything useful with it.
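To make the mismatch concrete, the short sketch below compounds the two doubling periods quoted above (four months for sequencing, 18 months for computing power). The function names are illustrative, and the model assumes both rates hold steady, which is a simplification.

```python
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Multiplicative growth after `months`, given a fixed doubling period."""
    return 2.0 ** (months / doubling_period_months)


def sequencing_vs_compute_gap(months: float,
                              seq_doubling: float = 4.0,
                              compute_doubling: float = 18.0) -> float:
    """How many times faster sequencing output grows than computing power.

    The default doubling periods are the figures quoted in the article;
    treating them as constant over time is an assumption of this sketch.
    """
    return (growth_factor(months, seq_doubling)
            / growth_factor(months, compute_doubling))


if __name__ == "__main__":
    for months in (18, 36, 72):
        print(months, "months:",
              round(growth_factor(months, 4.0), 1), "x sequencing vs",
              round(growth_factor(months, 18.0), 1), "x computing")
```

Under these assumptions, in the 18 months it takes computing power to double, sequencing output doubles four and a half times, growing roughly 22-fold.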

In the latest issue of Nature Biotechnology, MIT and Harvard University researchers describe a new algorithm that drastically reduces the time it takes to find a particular gene sequence in a database of genomes. Moreover, the more genomes it's searching, the greater the speedup it affords, so its advantages will only compound as more data is generated.

In some sense, this is a data-compression algorithm like the one that allows computer users to compress data files into smaller zip files. "You have all this data, and clearly, if you want to store it, what people would naturally do is compress it," says Bonnie Berger, a professor of applied math and computer science at MIT and senior author on the paper. "The problem is that eventually you have to look at it, so you have to decompress it to look at it. But our insight is that if you compress the data in the right way, then you can do your analysis directly on the compressed data. And that increases the speed while maintaining the accuracy of the analyses."

Exploiting redundancy

The researchers' compression scheme exploits the fact that evolution is stingy with good designs. There's a great deal of overlap in the genomes of closely related species, and some overlap even in the genomes of distantly related species: That's why experiments performed on yeast cells can tell us something about human drug reactions.

Berger; her former grad student Michael Baym PhD '09, who's now a visiting scholar in the MIT math department and a postdoc in systems biology at Harvard Medical School; and her current grad student Po-Ru Loh developed a way to mathematically represent the genomes of different species or of different individuals within a species such that the overlapping data is stored only once. A search of multiple genomes can thus concentrate on their differences, saving time.

"If I want to run a computation on my genome, it takes a certain amount of time," Baym explains. "If I then want to run the same computation on your genome, the fact that we're so similar means that I've already done most of the work."
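The idea of storing shared data once and searching it once can be illustrated with a toy sketch. This is not the researchers' published algorithm: it assumes exact, fixed-size blocks and exact string matching, whereas real genomes differ by small mutations and real search tools tolerate mismatches. The class name `CompressedGenomes`, the helper `find_all`, and the block size are all hypothetical choices made for illustration.

```python
def find_all(text, query):
    """Return every start index (overlaps included) of query in text."""
    hits, start = [], text.find(query)
    while start != -1:
        hits.append(start)
        start = text.find(query, start + 1)
    return hits


class CompressedGenomes:
    """Toy store: each distinct fixed-size block is kept only once."""

    def __init__(self, block_size=8):
        self.block_size = block_size
        self.blocks = []      # unique block strings
        self.block_id = {}    # block string -> index into self.blocks
        self.genomes = {}     # genome name -> list of block ids

    def add(self, name, sequence):
        ids = []
        for start in range(0, len(sequence), self.block_size):
            block = sequence[start:start + self.block_size]
            if block not in self.block_id:
                self.block_id[block] = len(self.blocks)
                self.blocks.append(block)
            ids.append(self.block_id[block])
        self.genomes[name] = ids

    def search(self, query):
        """Exact-match search done on the compressed form.

        Each unique block, and each distinct pair of adjacent blocks, is
        scanned once; hits are then mapped back to every genome that
        contains them. Requires 0 < len(query) <= block_size.
        """
        q = len(query)
        if not 0 < q <= self.block_size:
            raise ValueError("query must be 1..block_size characters long")
        block_hits = [find_all(block, query) for block in self.blocks]
        junction_hits = {}  # (left id, right id) -> (window offset, hits)
        results = {}
        for name, ids in self.genomes.items():
            hits = []
            for i, bid in enumerate(ids):
                base = i * self.block_size
                hits.extend(base + off for off in block_hits[bid])
                if i + 1 < len(ids) and q > 1:
                    pair = (bid, ids[i + 1])
                    if pair not in junction_hits:
                        # Window of q-1 characters on each side of the
                        # boundary: any hit in it must straddle the boundary.
                        left = self.blocks[bid][-(q - 1):]
                        right = self.blocks[ids[i + 1]][:q - 1]
                        junction_hits[pair] = (
                            self.block_size - len(left),
                            find_all(left + right, query),
                        )
                    offset, offs = junction_hits[pair]
                    hits.extend(base + offset + off for off in offs)
            results[name] = sorted(hits)
        return results


if __name__ == "__main__":
    store = CompressedGenomes(block_size=8)
    store.add("a", "ACGTACGTTTGACCAGGGTACGTA")
    store.add("b", "ACGTACGTTTGACCAGCCTACGTA")
    print(len(store.blocks), "unique blocks stored for 6 blocks of input")
    print(store.search("GTTT"))
```

In this toy setting, two genomes that share most of their blocks cost little more to search than one, because the scan of each shared block is reused. Handling near-identical rather than identical regions is the hard part that the sketch leaves out.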

In experiments on a database of 36 yeast genomes, the researchers compared their algorithm to one called BLAST, for Basic Local Alignment Search Tool, one of the most commonly used genomic-search algorithms in biology. In a search for a particular genetic sequence in only 10 of the yeast genomes, the new algorithm was twice as fast as BLAST; but in a search of all 36 genomes, it was four times as fast. That speed advantage will only increase as genomic databases grow larger, Berger explains.

Matchmaking

The new algorithm would be useful in any application where the central question is, as Baym puts it: "I have a sequence; what is it similar to?" Identifying microbes is one example. The new algorithm could help clinicians determine causes of infections, or it could help biologists characterize "microbiomes," collections of microbes found in animal tissue or particular microenvironments; variations in the human microbiome have been implicated in a range of medical conditions. It could be used to characterize the microbes in particularly fertile or infertile soil, and it could even be used in forensics, to determine the geographical origins of physical evidence by its microbial signatures.

Berger's group is currently working to extend the technique to information on proteins and RNA sequences, where it could pay even bigger dividends. Now that the human genome has been mapped, the major questions in biology are what genes are active when, and how the proteins they code for interact. Searches of large databases of biological information are crucial to answering both questions.


Contact: Caroline McCall, MIT Media Relations
cmccall5@mit.edu
Massachusetts Institute of Technology
Source: EurekAlert!
