All of the HapMap data is deposited in NCBI and was made available to the California researchers for their computation, along with more than a dozen other data sets, including the second-largest behind HapMap: 110 million genotypes published earlier this year by a consortium led by Perlegen Sciences.
"The speed with which we are able to compute the entire dbSNP database of genotypes is a combination of the speed of our algorithm and the computational resources that allowed us to do it so quickly," explained Eskin, a professor in UCSD's Jacobs School of Engineering. "We have demonstrated that haplotype phasing can be done routinely every time there is a new release of data deposited in the NCBI database."
"By reducing the waiting time to just 24 hours, NCBI can make it an integral part of the build cycle for dbSNP," said NCBI's Stephen Sherry. "Every time there is a new release of polymorphism and human variation information in our database, our colleagues in California will be able to re-compute the haplotypes and tag SNPs." To underscore that point, in early October the researchers ran another complete computation on an updated version of the NCBI database that has not yet been made public.
ICSI's Halperin notes that working with the entire dbSNP database showed that HAP works well on diverse data sets. "The challenge of analyzing such a large dataset is enormous, since the integration of the different datasets is not a simple task," explained the research scientist. "In particular, different data sets have different characteristics, and one has to take this into account. This project demonstrates the ability of HAP to eff
Source:University of California - San Diego