New data compression schemes are expected to reduce the total storage space needed, so the CGHub repository is designed initially to hold 5 petabytes and to allow further growth as needed. That is still a massive amount of data, and CGHub will need to accommodate transfers of extremely large data files.
Managed by the UCSC team, the CGHub computer system is located at the San Diego Supercomputer Center. It is connected by high-performance national research networks to major centers nationwide that are participating in these projects, including UCSC. Haussler's team designed and oversees the storage and computing infrastructure for the repository, which has an automated query and download interface for large-scale, high-speed use. It will eventually also include an interactive web-based interface to allow researchers to browse and query the system and download custom datasets.
It may take years for cancer genomics research to bring about major changes in cancer care. The first step, and the focus of the NCI cancer genomics programs, is to determine which genomic changes are involved in each type of cancer and to understand the molecular and clinical effects of those changes. Then biomedical researchers must identify or develop treatments to block those effects.
"Right now, cancer research needs something on a very large scale, like the Large Hadron Collider in physics," Haussler said. "Instead of bringing subatomic particles together in high-energy collisions and computing their behavior, we're bringing cancer genomes together in a common database and computing the disease drivers."
The CGHub program director is Robert Zimmerman, and project team members include technical director Mark Diekhans; operations manager Linda Rosewood; hardware syste
Contact: Tim Stephens
University of California - Santa Cruz