There are currently only a few situations in which doctors can prescribe a treatment plan based on the specific genetic mutations in a patient's cancer cells. That is expected to change as projects like TCGA, TARGET, and CGCI yield a comprehensive catalog that researchers can use to find new targets for medicines and discover clues to improve patient outcomes. But there is an urgent need for an efficient and user-friendly portal to give researchers access to the data.
The NCI genome projects are producing staggering amounts of data.
"The scale of this is far beyond anything faced in medical research before," Haussler said. "Each genome file, the DNA record from a tumor or normal tissue, is 300 billion bytes. And for every case there are two of these files, the cancer genome and the normal genome. Add to this RNA sequence data, and the prospect of deeper sequencing in the future, and we must plan for up to a terabyte (1,000 billion bytes) for each case."
TCGA currently generates about 10 terabytes of data each month. For comparison, the Hubble Space Telescope amassed about 45 terabytes of data in its first 20 years of operation. TCGA's output will increase tenfold or more over the next two years. Over the next four years, if the project produces a terabyte of DNA and RNA data from each of more than 10,000 patients, it will have produced 10 petabytes of data (a petabyte is 1,000 terabytes). And TCGA is jus
Contact: Tim Stephens
University of California - Santa Cruz