SDSC's 'big data' expertise aiding genomics research

The San Diego Supercomputer Center (SDSC) at the University of California, San Diego, has in the last three years undergone a major reboot, remaking itself into a center of expertise on all aspects of "big data" research including genomics, one of the fastest growing areas of scientific study.

"We have in recent years become a lot more than a supercomputer center," SDSC Director Michael Norman told attendees earlier this month at the third annual X-Gen Congress & Expo, a four-day event focused on exploring the potential of established and emerging genomic technologies. "Our real expertise is now in all aspects of 'big data', which includes data integration, performance modeling, data mining, software development, workflow automation, and more. We believe that data-enabled science is the beginning of a new scientific era."

SDSC's creation of a fully integrated "big data" environment already has led to several projects in the study of genes, and more are underway. "Our focus in genomic medicine is growing," said Norman.

"Next-generation sequencing of DNA and RNA are profoundly transforming biology and medicine, providing insight into our origins and diseases," according to Wayne Pfeiffer, a distinguished scientist at SDSC. "However, obtaining that insight from the sequencer data deluge requires complex software and increasingly powerful computers."

SDSC has an expanding repertoire of "big data" systems, the latest being Gordon, a unique flash memory-based supercomputer that is capable of storing 100,000 entire human genomes, while operating hundreds of times faster than conventional computers to study genetic data.

Genetic data creates many additional requirements regarding sharing and computing. The iDASH center (integrating Data for Analysis, Anonymization, and Sharing), under the leadership of Lucila Ohno-Machado, is the most recent National Center for Biomedical Computing funded under the National Institutes of Health (NIH) Roadmap for Bioinformatics and Computational Biology.

Conceived as a collaborative computational environment to improve access to health data and software, iDASH provides biomedical and behavior scientists with access to a sophisticated, secure privacy-preserving infrastructure to contribute, integrate, and analyze their data, as well as potentially reuse data from others (given permissions set up by data contributors) and leverage other research results.

"The iDASH center addresses fundamental challenges to research progress by providing a secure, privacy-preserving computational environment in which researchers can analyze molecular, clinical, and behavioral data," said Ohno-Machado.

SDSC also has multiple collaborations with the Scripps Translational Science Institute (STSI), which has a dedicated 1 gigabit-per-second (Gb/s) network connection to the center, along with 140 terabytes of online project storage. STSI has purchased time on SDSC's Triton Resource to conduct research on a number of projects.
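To put the link speed and storage figures above in perspective, the short sketch below estimates how long it would take to move the full 140 terabytes over a 1 Gb/s connection. It assumes decimal (SI) units and an ideal, fully saturated link with no protocol overhead; the function and constant names are illustrative and do not come from SDSC or STSI. Note the factor of eight between gigabits (Gb) and gigabytes (GB).

```python
# Illustrative arithmetic only: assumes SI units and a fully saturated link.
TERABYTE = 10**12   # bytes
GIGABIT = 10**9     # bits


def transfer_time_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Idealized time to move size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_second


# Figures quoted in the article.
stsi_storage_bytes = 140 * TERABYTE
stsi_link_bps = 1 * GIGABIT

seconds = transfer_time_seconds(stsi_storage_bytes, stsi_link_bps)
days = seconds / 86_400
print(f"{days:.1f} days")  # prints "13.0 days"
```

Real transfers would take longer than this lower bound, which is one reason a dedicated connection matters for data sets of this size.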

One such collaboration is called the Human Tumor Study, or HuTS, which is using SDSC's Triton Resource to search for genome variants between blood and tumor tissue. Software used in this project includes the Genome Analysis Toolkit (GATK), the SOAPdenovo assembler, and various aligners such as ATAC, BLAT, and BWA.

Another collaboration involving SDSC, STSI, and others is called W115. In this project, Pfeiffer is using the Velvet and ABySS assemblers and the ATAC and BFAST aligners on the Triton Resource to study the full genome sequence of a 115-year-old woman to determine how many mutations occur in a long, healthy lifetime.

Further collaborations between SDSC and other genomic institutions including STSI are expected, said Norman, noting that Gordon and its data storage facilities have the bandwidth needed for such research. "The end goal here is to develop a rapid learning system for guiding individual therapies, and SDSC is now set up to assist in reaching that goal," said Norman.

Not Just Supercomputers

In addition to Gordon, which went into production earlier this year, SDSC operates Trestles, designed to enable modest-scale and gateway researchers to be as computationally productive as possible, and the Triton Resource, a medium-sized data-intensive compute cluster primarily for UC San Diego and UC researchers.

All three computer systems are integrated into four tiers of specialized data storage, which is crucial for genomics and other researchers who need to sift through massive amounts of data. SDSC's data storage facilities include:

  • Data Oasis, a high-performance Lustre-based parallel file system with four petabytes of storage and a 100 gigabyte-per-second (GB/s) connection for scratch and medium-term storage.

  • Gordon's 300 terabytes of flash-based solid state drive memory. Like Data Oasis, this is used for fast random access and fast sequential access.

  • Project Storage, which provides academic and research partners a network-based storage service offering Common Internet File System (CIFS) and Network File System (NFS) storage to SDSC and UC San Diego systems. With transfer rates up to 1 GB/s, Project Storage is an excellent option for interactive access and use as a traditional mounted file system.

  • SDSC Cloud, a multi-platform, fully accessible and scalable disk-based cloud storage system with 5.5 petabytes of raw storage and more than two petabytes of formatted dual copy storage for archiving or sharing. One petabyte equals a quadrillion (1,000 trillion) bytes of information. The SDSC Cloud is believed to be the largest academic-based cloud storage system in the U.S.

SDSC's four tiers of specialized storage are all interconnected for use as needed. "One can access any storage from any of these systems and build workflows that hop from one system to another," said Norman.
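The capacity and bandwidth figures quoted for these storage tiers can be related with some simple arithmetic. The sketch below uses the decimal definition of a petabyte given above and makes two assumptions that the article does not state: that the 100,000-genome figure for Gordon refers to its 300 terabytes of flash, and that Data Oasis could be read end to end at its full 100 GB/s. All names in the code are illustrative.

```python
# Decimal (SI) units, consistent with the article's definition:
# one petabyte = a quadrillion (10**15) bytes.
GB = 10**9
TB = 10**12
PB = 10**15

# Figures quoted in the article.
DATA_OASIS_CAPACITY = 4 * PB
DATA_OASIS_BANDWIDTH = 100 * GB   # bytes per second
GORDON_FLASH_CAPACITY = 300 * TB
GENOMES_ON_GORDON = 100_000       # assumed here to refer to the flash tier


def hours_to_stream(capacity_bytes: int, bytes_per_second: int) -> float:
    """Idealized time to read the full capacity once at peak bandwidth."""
    return capacity_bytes / bytes_per_second / 3600


def bytes_per_item(capacity_bytes: int, item_count: int) -> float:
    """Average space available per stored item."""
    return capacity_bytes / item_count


sweep_hours = hours_to_stream(DATA_OASIS_CAPACITY, DATA_OASIS_BANDWIDTH)
genome_gb = bytes_per_item(GORDON_FLASH_CAPACITY, GENOMES_ON_GORDON) / GB

print(f"Full Data Oasis sweep: {sweep_hours:.1f} hours")      # 11.1 hours
print(f"Storage per genome on Gordon: {genome_gb:.1f} GB")    # 3.0 GB
```

Under these assumptions, each genome gets about 3 GB, roughly one byte per base pair of a human genome of some three billion base pairs.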

SDSC's Gordon and Trestles systems and their storage systems are available for use to any researcher or educator at a U.S.-based institution and not-for-profit research through the National Science Foundation's (NSF) Extreme Science and Engineering Discovery Environment, or XSEDE program. Industry-based research time and storage are also available. Industry researchers interested in using SDSC's resources or expertise should contact Ron Hawkins at 858 534-5045.

The X-Gen Congress & Expo was held March 5-8 in San Diego.


Contact: Jan Zverina
University of California - San Diego
