July 12, 2012, Hong Kong, China: BGI, the world's largest genome sequencing institute, and its publishing partner BioMed Central, a leader in scientific data sharing, announce the launch of a new journal, GigaScience, which publishes large-scale biological research in a unique format. The journal combines standard article publishing with complete data hosting and analysis tools, all of which are open access and freely available.
This launch is a major first step towards revolutionizing the publishing industry with the open access publication of complete, reproducible accounts of all parts of data-intensive scientific research projects. Together GigaScience and its integrated database GigaDB provide scientific analyses, full dataset hosting, and access to the software tools used to conduct these analyses, along with publication of more traditional scientific articles describing the studies.
Having all these together finally allows readers not only to glean the scientific conclusions in the papers, but also to directly test these using the underlying data and analysis tools. By doing this, GigaScience offers a way to help overcome the growing problem of the lack of reproducibility of research. GigaScience publications also include Digital Object Identifiers (DOIs) for all datasets in the journal database, GigaDB. This helps make datasets more permanent, as well as fully trackable, discoverable, linkable, and citable, which traditionally has only been possible for journal articles. Citation of data enables scientists who generate these enormous datasets and share them with the community to gain more appropriate credit for their contributions to research.
Laurie Goodman, Editor-in-Chief, says, "The full use of large-scale data has sadly lagged far behind our ability to produce it. The leaders of BGI realized they had the ability, given their vast computational resources, to create an innovative new journal format: one where enormous datasets could be fully hosted and directly linked to their original scientific studies. By including analysis tools in a data platform, as well as the planned addition of cloud technology later this year, GigaScience can serve as a means to put such data into the hands of researchers who do not have the vast computational resources required for optimal data use. This is in keeping with the goals of our co-publisher BioMed Central, which makes them the perfect partner in achieving this endeavour."
Exemplifying GigaScience and GigaDB's innovative approach to publishing, the launch edition includes a research article from Stephan Beck's group at University College London, UK (pre-release version here: http://goo.gl/2nZgD). This article focuses on ways to conduct whole-genome analyses of DNA methylation, an important mechanism that regulates gene expression. The article contains all of the supporting data and software tools needed to recreate the experiments, a total of 84 GB, freely available for download and reuse from GigaDB. Using BGI's data storage capacity, GigaScience is able to host these and other files, which are far larger than any other journal is able to publish. GigaDB furthermore supports open data by giving up all copyright in published datasets through its use of the Creative Commons CC0 public domain dedication waiver. This allows anyone to access and reuse published data without restrictions.
As well as this innovative, big-data-driven publication format, the journal also provides reviews and commentaries that address the many hurdles that still need to be surmounted to improve future big-data handling.
Further highlights from the first issue include:
A compressed future for DNA archiving
Scientists at the EMBL-Bioinformatics Institute in Hinxton, UK, argue that sustainable DNA archiving will depend on the understanding that not all data is created, or preserved, equally. They propose a graded system for storing DNA sequences under differing levels of compression based on ease of reproduction of the data and the availability of DNA samples for resequencing.
Digitizing pathogen surveillance
The rapid development of genomics technology and understanding over the last two decades has laid the groundwork for a major advancement in public health. Researchers at Cold Spring Harbor Laboratories and the University of Maryland claim that the time is right for a sequencing-based pathogen surveillance system. They believe that the biggest hurdle for the system would not be the necessary technology, but rather scientific attitudes towards data sharing.
An ambitious plan to digitally characterize ecological diversity
The Genomic Observatories network plans to digitally characterize whole ecosystems of specific 'research hotspots' with the aim of better enabling predictive modeling of biodiversity dynamics. This article, authored by scientists in the US and UK who are a part of the long-term initiative, delineates how collecting and harnessing such a vast body of genetic variation data would greatly benefit both science and broader society.
Closing the issue, Jonathan Eisen also discusses good omes, bad omes and how to tell the difference.
Contact: Laurie Goodman