Containing Complexity is Key
To understand the barriers to life science data integration, we must consider not only the volume of data now available but also its expanding complexity. Biological data often does not fit easily into traditional object, data, or relational representations, so numerous methods have been created to represent its multi-dimensional, nested structure. The result is many very different representations. Attempting to correlate data across such sources creates a complexity explosion.
Traditional attempts to integrate these complex sources have centered on extracting, transforming, and ultimately warehousing the combined data in relational or object databases, or both. This approach was initially useful in specialized applications, but in the general case the source data fits these models poorly, and the techniques become the logical equivalent of forcing a square peg into a round hole.
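To make the square-peg problem concrete, here is a minimal sketch of what warehousing does to a single nested record. The record, its field names, and the three-table layout are hypothetical and chosen only for illustration; they do not come from any particular source database or warehouse schema.

```python
# Hypothetical nested record: one gene with transcripts, each with ordered exons.
# All identifiers and coordinates are invented for illustration.
gene_record = {
    "gene_id": "G1",
    "symbol": "ABC1",
    "transcripts": [
        {
            "transcript_id": "T1",
            "exons": [{"start": 100, "end": 200}, {"start": 300, "end": 450}],
        },
        {
            "transcript_id": "T2",
            "exons": [{"start": 100, "end": 200}],
        },
    ],
}


def flatten(record):
    """Shred one nested record into three flat, relational-style row lists."""
    genes = [(record["gene_id"], record["symbol"])]
    transcripts = []
    exons = []
    for transcript in record["transcripts"]:
        # Each transcript row carries a foreign key back to its gene.
        transcripts.append((transcript["transcript_id"], record["gene_id"]))
        for position, exon in enumerate(transcript["exons"]):
            # Exon order was implicit in the nesting; a flat table needs an
            # explicit position column to preserve it.
            exons.append(
                (transcript["transcript_id"], position, exon["start"], exon["end"])
            )
    return genes, transcripts, exons


genes, transcripts, exons = flatten(gene_record)
print(len(genes), len(transcripts), len(exons))  # prints "1 2 3"
```

Even this small record needs three tables, two foreign keys, and an added ordering column. Every additional level of nesting, and every source that nests differently, adds more of this mapping work to the warehouse.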
Further cluttering the landscape, researchers currently depend on proprietary “legacy” systems for warehousing, accessing, and using their data. Huge investments have been made in applications that depend on the existing infrastructure, yet these applications are constrained by the underlying limitations of that infrastructure. The key to moving forward is the ability to adopt integration and access technology incrementally, so that the organization can overcome those limitations while leveraging its huge investment in existing research data and applications.
Containing complexity is the key to realizing scalable integration solutions. There is a way to manage this complexity effectively through a shift in the model: by leaving existing sources in their original form, you eliminate the need to convert o