BioGrid explained

This is how the project will help companies integrate all the data they need to make relevant discoveries using a BioGrid. A BioGrid is essentially a data and computational Grid created through a suite of tools developed by the project.
Here's how it works. One element of the software suite analyses over-expressing genes discovered during microarray experiments to establish which proteins they encode. This uses standard techniques.

A second analysis tool in the suite predicts which protein-protein interactions may be taking place. This is novel. When a gene encodes a protein, the protein folds up into a unique shape, forming a 3D structure. This structure can only interact, or fit, with some proteins, but not others, like pieces of a jigsaw puzzle.
BioGrid's protein interaction software includes a database of the 20,000 known protein structures and uses that database to identify which ones could potentially interact, among the thousands of proteins created by the over-expressing genes. Once interesting potential protein interactions are known, BioGrid's ontology-based search technology can mine company or journal data for any relevant information.
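As a rough illustration of the screening step described above, the sketch below takes the proteins produced by over-expressing genes, keeps those with a known structure, and reports the pairs whose structures are marked as fitting together. Everything in it is assumed for illustration only: the names `KNOWN_STRUCTURES`, `COMPATIBLE_FOLDS` and `candidate_interactions`, the protein identifiers, and the fold-compatibility table are hypothetical, and the actual BioGrid software presumably relies on a far richer structural comparison.

```python
from itertools import combinations

# Hypothetical stand-in for the structure database: protein -> fold label.
KNOWN_STRUCTURES = {
    "P1": "fold_a",
    "P2": "fold_b",
    "P3": "fold_c",
}

# Hypothetical table of fold pairs assumed to fit together, jigsaw-style.
COMPATIBLE_FOLDS = {frozenset({"fold_a", "fold_b"})}


def candidate_interactions(expressed_proteins):
    """Return pairs of expressed proteins whose known folds are compatible."""
    # Proteins without a known structure cannot be screened, so drop them.
    with_structure = sorted(
        p for p in set(expressed_proteins) if p in KNOWN_STRUCTURES
    )
    pairs = []
    for a, b in combinations(with_structure, 2):
        folds = frozenset({KNOWN_STRUCTURES[a], KNOWN_STRUCTURES[b]})
        if folds in COMPATIBLE_FOLDS:
            pairs.append((a, b))
    return pairs
```

The pairs returned by a screen like this would then be the input to the literature search, which is why restricting the comparison to proteins with known structures matters: it keeps the number of candidates manageable.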
Linking all these software tools together is a rules-based Java scripting language called Prova, also developed by the BioGrid team. It is the glue that sticks the gene expression, protein interaction and ontology-based literature analysis tools together into an integrated, cohesive unit. "It's an open source language, available at www.prova.ws, and about 20 groups are using it around the world right now. We made it open source because you need to develop a community to keep a programming language alive," says Dr Schroeder.
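To give a feel for what rules-based glue means, here is a minimal sketch of forward chaining, in which each rule states what one analysis step needs and what it produces. It is written in plain Python and is not Prova syntax; the rule table `RULES`, the fact names and the function `forward_chain` are all hypothetical and chosen only to mirror the three steps described in this article.

```python
# Each rule: (facts required, fact produced). Names are illustrative only.
RULES = [
    ({"expression_data"}, "encoded_proteins"),
    ({"encoded_proteins"}, "candidate_interactions"),
    ({"candidate_interactions"}, "literature_hits"),
]


def forward_chain(facts, rules=RULES):
    """Apply rules repeatedly until no new facts can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for required, produced in rules:
            # Fire a rule only if its inputs exist and its output is new.
            if required <= derived and produced not in derived:
                derived.add(produced)
                changed = True
    return derived
```

Starting from `{"expression_data"}`, the rules fire one after another until the literature results are reached, without any step needing to know about the others.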