Bioinformatics


Bioinformatics, or computational biology, is the use of techniques from applied mathematics, informatics, statistics, and computer science to solve biological problems. Research in computational biology often overlaps with systems biology. Major research efforts in the field include sequence alignment, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, and the modeling of evolution. The terms bioinformatics and computational biology are often used interchangeably, although the former is, strictly speaking, a subset of the latter.

A common thread in projects in bioinformatics and computational biology is the use of mathematical tools to extract useful information from noisy data produced by high-throughput biological techniques. (The field of data mining overlaps with computational biology in this regard.) Representative problems in computational biology include the assembly of high-quality DNA sequences from fragmentary "shotgun" DNA sequencing, and the prediction of gene regulation with data from mRNA microarrays or mass spectrometry.


Major research areas

Sequence analysis

Main articles: Sequence alignment, Sequence database

Since the phage Φ-X174 was sequenced in 1977, the DNA sequences of more and more organisms have been stored in electronic databases. These data are analyzed to identify genes that code for proteins, as well as regulatory sequences. A comparison of genes within a species or between different species can show similarities between protein functions, or relations between species (the use of molecular systematics to construct phylogenetic trees). With the growing amount of data, it has become impractical to analyze DNA sequences manually. Today, computer programs are used to search the genomes of dozens of organisms, containing billions of nucleotides, for similar sequences. These programs can compensate for mutations (exchanged, deleted or inserted bases) in the DNA sequence, in order to identify sequences that are related but not identical.

A variant of this sequence alignment is used in the sequencing process itself. So-called shotgun sequencing (used, for example, by Celera Genomics to sequence the human genome) does not give a sequential list of nucleotides, but instead the sequences of thousands of small DNA fragments (each about 600 nucleotides long). The ends of these fragments overlap and, when aligned in the right way, make up the complete genome. Shotgun sequencing yields sequence data quickly, but the task of reassembling the fragments can be quite complicated for larger genomes. In the case of the Human Genome Project, it took several months on a supercomputer array to align them correctly. Shotgun sequencing is generally preferred for smaller genomes, such as those of bacteria, and is often used at least partially on organisms with much larger genomes.
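The idea of rebuilding a sequence from overlapping fragments can be illustrated with a deliberately simple greedy merge. This is only a sketch under strong assumptions: the fragments are error-free, all come from the same strand, and overlaps are exact, which is not true of real sequencing reads. The function names and the `min_len` parameter are hypothetical, and production assemblers use far more elaborate methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len: int = 3):
    """Repeatedly merge the pair of fragments with the longest exact overlap.

    Returns the list of contigs left when no overlap of at least
    min_len bases remains.
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left to merge
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags
```

Calling `greedy_assemble(["GTTAGC", "ATGCGTAC", "GTACGTTA"])` merges the three toy fragments into a single contig. The pairwise search is quadratic in the number of fragments, which hints at why reassembly becomes expensive for large genomes.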

Another aspect of bioinformatics in sequence analysis is the automatic search for genes and regulatory sequences within a genome. Not all of the nucleotides within a genome are genes. Within the genome of higher organisms, large parts of the DNA do not serve any obvious purpose. This so-called junk DNA may, however, contain unrecognized functional elements. Bioinformatics helps to bridge the gap between genome and proteome projects, for example in the use of DNA sequence for protein identification.
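As a rough illustration of an automatic gene search, the sketch below scans the three forward reading frames of a DNA string for open reading frames, that is, runs that start with ATG and end at a stop codon. It is a simplification: it ignores the reverse strand, introns, and regulatory signals, and the `min_codons` cut-off is an arbitrary assumption. Real gene finders rely on statistical models rather than this rule alone.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3):
    """Return (start, end, frame) for each ATG...stop run on the forward strand.

    start is the index of the A in ATG; end is one past the stop codon,
    so dna[start:end] is the full open reading frame.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # count coding codons, excluding the stop codon itself
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs
```

For the toy input `"CCATGAAATTTGGGTAACC"` the function reports one open reading frame in frame 2, covering `ATGAAATTTGGGTAA`.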

See also: sequence analysis, sequence profiling tool, sequence motif.

Computational evolutionary biology

Evolutionary biology is the study of the origin and descent of species, as well as their change over time. Recent developments in genome sequencing and the ubiquity of fast computers enable researchers to trace the evolution of species by tracing changes in their DNA. Research in computational evolutionary biology (CEB) from the pre-genome era involved building computational models of populations and watching their behavior over time.
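A population model of the kind described here can be very small. The sketch below uses the Wright-Fisher model of genetic drift, chosen purely as an illustration (the text does not name a specific model): each generation, every gene copy in a diploid population of fixed size is drawn at random according to the current allele frequency. The function name and parameters are hypothetical.

```python
import random


def wright_fisher(pop_size: int, p0: float, generations: int, seed=None):
    """Simulate drift of one allele's frequency in a diploid population.

    Returns a list of allele frequencies, one per generation,
    starting with the initial frequency p0.
    """
    rng = random.Random(seed)
    n_copies = 2 * pop_size  # two gene copies per diploid individual
    freq = p0
    history = [freq]
    for _ in range(generations):
        # each copy in the next generation carries the allele with probability freq
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        freq = count / n_copies
        history.append(freq)
    return history
```

Running it several times with different seeds and a small `pop_size` shows the allele drifting toward loss (frequency 0) or fixation (frequency 1), even though no selection is modeled.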

The field of genetic algorithms might be described as the rough inverse of computational evolutionary biology: rather than investigating evolution through computer programs, it aims to improve computer programs through evolutionary principles.

Gene expression analysis

The expression of many genes can be determined by measuring mRNA levels with techniques including microarrays, expressed sequence tag (EST) sequencing of cDNA, serial analysis of gene expression (SAGE) tag sequencing, and massively parallel signature sequencing (MPSS), or by measuring protein concentrations with high-throughput mass spectrometry. All of these techniques are extremely noise-prone and/or subject to bias in the biological measurement, and a major research area in computational biology involves developing statistical tools to separate signal from noise in high-throughput (HT) gene expression studies. HT studies are often used to determine the genes implicated in a disorder: one might compare microarray data from cancerous epithelial cells to data from non-cancerous cells to determine the transcripts that are up-regulated and down-regulated in cancer.
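The comparison between two conditions can be sketched with a naive log2 fold change on mean expression values. This is only an illustration: the threshold of one log2 unit, the pseudocount, and the function names are assumptions, and a fold change alone does not separate signal from noise. Real analyses use replicate-aware statistics and correct for multiple testing.

```python
import math
from statistics import mean


def log2_fold_change(case, control, pseudocount: float = 1.0) -> float:
    """log2 ratio of mean expression; the pseudocount avoids division by zero."""
    return math.log2((mean(case) + pseudocount) / (mean(control) + pseudocount))


def call_regulated(expression, threshold: float = 1.0):
    """Label each gene 'up', 'down' or 'unchanged'.

    expression maps a gene name to a (case_values, control_values) pair.
    """
    calls = {}
    for gene, (case, control) in expression.items():
        lfc = log2_fold_change(case, control)
        if lfc >= threshold:
            calls[gene] = "up"
        elif lfc <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls
```

Here `case` would hold measurements from the cancerous cells and `control` those from the non-cancerous cells, with one value per replicate.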

Expression data is also used to infer gene regulation: one might compare microarray data from a wide variety of states of an organism to form hypotheses about the genes involved in each state. In a single-cell organism, one might compare stages of the cell cycle, along with various stress conditions (heat shock, starvation, etc.). One can then apply clustering algorithms to that expression data to determine which genes are co-expressed. Further analysis could take a variety of directions: one 2004 study analyzed the promoter sequences of co-expressed (clustered together) genes to find common regulatory elements and used machine learning techniques to predict the promoters involved in regulating each cluster.
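One simple way to group co-expressed genes is to link any two genes whose expression profiles are strongly correlated and then take the connected groups. The sketch below does this with Pearson correlation and a fixed cut-off. It is one of many possible clustering approaches, not the method of any particular study, and the `min_r` threshold and function names are assumptions.

```python
import math


def pearson(x, y) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def coexpression_clusters(profiles, min_r: float = 0.9):
    """Group genes whose profiles correlate at or above min_r (single linkage).

    profiles maps a gene name to a list of expression values, one per condition.
    Returns a sorted list of sorted gene-name lists.
    """
    genes = list(profiles)
    cluster_of = {g: {g} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if cluster_of[g] is cluster_of[h]:
                continue  # already in the same cluster
            if pearson(profiles[g], profiles[h]) >= min_r:
                merged = cluster_of[g] | cluster_of[h]
                for member in merged:
                    cluster_of[member] = merged
    unique = {id(c): c for c in cluster_of.values()}
    return sorted(sorted(c) for c in unique.values())
```

Each profile would hold one measurement per condition, for example stages of the cell cycle or stress treatments. Genes that land in the same cluster are candidates for sharing regulatory elements.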

Protein expression analysis

Protein microarrays and high-throughput (HT) mass spectrometry (MS) can provide a snapshot of the proteins present in a biological sample. Bioinformatics is very much involved in making sense of protein microarray and HT MS data: the former involves a number of the same problems involved in examining microarrays targeted at mRNA, while the latter involves the bioinformatics problem of matching MS data against protein sequence databases.
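Matching MS data against a sequence database can be sketched as peptide mass fingerprinting: each database protein is cut in silico at the sites where trypsin would cleave, the peptide masses are computed, and proteins are ranked by how many observed masses they explain. This is a minimal sketch assuming a tryptic digest with no missed cleavages, unmodified peptides, and neutral monoisotopic masses. The function names and the tolerance are hypothetical, and real search engines use much richer scoring.

```python
# Monoisotopic amino acid residue masses in daltons, rounded to five decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_peptides(protein: str):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def rank_proteins(observed_masses, database, tolerance: float = 0.01):
    """Rank database proteins by the number of observed masses they explain.

    database maps a protein name to its amino acid sequence.
    """
    scores = {}
    for name, sequence in database.items():
        theoretical = [peptide_mass(p) for p in tryptic_peptides(sequence)]
        scores[name] = sum(
            1 for m in observed_masses
            if any(abs(m - t) <= tolerance for t in theoretical)
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The protein at the top of the ranking is the best candidate, although with so crude a score a match is only a hypothesis to check.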

Structure prediction

Main article: Protein structure prediction

Protein structure prediction is another important application of bioinformatics. The amino acid sequence of a protein, the so-called primary structure, can be easily determined from the sequence of the gene that codes for it. However, the protein can function correctly only if it is folded in a very specific way (that is, if it has the correct secondary, tertiary and quaternary structure). Predicting this folding from the amino acid sequence alone is quite difficult. Several methods for computer predictions of protein folding are currently (as of 2004) under development.
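Deriving the primary structure from a coding sequence is the easy step, and can be shown in a few lines using the standard genetic code. The sketch assumes an intron-free coding sequence read from its first base. The function name is hypothetical, and codons containing unexpected characters are rendered as `X`.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate a coding DNA sequence, stopping at the first stop codon."""
    coding_dna = coding_dna.upper()
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        aa = CODON_TABLE.get(coding_dna[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*":
            break  # stop codon reached
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("ATGGCCAAATAA")` reads the codons ATG, GCC and AAA and then stops at TAA. Nothing comparably direct exists for getting from this string of residues to the folded structure.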

One of the key ideas in bioinformatics research is the notion of homology. In the genomic branch of bioinformatics, homology is used to predict the function of a gene: if the sequence of gene A, whose function is known, is homologous to the sequence of gene B, whose function is unknown, one could infer that B may share A's function. In the structural branch of bioinformatics, homology is used to determine which parts of the protein are important in structure formation and interaction with other proteins. In a technique called homology modelling, this information is used to predict the structure of a protein once the structure of a homologous protein is known. This currently remains the only way to predict protein structures reliably.
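In practice, homology is usually inferred from sequence similarity, and a basic measure of similarity is a global alignment score. The sketch below computes such a score with the Needleman-Wunsch dynamic programming recurrence. The scoring values (+1 match, -1 mismatch, -1 per gap position) are arbitrary assumptions. A high score is evidence for homology, not proof of it.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch, linear gaps).

    Only the score is returned; the previous row of the table is all
    that needs to be kept in memory.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

With these scores, two identical seven-letter sequences score 7, and deleting one letter from one of them lowers the best score to 5 (six matches and one gap). A gene B whose score against a well-characterized gene A is unusually high would be a candidate for sharing A's function.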

One example of this is the structural similarity between hemoglobin in humans and the hemoglobin in legumes (leghemoglobin). Both proteins serve the same purpose of transporting oxygen in their respective organisms. Although the two proteins have very different amino acid sequences, their structures are virtually identical, which reflects their nearly identical purposes.

Other techniques for predicting protein structure include protein threading and de novo (from scratch) physics-based modeling.

See also structural motif.

Modeling biological systems

Main article: Systems biology

Systems biology involves the use of computer simulations of cellular subsystems (such as the networks of metabolites and enzymes which comprise metabolism, signal transduction pathways and gene regulatory networks) to both analyze and visualize the complex connections of these cellular processes. Artificial life or virtual evolution attempts to understand evolutionary processes via the computer simulation of simple (artificial) life forms.
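A very small example of simulating a cellular subsystem is a two-variable model of gene expression, in which mRNA is produced at a constant rate and degraded, and protein is produced from mRNA and degraded. The model, the rate constants and the function name below are all illustrative assumptions, and forward Euler integration is used only because it is the simplest scheme to read.

```python
def simulate_expression(k_tx: float, d_m: float, k_tl: float, d_p: float,
                        dt: float = 0.01, steps: int = 10000,
                        m0: float = 0.0, p0: float = 0.0):
    """Integrate dm/dt = k_tx - d_m*m and dp/dt = k_tl*m - d_p*p with forward Euler.

    Returns a list of (time, mRNA, protein) tuples, including the initial state.
    """
    m, p = m0, p0
    trajectory = [(0.0, m, p)]
    for step in range(1, steps + 1):
        dm = k_tx - d_m * m      # transcription minus mRNA decay
        dp = k_tl * m - d_p * p  # translation minus protein decay
        m += dm * dt
        p += dp * dt
        trajectory.append((step * dt, m, p))
    return trajectory
```

The trajectory approaches the steady state m = k_tx / d_m and p = k_tl * m / d_p, which gives a quick sanity check on the simulation. Models of real pathways couple many such equations and usually call for a stiffer, adaptive integrator.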

Other applications

Morphometrics is used to analyze pictures of embryos to track and to predict the fate of cell clusters during morphogenesis.

Software tools

The computational biology tool best known among biologists is probably BLAST, an algorithm for searching large sequence (protein, DNA) databases. NCBI provides a popular implementation that searches its massive sequence databases.

Computer scripting languages such as Perl and Python are often used to interface with biological databases and parse output from bioinformatics programs. Communities of bioinformatics programmers have set up free/open source projects such as EMBOSS, Bioconductor, BioPerl, Biopython, BioRuby, and BioJava, which develop and distribute shared programming tools and objects (as program modules) that make bioinformatics easier.
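The kind of parsing chore these languages are used for can be shown with a FASTA reader written in plain Python. This is a sketch rather than a replacement for the parsers shipped with the projects above. It assumes the common convention that the record identifier is the first word after `>`, and the function names are hypothetical.

```python
def parse_fasta(lines):
    """Parse FASTA-formatted lines into a dict of identifier -> sequence.

    lines can be any iterable of strings, such as an open file
    or the result of text.splitlines().
    """
    records = {}
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty one)."""
    if not sequence:
        return 0.0
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)
```

A typical use is `parse_fasta(open(path))` for some FASTA file path, followed by a loop over the returned dictionary to compute a per-sequence statistic such as `gc_content`.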

Bibliography

  • R. Durbin, S. Eddy, A. Krogh and G. Mitchison, Biological sequence analysis. Cambridge University Press, 1998. ISBN 0521629713
  • Kohane, et al. Microarrays for an Integrative Genomics. The MIT Press, 2002. ISBN 026211271X
  • Mount, David W. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press, May 2002. ISBN 0879696087
  • JM. Claverie, C. Notredame, Bioinformatics for Dummies. Wiley, 2003. ISBN 0764516965
