The study was conducted by Iuliana Ionita, a PhD student in computer science; Raoul-Sam Daruwala, a former research scientist in the Courant Bioinformatics group who is now at Google; and Courant Professor Bud Mishra. Mishra is a professor of computer science and mathematics at the Courant Institute and also has an appointment in the Department of Cell Biology at NYU's School of Medicine.
Previous research has found that certain gene-chips--a technology that allows genome-wide screening for mutations in genes or changes in gene expression all at once--shed light on genes and mechanisms involved in the onset and spread of cancer. Specifically, chromosomal segments that are deleted in one or both copies of the genome across a group of cancer patients point to the locations of tumor suppressor genes implicated in the cancer. The NYU study focused on automatic methods for reliably detecting such genes, their locations, and their boundaries. For this purpose, the NYU scientists devised an efficient and novel statistical algorithm that maps tumor suppressor genes using a multipoint statistical score function. Their algorithm is unique in that it exploits the high resolution of gene-chips and prior biological models through Bayesian statistics in order to pinpoint the genes involved in the cancer, even when the patients' genomes carry many other unrelated deletions, which occur as "collateral damage" as the cancer progresses to an advanced stage.
The NYU algorithm estimates the location of tumor suppressor genes by analyzing segmental deletions in the genomes of cancer patients and the spatial relation of the deleted segments to any specific genomic interval. The gene-chip consists of many "probes"--each one characterizing an almost unique word and its location in the already-sequenced human genome--so by combining these probe measurements, one can estimate whether an important genomic segment is missing. The process is akin to guessing whether a new edition of a book is missing an important paragraph by checking whether some of that paragraph's key words are missing from the new edition's index. The new algorithm computes a multipoint score for every interval of consecutive probes, and the score reflects how well the deletion of that genomic interval may explain the cancer in these patients. In other words, the computed score measures how likely it is that a particular genomic interval is a tumor suppressor gene implicated in the disease. To validate their algorithm, the authors produced a high-fidelity in silico model of cancer and checked how well they could detect the right genes as they varied the model's parameters in an adversarial manner. Encouraged by the success of their in silico study, they applied the algorithm to currently available patient data and found that it detected many genes already known in the literature, as well as several others that are equally statistically significant but were not found by earlier studies.
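The idea of scoring every interval of consecutive probes can be illustrated with a short Python sketch. It is not the NYU group's algorithm: their multipoint score is Bayesian and draws on prior biological models, whereas this toy score is simply the fraction of patients whose deleted segment covers the whole interval. The function names, the patient data, and the tie-breaking rule are all hypothetical and chosen only for illustration.

```python
def covers(deletion, interval):
    """True if a deleted segment spans every probe in the interval."""
    (d_start, d_end), (i_start, i_end) = deletion, interval
    return d_start <= i_start and i_end <= d_end


def score_intervals(patients, n_probes, max_len):
    """Score every run of consecutive probes up to max_len probes long.

    patients: one list per patient of (first_probe, last_probe) deleted
    segments, inclusive. The score is the fraction of patients with a
    deletion covering the whole interval. This is a toy stand-in, not
    the Bayesian multipoint score used in the study.
    """
    scores = {}
    for start in range(n_probes):
        for end in range(start, min(start + max_len, n_probes)):
            interval = (start, end)
            hits = sum(
                any(covers(d, interval) for d in deletions)
                for deletions in patients
            )
            scores[interval] = hits / len(patients)
    return scores


# Hypothetical data: four patients, ten probes (indices 0..9).
patients = [
    [(2, 5)],
    [(3, 6)],
    [(3, 4), (9, 9)],  # the second deletion plays the role of "collateral damage"
    [(8, 9)],
]
scores = score_intervals(patients, n_probes=10, max_len=4)

# Prefer the highest score; among ties, prefer the longer interval.
best = max(scores, key=lambda iv: (scores[iv], iv[1] - iv[0]))
print(best, scores[best])
```

In this made-up example the highest-scoring interval spans probes 3 and 4, which are deleted in three of the four patients, while the unrelated deletion at probe 9 scores lower. A realistic score would also have to account for how often deletions of a given length arise by chance, which is the role the press release attributes to the Bayesian model.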
The findings also showed that the algorithm may be applied to a wider class of problems--including the detection of oncogenes, which promote the growth of cancer when they are mutated or overexpressed. As the technology and statistical algorithms of this nature keep improving in cost and accuracy, they will prove useful in finding good biomarkers, discovering drugs, diagnosing disease, and choosing the correct therapeutic intervention. The members of the NYU group (the authors, along with Dr. Salvatore Paxia and Dr. Thomas Anantharaman) are in the process of creating a simpler user interface for their software, providing interoperability across many different chip technologies, and finally, making it publicly available in order to facilitate its free and widespread use.