K-means clustering grouped proteins with similar expression behavior. We identified proteins that were differentially expressed between normal and malignant, normal and benign, and benign and malignant samples.
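To make the clustering step concrete, the following is a minimal sketch of plain K-means (Lloyd's algorithm) applied to protein expression profiles. It is not the DeCyder EDA implementation. The protein names, the abundance values, and the deterministic choice of the first k profiles as starting centroids are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(profiles, k, max_iter=100):
    """Plain Lloyd's K-means on a list of equal-length expression profiles."""
    # Assumption: deterministic start, the first k profiles seed the centroids.
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical standardized log abundances per group: (normal, benign, malignant).
names = ["protein_1", "protein_2", "protein_3", "protein_4"]
profiles = [
    [0.1, 0.0, 1.9],    # raised in malignant samples
    [0.0, -1.5, -1.6],  # lowered in benign and malignant samples
    [-0.1, 0.1, 2.1],   # raised in malignant samples
    [0.1, -1.4, -1.8],  # lowered in benign and malignant samples
]

labels, centroids = kmeans(profiles, k=2)
clusters = dict(zip(names, labels))
print(clusters)
```

In this toy input, proteins with the same direction of change end up in the same cluster. Real data would need many more proteins and a less naive initialization.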
Discriminant analysis showed that a subset of only 13 proteins discriminated between the known groups with 100% accuracy. The top-ranked proteins from this analysis are candidates for a noninvasive diagnostic test for cancer. The high-abundance proteins on this list are the strongest candidates for such a test, because they are the most likely to be detected as leakage proteins in plasma. Data sets built from the top-ranked proteins can then be used to train classifiers for the unknown samples.
If samples show large variation (either inherently biological or introduced during sample preparation), proteins may be missing from one or more spot maps, which would affect the final classification. In such a case, one should use a less restrictive data set (i.e., one containing more proteins) for discriminant analysis.
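One way to read "less restrictive" is as a lower threshold on how many spot maps a protein must appear in before it enters the analysis. The sketch below illustrates that idea under stated assumptions: the function name `filter_by_presence`, the spot identifiers, the volumes, and the use of `None` for a missing spot are all hypothetical and do not reflect the DeCyder EDA interface.

```python
def filter_by_presence(spot_volumes, min_fraction):
    """Keep proteins detected in at least min_fraction of the spot maps.

    spot_volumes maps a protein/spot identifier to one value per spot map;
    None marks a spot map in which the protein was not matched.
    """
    kept = {}
    for protein, values in spot_volumes.items():
        present = sum(v is not None for v in values)
        if present / len(values) >= min_fraction:
            kept[protein] = values
    return kept


# Hypothetical data: three proteins across four spot maps.
spot_volumes = {
    "spot_A": [1.2, 1.1, 1.3, 1.0],      # present in 4 of 4 spot maps
    "spot_B": [0.8, None, 0.9, 0.7],     # present in 3 of 4 spot maps
    "spot_C": [None, None, 2.0, None],   # present in 1 of 4 spot maps
}

strict = filter_by_presence(spot_volumes, min_fraction=1.0)
relaxed = filter_by_presence(spot_volumes, min_fraction=0.75)
print(sorted(strict), sorted(relaxed))
```

Lowering the threshold admits more proteins into discriminant analysis, at the cost of having to handle the missing values in the proteins that are retained.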
The most important question was whether, and how well, unknown samples could be classified using DeCyder EDA. Nine unknown patients were classified (10 spot maps, with one patient run in duplicate), and all spot maps but one were classified correctly. The incorrectly classified spot map was one of the duplicates. Both this sample and its correctly classified duplicate came from poorly cast gels and should be rerun.
A final comment must be made on the experimental design. For this type of study (classification of unknown samples), we highly recommend having a balanced design. In our study we had only four patients in the benign group and three in the normal group, but 11 patients in the malignant group (all patients run as duplicates). We should have similar numbers of samples in each of the different kno