Microarrays offer a powerful means of studying human gene activity, thereby providing insight into cellular, molecular, and biological changes. Applied Biosystems has developed the Expression Array System, a complete system for gene expression studies. It includes Human Genome Survey Microarrays (which use fully curated transcript data for array design), the 1700 Chemiluminescent Microarray Analyzer, high-sensitivity chemistries optimized for chemiluminescence detection, and full system software including an Oracle Database that contains the most complete and current gene annotation. This integrated system not only offers a significant performance improvement over existing microarray technologies, but can also be linked to the Celera Discovery System and TaqMan Gene Expression Assays for further in-depth gene expression analysis. The result is an integrated workflow for unambiguous determination of differential gene expression in various populations, tissues, stages of development, and diseases.
In this application note, we report on the profiling of ~28,000 human genes in paired normal and cancerous breast tissues using the Applied Biosystems Expression Array System. Differentially expressed genes in normal tissue, primary tumor, and metastasis tumor samples were identified by comparing breast tissue specimens from two patients. Additionally, the microarray data were validated by quantitative real-time PCR using TaqMan Gene Expression Assays.
Methods and Materials
This experiment was designed to profile gene expression in three tissue types (normal, primary tumor, and metastasis tumor) from each of two patients, using two replicates for each sample (Figure 1). Total RNA samples (from Biochain Inc., Hayward, CA) obtained from the two patients were processed into digoxigenin (DIG)-labeled cRNA using the Applied Biosystems Chemiluminescent RT-IVT Labeling Kit.
The labeled DIG-cRNA (10 μg per microarray) was injected into each microarray hybridization chamber. Following hybridization at 55 °C for 16 hours, the unbound material was washed from the microarrays. Features that retained bound DIG-labeled cRNA were visualized using the Applied Biosystems Chemiluminescence Detection Kit. The kit uses an anti-DIG alkaline phosphatase conjugate that hydrolyzes a chemiluminescent substrate to generate light at 458 nm, which can be analyzed using the Applied Biosystems 1700 Chemiluminescent Microarray Analyzer.
Technical and Biological Replicates
The reproducibility of the Applied Biosystems Expression Array System was demonstrated by comparison of technical replicates. Global median normalization was performed for each microarray and the resulting values were then log transformed. To assess variation in gene expression, background noise was filtered out for each pair of arrays by using only data with signal-to-noise ratios greater than 3 (Figure 2A, Table 1). Additionally, we compared biological replicates (Biological_rep1 vs. Biological_rep2) between two individuals with the same disease condition (Figure 2B).
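The preprocessing described above can be sketched in a few lines of Python. This is a minimal illustration rather than the system software's actual implementation: the function names are hypothetical, the log base (2) is an assumption because the text only says "log transformed", and a probe is assumed to need S/N > 3 on both arrays of a pair to be kept.

```python
import math
from statistics import median

def median_normalize(signals):
    """Scale one array's signals so that the array median becomes 1.0."""
    m = median(signals)
    return [s / m for s in signals]

def filter_and_log(signals_a, signals_b, sn_a, sn_b, sn_cutoff=3.0):
    """Return log2 signal pairs for probes passing the S/N cutoff.

    Assumption: a probe is kept only if its S/N exceeds the cutoff
    on BOTH arrays of the pair being compared.
    """
    norm_a = median_normalize(signals_a)
    norm_b = median_normalize(signals_b)
    pairs = []
    for a, b, na, nb in zip(norm_a, norm_b, sn_a, sn_b):
        if na > sn_cutoff and nb > sn_cutoff:
            pairs.append((math.log2(a), math.log2(b)))
    return pairs
```

The returned pairs are what would be plotted against each other in a replicate scatter plot such as Figure 2.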
Fold Changes of Differential Gene Expression
The fold change between normal and metastasis tumor samples was analyzed by filtering the dataset with a signal-to-noise ratio > 3 and an ANOVA p-value < 0.01. Of the 2,508 genes that passed this filter, 623 genes were up-regulated and 435 genes were down-regulated. The numbers of detected genes for the corresponding p-values of < 0.05 and < 0.001 were 5,703 and 796 genes, respectively (Table 2). Using the Spotfire software application, fold changes can be determined for the differentially expressed genes (Figure 3).
The 2,508 genes identified by the ANOVA statistical analysis (p < 0.01) were selected for cluster analysis across all 12 samples using the GeneSpring Software application (Figure 4). Two-dimensional clustering was performed on gene expression values based on the similarity of their fold-change patterns. The normal and metastasis tumor tissue samples formed their own distinct clusters and were easily distinguishable with this analysis. For the primary tumor samples, the expression pattern of patient 1 is close to that of the normal tissue sample cluster, whereas the expression pattern of patient 2 is closer to that of the metastasis tumor sample cluster.
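To illustrate how samples can be grouped by the similarity of their expression patterns, the following is a small sketch of agglomerative clustering along the sample axis. It is not the GeneSpring algorithm: the distance measure (1 minus the Pearson correlation), the average-linkage rule, and all names are assumptions chosen for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cluster_samples(profiles, n_clusters):
    """Average-linkage agglomerative clustering with distance 1 - r.

    profiles: dict mapping a sample name to its list of log fold changes.
    Returns a list of sets of sample names.
    """
    clusters = [{name} for name in profiles]

    def dist(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] | clusters[j]
        del clusters[j]
    return clusters
```

Applying the same procedure to the transposed data (genes as the items, samples as the profile positions) gives the second dimension of a two-dimensional clustering.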
The gene annotation provided in the Oracle Database is an important component of the Expression Array System. This versatile tool empowers researchers to quickly find useful information on genes of interest and make direct data comparisons between platforms and laboratories using annotation such as gene name and symbol, RefSeq, NMs, LocusLink, dbEST, etc.
In addition, the Oracle Database includes, when known, gene annotation that pertains to gene molecular function and biological processes. Two classification systems are employed: the Panther Classification system devised by Celera Genomics and the publicly available Gene Ontology (GO) annotations. In this study we used the Panther classifications to rapidly interpret gene expression data from the various normal and diseased tissue states.
For instance, among the 2,508 genes studied in this experiment, 200 genes differentially expressed between normal and cancer tissues with p-values < 0.01 were classified by the Panther system as being involved in various signal transduction pathways. This was considered notable since signal transduction pathways have been implicated in breast cancer. Figure 4B shows the cluster analysis of the differential expression levels of these genes. From this analysis we can derive the identity of potential biomarkers for diagnostic or therapeutic purposes. For example, a number of genes that were not detectable in normal breast tissue samples but showed significant expression in cancerous tissues were identified (partial list shown in Table 3). These genes are potential markers that may be involved in tumor functions. Additionally, it was observed that a number of genes differentially expressed in this study have not been assigned information in the public database but are present in the Celera Discovery System database, meaning that the functions of these genes have been less studied or are relatively unknown. However, additional information about these genes is present in the Oracle Database. Table 4 shows a partial list of the genes that are not identified in the public database and are significantly differentially expressed with p-values of < 0.001 in this study. Also listed are their Panther families and classifications, along with the associated molecular functions and biological processes.
Validation with TaqMan Gene Expression Assays
To confirm the expression detected by microarrays, quantitative real-time PCR analysis was conducted for selected genes. These genes were identified (S/N > 3) in all 12 arrays and are differentially expressed (ANOVA p < 0.01). Some known breast cancer genes (for example, BRCA1, erB2, TF2R, etc.) along with a housekeeping gene GAPDH were included.
Figure 4. Clustering of microarray data. Two-dimensional hierarchical clustering analysis was performed using GeneSpring Software (Silicon Genetics, Redwood City, CA, USA). Red and blue represent high and low expression levels, respectively, and yellow represents no, or unchanged, expression. In two-dimensional clustering analysis, arrays or samples (x axis) and genes (y axis) were clustered based on the similarity of their gene expression patterns. A. 2,508 genes generated from ANOVA test p < 0.01. B. Among the 2,508 genes, 200 genes are involved in signal transduction pathways as classified by the Panther gene classification system.
Table 2. ANOVA Statistical Analysis: number of genes detected by the Applied Biosystems Expression Array System with the corresponding p-value as determined by the Spotfire software application (Somerville, MA, USA).
P-value Cutoff    Number of genes differentially expressed
< 0.05            5,703
< 0.01            2,508
< 0.001           796
TaqMan Gene Expression Assays for these genes were ordered through the Applied Biosystems store at http://www.appliedbiosystems.com/catalog using assay IDs provided in the Oracle Database.
Fold changes were compared for differential gene expression between tumor (primary or metastasis) samples and control (normal) samples, as measured by the Expression Array System and by quantitative real-time PCR using the Applied Biosystems 7900HT Sequence Detection System. Four replicates were performed for each TaqMan Gene Expression Assay and the resulting average cycle threshold (CT) was determined. Fold changes for the TaqMan assays were calculated by relative quantitation against GAPDH gene expression using the ΔΔCT method (ΔΔCT = ΔCT tumor - ΔCT normal). Expression Array System fold change was determined by calculating the ratio of the globally normalized signal from a primary or metastasis tumor sample to that of the corresponding normal tissue sample. The fold change data measured by both platforms correlated very well in both patients, indicating a high degree of concordance between the Applied Biosystems Expression Array System and the TaqMan assay reagents (Figure 5).
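The two fold-change calculations described above can be sketched as follows. The function names are hypothetical, and the conversion from the CT difference to a fold change (2 raised to the negative difference) is the conventional one; it is an assumption here, since the text does not state it, and it presumes roughly 100% amplification efficiency.

```python
from statistics import mean

def delta_ct(target_cts, reference_cts):
    """Delta CT: mean target-gene CT minus mean reference-gene (e.g. GAPDH) CT."""
    return mean(target_cts) - mean(reference_cts)

def taqman_fold_change(tumor_target, tumor_ref, normal_target, normal_ref):
    """Tumor vs. normal fold change from replicate CT values.

    Assumption: fold change = 2 ** -(delta CT tumor - delta CT normal),
    i.e. the amount of product doubles with every PCR cycle.
    """
    ddct = delta_ct(tumor_target, tumor_ref) - delta_ct(normal_target, normal_ref)
    return 2.0 ** (-ddct)

def array_fold_change(tumor_signal, normal_signal):
    """Ratio of globally normalized microarray signals, tumor over normal."""
    return tumor_signal / normal_signal
```

For example, a gene that reaches threshold three cycles earlier in tumor than in normal tissue, with GAPDH unchanged, gives a fold change of 8 under this convention.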
This study examined genome-wide expression profiling of paired cancerous and normal breast tissues using the Applied Biosystems Expression Array System. A comparison of normal and cancerous breast tissue identified statistically significant gene expression changes. This study also showed the potential of the Panther gene classification system for rapidly and reliably identifying biomarkers and exploring novel genes. TaqMan assays confirmed the gene expression changes detected by the microarray, which indicates a good concordance between the Applied Biosystems Expression Array System and the TaqMan process.
In about 50% of the cases, the actual fold changes determined by both methods were approximately the same. However, in virtually all cases where a fold change (either up- or down-regulated) was detected by the Applied Biosystems Expression Array System, it was confirmed by TaqMan analysis.
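A comparison of this kind can be expressed as two simple fractions: how often the two platforms agree on the direction of change, and how often they agree on its magnitude. The sketch below is illustrative only; the function name is hypothetical, and the tolerance used to decide that two fold changes are "approximately the same" (within one log2 unit, i.e. a factor of two) is an assumption, since the note does not define it.

```python
import math

def concordance(array_fc, taqman_fc, log2_tolerance=1.0):
    """Return (fraction with same direction, fraction with similar magnitude).

    Fold changes are ratios (> 1 up-regulated, < 1 down-regulated).
    A gene unchanged on either platform (ratio exactly 1) is not counted
    as agreeing in direction.
    """
    same_direction = 0
    similar_magnitude = 0
    for a, t in zip(array_fc, taqman_fc):
        la, lt = math.log2(a), math.log2(t)
        if la * lt > 0:                      # both up or both down
            same_direction += 1
        if abs(la - lt) <= log2_tolerance:   # within the assumed tolerance
            similar_magnitude += 1
    n = len(array_fc)
    return same_direction / n, similar_magnitude / n
```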
Microarray data analysis includes both primary and secondary data analysis. The primary data analysis is performed using the software provided with the Applied Biosystems Expression Array System. For secondary analysis, the system software is compatible with both the GeneSpring and Spotfire software application packages. The Expression Array System can also export microarray data in a flat-file format and in the industry-standard MAGE-ML format.
Differential Gene Expression
Generally, the following three steps are used to determine differential gene expression:
Step 1: Regression analysis on biological replicates (and/or technical replicates), which rejects array outliers based on a regression threshold. For example, arrays whose correlation coefficient falls below a chosen threshold (such as R² < 0.75 or R² < 0.95) are rejected.
Step 2: Perform t-test/ANOVA analysis to determine differentially expressed genes based on a p-value threshold (e.g., p < 0.001, p < 0.01, or p < 0.05). The threshold depends on the confidence level researchers wish to apply when comparing test samples and normal samples.
Step 3: Determine fold change of genes by measuring the average ratios between test and normal samples using variable fold change cutoffs and associated p-values.
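The three steps above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the system software's procedure: all function names are hypothetical, Step 2 computes only the one-way ANOVA F statistic (turning it into a p-value requires the F distribution, for example from a statistics package), and Step 3 takes the ratio of group means, which is one reading of "average ratios".

```python
from statistics import mean

def r_squared(x, y):
    """Step 1: squared Pearson correlation between two replicate arrays."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def replicate_passes(x, y, threshold=0.95):
    """Reject a replicate pair whose R^2 falls below the chosen threshold."""
    return r_squared(x, y) >= threshold

def anova_f(groups):
    """Step 2: one-way ANOVA F statistic for one gene across sample groups."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def fold_change(test_values, normal_values):
    """Step 3: ratio of mean test signal to mean normal signal (assumed form)."""
    return mean(test_values) / mean(normal_values)
```

In practice the F statistic for each gene would be converted to a p-value and compared with the chosen cutoff before the fold change is reported.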
In general, each array is normalized by the median chemiluminescent signal (i.e., the assay normalization signal from the output); the data set is filtered using a threshold of S/N > 3; and fold change is determined as the ratio of test sample(s) to normal sample(s), with the p-value associated with each fold change taken from the ANOVA test.