- Research article
- Open Access
Dual activation of pathways regulated by steroid receptors and peptide growth factors in primary prostate cancer revealed by Factor Analysis of microarray data
BMC Genomics volume 6, Article number: 109 (2005)
We use an approach based on Factor Analysis to analyze datasets generated for transcriptional profiling. The method groups samples into biologically relevant categories, and enables the identification of genes and pathways most significantly associated to each phenotypic group, while allowing for the participation of a given gene in more than one cluster. Genes assigned to each cluster are used for the detection of pathways predominantly activated in that cluster by finding statistically significant associated GO terms. We tested the approach with a published dataset of microarray experiments in yeast. Upon validation with the yeast dataset, we applied the technique to a prostate cancer dataset.
Two major pathways are shown to be activated in organ-confined, non-metastatic prostate cancer: those regulated by the androgen receptor and by receptor tyrosine kinases. A number of gene markers (HER3, IQGAP2 and POR1) highlighted by the software and related to the later pathway have been validated experimentally a posteriori on independent samples.
Using a new microarray analysis tool followed by a posteriori experimental validation of the results, we have confirmed several putative markers of malignancy associated with peptide growth factor signalling in prostate cancer and revealed others, most notably ERRB3 (HER3). Our study suggest that, in primary prostate cancer, HER3, together or not with HER4, rather than in receptor complexes involving HER2, could play an important role in the biology of these tumors. These results provide new evidence for the role of receptor tyrosine kinases in the establishment and progression of prostate cancer.
The phenotype of a cell is determined by its transcriptional repertoire, a result of combinations of transcriptional programs partly set during lineage determination and partly activated in response to intrinsic and extrinsic stimuli. Microarray hybridization experiments permit a quantitative analysis of this transcriptional repertoire in response to defined experimental conditions. A particularly interesting case of study is given by the transcriptional repertoire of human tumors. Here, the objective is usually the search for cancer subtypes for individualized prognosis and/or therapy. The questions most frequently asked are whether samples can be automatically grouped, in the absence of additional information, into biologically relevant phenotypes; and whether transcriptional programs can be unveiled that can explain such phenotypes. It must be noted that this situation (sample clustering and relevant gene extraction) is difficult mainly due to three reasons : the sparsity of the data (samples), the high dimensionality of the feature (gene) space, and the fact that many features are irrelevant or redundant (low signal-to-noise ratio). It has been pointed out that, due to the low signal-to-noise ratio, the quality and reliability of clustering may degrade when using standard hierarchical clustering algorithms or similar approximations . Similarly, model-based clustering methods encounter problems due to the sparsity of the set and its high dimensionality, leading to overfitting during the density estimation process . Additional difficulties are encountered during the selection of features (genes) relevant to the sample cluster structure, since most clustering methods produce non-overlapping gene clusters. This behaviour may distort the extraction of biologically relevant genes in cases where expression patterns overlap several classes of samples or experimental conditions, a reflection of the dependence of the expression of most genes on multiple signals and their participation in more than one regulatory network.
Three main strategies have been taken in sample-based clustering: unsupervised gene selection, interrelated clustering and biclustering . The first views gene selection and sample clustering as basically independent processes, the second dynamically uses the relationship between gene and sample spaces to iteratively apply a clustering and selection engine, while the third tries to cluster both genes and samples at the same time in a reduced space. For the first one, principal components analysis (PCA) has been proposed. PCA, a well known dimensionality reduction technique, has been criticized because the sample projection in the low-dimensional space is not guaranteed to yield optimal sample partitions, particularly when the fraction of relevant genes specific to each cluster is small. As for the second approach, several novel methods have been proposed recently based on various greedy filtering techniques (for a review see ), but it has been suggested that they may group the data based on local decisions . Finally, different biclustering methods have also been applied to this situation [5–8], but a difficulty with most biclustering tools is that they generate non-overlapping partitions.
Here we apply Factor Analysis (FA) , a multivariate tool related to PCA, coupled to clustering algorithms in sample space, t-test scores in gene space and data mining procedures. Q-mode (i.e. in sample space) FA is a latent variable modelling tool  that assumes that the observed gene expression levels are the result of a linear combination of an unknown number of independent underlying global transcriptional programs, called latent variables or factors (Figure 1). The contribution of each factor to the expression levels of the genes in each sample is given by the elements of the loadings matrix (arrows in Figure 1). Each sample contains, in addition, a given amount of expression that cannot be modelled by the latent variables, for example due to the presence of noise. FA models the covariance of a data matrix, as opposed to PCA, which attempts to summarize the total variance. Covariance in the mRNA expression levels has been shown to occur in proteins involved in related pathways and functions, as well as in proteins co-locating to the same organuli in the cell, and may be indicative of common regulatory mechanisms at the expression level. By contrast, the specific variance in the expression of a given gene, not associated with the rest of the genes in the sample, is most likely related to artefacts in the chip or in data handling. We couple FA dimensionality reduction to clustering algorithms  to obtain clusters in sample space. For gene extraction, a multiple-testing corrected t-test (the so-called q-value) is employed. Finally, the genes assigned to each cluster are used for the detection of pathways predominantly activated in that cluster by finding statistically significant the GO  or GenMAPP terms associated to each cluster.
We first tested the approach by using a published dataset of microarray experiments in yeast , and then applied it to the analysis of human prostate cancer samples . The yeast dataset is particularly relevant because the biochemistry of S. cereviseae is relatively well understood in comparison with other eukaryots, and the data set has been previously analyzed with other clustering techniques. From the application to the prostate cancer dataset, a number of significant gene outcomes highlighted by the algorithm have been corroborated experimentally a posteriori by expression analysis on an independent set of samples. The biological interpretation of the results lead us to propose that two major pathways are predominantly activated in organ-confined, non-metastatic prostate cancer: those regulated by androgen receptor and by receptor tyrosine kinases. We close this paper by discussing the implications of these findings.
Results and Discussion
Testing FADA with the yeast expression dataset
Our procedure is coded in a software package, FADA. We first tested FADA by analyzing the dataset of Gasch et al , who studied the transcriptional responses of S. cerevisiae to a variety of stress stimuli. The main results are in Table 1, and the genes most significantly associated to the clusters are in Table 1 of the Supporting Information. We discuss in what follows only the most salient features of the analysis for this set, corresponding to clusters 1, 2, 6, 8 and 16, from a biological viewpoint.
Cluster 1 encompasses responses to heat shock, DTT (late), sorbitol (early response), stationary culture (late), and overexpression of Msn2p and Msn4p. The significant GO terms  automatically detected by FADA indicate that this grouping is related to a common environmental stress response (Table 1) (ESR or CER in the case of S. cerevisae, CESR in the case of S. pombe [13, 15, 16]). Inspection of the top selected genes (Table 1, supplementary material) confirms this assignment, as most of them are known to be transcriptionally regulated through the stress response element (STRE), recognized by Msn2p and Msn4p [17, 18]. Thus, Cluster 1 corresponds largely to a "core" ESR, induced by a variety of stimuli, including "early" time points of osmotic stress and "late" time points of DTT treatment and stationary culture. A relatively late induction of ESR by DTT has been noted previously, with suggestions that ESR could be a secondary response to the exposure of this reducing agent . Conversely, hyperosmotic shock is known to induce a rapid and strong expression of ESR [13, 15].
Clusters 2 and 16 correspond to responses to three oxidizing agents: hydrogen peroxide, which generates peroxides and hydroxyl radicals; menadione, a generator of superoxide; and diamide, a thiol reducing agent. FADA groups together responses to H2O2 and diamide (Cluster 2), while defining a distinct group for responses to menadione (Cluster 16). It is well known that several organisms use distinct sensing and response systems to discriminate among different degrees of oxidative injury. In S. pombe, reponses to low concentrations of H2O2 and to diamide depend on the b-Zip transcription factor pap1, while reponses to higher concentrations of H2O2 utilize a different transcription factor, atf1 . The S. cerevisiae homologue of S. pombe pap1 is Yap1p, a transcription factor regulated by oxidation, formation of intramolecular disulfide bonds at its carboxy terminus and nuclear translocation upon exposure to H2O2 and diamide [20–22]. On the other hand, of the b-ZIP proteins in S. cerevisiae, Yap3p is most similar to atf1 in its carboxy terminus, suggesting that both atf1 and Yap3p could be subject to a similar redox regulation. Interestingly, YAP1 is upregulated in Cluster 2 (H2O2 and diamide), but not in Cluster 8 (menadione), while Yap3 is upregulated in the latter Cluster (Table 1 of Supplementary Data). Moreover, several of the genes most relevant to Cluster 2 are known to respond to mild oxidative stress, and are controlled by Yap1p . The statistically significant GO-terms selected are related to "oxidoreductase" and "peptidase" activities. This includes genes regulating the thioredoxin and glutathione biosynthesis, genes for heat shock proteins, and a large number of genes involved in proteasome function and ubiquitin-dependent protein degradation (Table 1 of Supplementary Material).
Cluster 6 includes cultures at late times of nitrogen starvation. Many of the relevant genes in this group code for enzymes for the utilization and enhanced transport of poor nitrogen sources, such as allantoin or urea (Table 1 of Supplementary Material). Other upregulated genes include those required for different stages of meiosis (chromosome pairing, recombination and segregation; anaphase; or nucleokinesis), sporulation, autophagy, or genes that regulate vesicle and peroxisome structure and dynamics. Among these genes are also transcriptional regulators with major roles in the control of several of these processes, such as UGA3, DAL81 (allantoin metabolism), or IME1, RIM101 and SPO1 (meiosis and sporulation). This is consistent with the development of a classical response to nitrogen starvation in the absence of fermentable carbon sources, which leads to meiosis and sporulation [24–27]. FADA also suggests that this response to nitrogen starvation becomes most prominent at relatively late times, when it can be distinguished from the early, relatively non-specific response to nutrient deprivation [25, 26]. In fact, FADA finds "transcription" and "sporulation" as significant GO-terms (Table 1).
Cluster 8 aggregates samples from early stages of both early response to aminoacid and nitrogen starvation. FADA finds a significant overrepresentation of genes for amino acid biosynthetic pathways (Table 1), consistent with the fact that deprivation of nutrients, including nitrogen and carbon sources, is recognized by several sensing systems regulating rapamycin-sensitive TOR kinase . This lipid-dependent kinase derepresses translation of the GATA transcription factor Gcn4p [29, 30], which controls expression of many genes, including enzymes involved in amino acid biosynthesis . Thus, the selection of genes in Cluster 8 is consistent with known Gcn4p-dependent responses to nutrient and nitrogen starvation .
Altogether, these results indicate that the automatic analysis provided by FADA yields results consistent with the known biochemistry of yeast.
Application of FADA to the prostate cancer dataset
We next applied FADA to the dataset published by Welsh et al. , for the analysis of transcripts associated with prostate cancer. Samples were classified into two major branches: samples from cultured cells, and samples from tissues, which in turn could be further bifurcated into two well-supported branches, one corresponding to samples enriched for carcinomatous cells and one for non-neoplastic prostate cells (Figure 2). The first-level grouping into cultured vs. non-cultured samples most likely reflects the profound impact of culturing procedures on the transcriptional profiles of the different cell types. Within the cultured cells subgroup, samples were generally clustered according to cell type, with haematopoietic cell lines forming well-clustered groups and epithelial and fibroblastic prostate-derived cells clustering together with endothelial cells. A separate cluster was formed by the androgen-sensitive epithelial cell line LNCaP, the prostate cancer cell lines included in the study. The genes most significantly contributing to each sample cluster were analyzed for their participation in the pathways contained both in GenMAPP , and GO (Tables 2 and 3). Since pathway categorization is a difficult problem, as partition of the global interaction network in "parts" inevitably introduces artefacts, we also proceeded to a detailed, gene-by-gene inspection of the most discriminative genes based on inspection of literature data.
Cluster 1 corresponds to the LNCaP cluster. It is placed in a branch distinctly separated from the rest of the cultured prostatic cells. LNCaP cells were originally derived from metastatic prostate cells, presumably of epithelial origin  and respond to androgens through its cognate receptor . FADA found significant overrepresentation of upregulated genes coding for proteins that participate in electron transport and ATP generation, both when using GenMAPP and GO annotations (Figure 3, 4 and Table 2). Other sets of genes likely relevant to the LNCaP cluster, but not highlighted in the pathway mapping protocol, are those for proteins in steroid metabolism and signalling, such UDP glycosyltransferases B15 (Table 2 of the Supporting Information). Cluster 2 includes mesenchimal, epithelial, and endothelial cells. This cluster shows a bias for genes and pathways involved in ubiquitin and proteasome-dependent protein degradation, cell cycle regulation, inflammatory responses and cell-matrix interaction. Cluster 3 (hematopoietic cells) showed a significant bias in genes and pathways involved in cell cycle regulation and RNA processing. The selected genes included known markers of differentiation of B cell, T cell or myelomonocytic lineages. Examples are genes for immunoglobulin, histocompatibility antigens, haematopoietic-specific cytokines and their receptors, and regulatory proteins known to play significant roles in such lineages in processes such as signal transduction or cytoskeletal dynamics.
Regarding Cluster 4 (prostate tumor tissue), GenMAPP mapping finds significant overexpression of enzymes related to fatty acid metabolism (Table 2). Other genes and KEGG pathways with a significantly biased association with cluster 4 are those for ribosomal function and fatty acid synthesis (Table 2 of the Supporting Information). The upregulation of these two functions in prostate cancer has been noted previously [14, 35]. In addition, GO mapping finds overrepresentation of genes for proteins directly involved in steroid receptor recognition, including androgen receptor and estrogen receptor β. This is confirmed by a survey of the list of selected genes, where one can find a number of proteins involved in steroid signalling, including the coactivators GRIP1 and NRIP1, and genes that have been described as transcriptional targets of these pathways , such as the secreted proteases KLK2 and KLK3, and protein IQGAP, involved in cytoskeletal dynamics , or the enzymes fatty acid CoA-ligase or androgen-regulated short chain dehydrogenase (Table 2 of the Supporting Information). A second group of genes significantly contributing to this cluster are those for cell surface polypeptide growth factor receptors, associated signalling molecules and regulators, and known transcriptional targets for these pathways. These include the receptor tyrosine kinase partner ERBB3 (HER3), the calmodulin-dependent kinase activator CAMKK2, or the signalling modulators RAPGA1 and PDE3B (Table 2 of the Supporting Information).
Finally, the highest ranking genes for samples from normal prostate tissue (Cluster 5) correspond, according to GO, to proteins involved in the control of cytoskeletal architecture and dynamics in muscle cells (Table 2). GenMAPP finds a significant overrepresentation of muscle-associated functions. The implication is that, in these experiments, normal prostate tissue samples possibly are strongly enriched for muscle cells. This strong overrepresentation of genes corresponding to a smooth muscle phenotype suggests that the non-neoplastic tissues used correspond to areas of prostate hyperplasia or adenoma derived from the transition zone, in which smooth muscle cells are often major contributors . In practical terms, this suggests that these experiments may be used with caution in the comparison of tumor epithelial cells with corresponding normal epithelial counterparts.
In recent years, several transcriptional profiling studies have been performed in prostate cancer, aimed at the identification of novel tumor markers [14, 39–41] or prognostic signatures [42–44]. So far, only one study has systematically searched for overrepresented biochemical pathways in a meta-analysis of four previously published prostate cancer transcriptional profiling studies . This study used KEGG as reference pathway database, which is biased towards metabolic pathways . Our study, however, focuses on GenMapp and GO terms, and therefore on the identification of signalling pathways.
Signalling pathways in prostate cancer and their experimental validation
In order to validate the pathways found to be overrepresented in prostate tumor samples, we used real-time RT-PCR. We chose for our analysis the genes for hepsin, KLK3 (PSA), ERBB3 (HER3), IQGAP2, and POR/ARFAPTIN2. Hepsin was found to be overexpressed in most tumor samples, and validated by immunohistochemical analysis . This gene has been shown to be overexpressed in prostate cancer by several other groups. KLK3 (PSA) is the marker par excellence of prostate epithelial activity and cellular bulk, and detection of its serum protein levels is the best available marker for monitoring prostate cancer . HER3 is a receptor for the paracrine growth factor neregulin-1, and a transmembrane protein that tethers the ligand to its dimerization partners, the receptor tyrosine kinases HER2 and HER4 , and known to play important roles in the development and progression of the malignant phenotype in breast cancer . The abnormal expression and activity of HER2 has been studied extensively in the context of prostate cancer , being found overexpressed in advanced tumors, either metastatic or homone-independent, but infrequently in primary, organ-confined tumors. More controversial is the information available on the role of HER3, with reports of its overexpression in prostate cancer together with HER2, HER4, or both [51, 52], but also of its overexpression only in metastatic tumors, in particular of a truncated form corresponding to the extracellular domains of HER3 . Furthermore, several transcriptional profiling analyses have found overexpression of this gene in prostate cancer. IQGAP2 is a calmodulin-binding protein that participates in cell signalling and modulation of cytoskeletal dynamics , and its activity has been reported to be positively  and negatively associated with neoplastic phenotype. POR1/ARFAPTIN2, a Rac1-interacting protein , is a regulator of cytoskeletal dynamics that so far has not been associated with any particular type of neoplasia.
The results of semiquantitative real-time RT-PCR on our samples indicate that hepsin is significantly overexpressed in 14 out of 14 cases, IQGAP2 in 8 of 14, and HER3 in 10 of 14 cases (Figure 5). Other genes analyzed, such as KLK3 (PSA), HER2 or the steroid receptors androgen receptor, estrogen receptor α or estrogen receptor β are less frequently overexpressed in these tumors. Levels of desmin transcripts were determined as an index of the contribution of stromal cells, suggesting that the overexpression of the analyzed genes are detected in tumor samples even in the presence of substantial stromal contamination (Figure 5). Of particular interest is the observed upregulation of HER3 in prostate tumor tissues relative to normal tissues. The HER3/ErbB3 protein has impaired intrinsic kinase activity , and it appears to function in signal transduction by tethering the ligand to other members of the HER family of receptors, with preference for HER2/ErbB2 . Increased levels of expression of HER3 are seen in many tumors that express HER2 , and it is widely assumed that the signalling and/or oncogenic functions reside in the corresponding heterodimer, rather than in either individual receptor [59, 60]. Recent experimental evidence further highlights the importance of HER3 in conferring a malignant phenotype and a hormone-refractory state to prostate epithelial cells . Thus, whenever HER3 is expressed it is reasonable to expect co-expression of at least one other member of the HER family. Therefore, we determined by real-time RT-PCR the relative expression in our prostate tissue samples of the genes for all four members of the HER family of receptor tyrosine kinases. Our results show that HER4 is expressed at increased levels in 10 of 14 prostate tumor samples (Fig. 5A, B), whereas HER2/ErbB2 and EGFR are overexpressed in 3 of the 14 samples analyzed. Seven samples simultaneously overexpressed HER3 and HER4, of which 2 overexpressed all four members of the HER family (Fig. 5A, B). None of the samples overexpressed the pairs HER3 and HER2, or HER3 and EGFR, without overexpressing at the same time one of the other members of the family (Fig. 5A, B).
As mentioned in the Methods section, both tumor and normal tissues were carefully chosen to have similar representation of epithelial compartment. However, to further ensure that the observed expression of HER3 was not due to a dilution effect of normal epithelial cells by stroma, we performed real-time PCR analysis of laser microdissected samples. For this, we selected four samples that had shown overexpression of HER3 in the enriched tumor samples described above, and two that had levels that did not differ significantly from non-tumor containing (normal) matched tissues. Of the four samples in which the enriched tumor tissue had shown increased levels of HER3 transcript, three microdissected samples overexpressed HER3 (Fig. 5C). In two of the microdissected samples, HER3 transcript levels were equal in normal and tumor microdissected epithelia, and this also corresponded to samples in which HER3 levels did not differ significantly between enriched tumor and normal prostate tissues (Fig. 5C). This analysis showed that overexpression of HER3 in prostate tumor tissues is not due to simple enrichment of epithelial cells in comparison with non-tumor tissues. To further confirm the cell type expressing HER3 in prostate tissues, immunohistochemical analysis with a monoclonal antibody to HER3 was performed on 16 prostate samples, arranged in duplicate 1-mm diameter cores in tissue microarrays, in which both tumor and normal glands were present. HER3 protein was found clearly overexpressed in tumor epithelia in 13 of the 16 cases (81.2%), showing juxtamembrane and finely granular cytoplasmic patterns (Fig. 5D). In all cases, normal epithelia showed weak reactivities for HER3 (Fig. 5D).
In summary, our transcriptome re-analysis, validated by real-time RT-PCR of non-microdissected and microdissected samples and by immunohistochemical analysis, significantly reinforces previous immunohistochemical studies that reported high levels of expression of HER3 and HER4 in primary prostate cancer [51, 52].
We have shown that the method presented here for the analysis of expression microarray data permits the classification of samples into meaningful categories and, simultaneously, to identify a subset of genes and their assignment to pathways most significantly contributing to the corresponding phenotypes, while allowing for a given gene to participate as significant in more than one cluster of samples. The analysis of the yeast dataset validates the approach. Our results are consistent with biochemical pathways known to be activated in the different stress conditions analyzed, and the clustering of samples reflects the underlying similarity of the biochemical responses. In the application to the prostate cancer dataset, we have found that two pathways, one modulated by androgen receptor and a second one by signals that originate from cell surface growth factor receptors, are prominently active in the organ-confined, non-metastatic prostate cancer samples analyzed. The latter pathway has been reported to be spuriously active in at least a subset of prostate tumors that have progressed to invasive and hormone-independent states . Our results suggest that such altered activation may already be present in primary tumors. Although a prevailing model for prostate tumor progression is that acquisition of the capacity for metastatic and hormone independent growth proceeds through selection of rare populations of cells concealed among primary tumor cells, there is also evidence that a transcriptional program for metastasis may already be present in the bulk of primary tumors at the time of diagnosis [63, 64]. Our analysis would be more consistent with the latter model.
Finally, we have unveiled and validated several markers highlighted by the analysis of the prostate cancer dataset. While several of these genes were identified in the original analysis of the data , others are revealed here, notably HER3, IQGAP2 and POR1, the biologically most relevant being HER3. With an external dataset, we have found that prostate cancer samples frequently co-overexpress HER3 and HER4, accompanied less frequently by increased expression of EGFR or HER2. Overexpression of HER2 and consequent increased signalling have been associated with advanced prostate cancer, development of hormone independent state and poor prognosis [65, 66], but is infrequently observed in primary tumors [67, 68]. On the other hand, our results suggest that, in primary prostate cancer, HER3, together or not with HER4, rather than receptor complexes involving HER2, could play important roles in the biology of these tumors.
Materials and methods
The S. cereviseae dataset consists of transcriptional responses of the yeast S. cerevisiae to environmental stress . It originally consists of spotted array measurements of 6152 genes in 173 experimental conditions that include temperature shocks, hyper and hypoosmotic shocks, exposure to various agents such as peroxide, menadione, diamide, dithiothreitol, amino acid starvation, nitrogen source depletion and progression into stationary phase. Log-ratios were preprocessed following several steps: first data from genes with missing values were filtered out, and their missing values estimated with LSimpute  using the 'Adaptive' method. Next, ratios were computed from the log-ratios and quantile-normalized (experiment-wise) using the normalizeQuantile function from the R package , so that all experiments had the same average sample distribution. Finally, ratios were log transformed again.
The prostate cancer dataset chosen is described in . It was originally obtained by hybridizations on Affymetrix U95A oligonuleotide arrays with probes for a total of 55 samples. Intensity values were preprocessed following several steps: first intensity data were thresholded, with intensities below 10 fixed at 10 and values above 16000 fixed at 16000. The thresholded values were log-transformed and then centered by the median of all experiments. Finally, genes were subjected to z-transformation (per gene basis).
Determination of genotypically coherent groups of samples
Q-mode Factor Analysis (FA)  seeks to find an underlying orthogonal factor model of an original X-matrix nxm (where n are the number of samples and m the number of mRNA levels measured) of the form:
X = LF + E
L is the loadings matrix of size nxk, where k is the number of factors, and F the scores matrix of size kxm, while E is the residual matrix, which contains both the specific variance of the individual genes and the errors in the model (see Figure 1). We used the so-called principal factor solution to solve this factor model. Specifically, in a first step, and based on the correlation matrix R derived from X, communalities (i.e. the proportion of the variance explained by common factors) were computed from the multiple squared correlation coefficient between the ith variable and the rest. These communalities replaced the diagonal entries of the correlation matrix, which was subjected to diagonalization. New communalities were computed from the loadings at the chosen dimensionality, obtained by scaling the eigenvector matrix (P), as follows:
L = P Λ 1/2
The new communalities again replaced the diagonal entries, and the process was iterated until convergence. Finally, we proceeded to rotate the factor loadings by means of a varimax rotation . The effect of this rotation is to maximally align each of the samples with one factor in order to simplify the factor model and make it more readily interpretable. Phyletic trees were derived by clustering samples in loadings space at the optimal dimensionality using average linkage [4, 11]. When needed, bootstrap values were computed by selecting random subsets of 90% of the genes . Distribution of trees and frequency of each branch in the original tree were recorded using CONSENSE, program included in the PHYLIP package .
Selecting genes associated to each cluster
Once sample clusters are defined, these are used to identify groups of genes contributing heavily to the specific character of different groups. Each gene on the list is subjected to a Student's t-test that measures the differential expression of the gene in the cluster as compared with the rest of the samples. t-test scores were transformed to q-values, which include multiple testing correction. The q-value is similar to the well known P-value, except that it is a measure of significance in terms of the false discovery rate, rather than the false positive rate . Genes with a q-value < 10-4 were taken as differentially expressed for that particular cluster.
Assigning pathways to gene clusters
The association between selected genes and biological functions was established by determining the hypergeometric distribution of genes on the annotation databases GO  or GenMapp . With this distribution we computed the probability that at least x genes annotated within a given biological function according to GO (or GenMapp) in a cluster of size n (the total number of genes per cluster selected in the previous step) can be obtained by chance, given a population of N genes under consideration and given A, the total number of genes within N with that particular annotation. These P-values are obtained according to:
An aggregated score for each cluster from the significant P-values (i.e., those below 10-2) is computed as follows:
s0 = ∑-ln p(x; N, A, n)
The significance of this score is established by simulation. We randomly selected 100 samples of size n genes each (the number of genes per cluster selected according to the q-value) and computed a new s-score (s r ) for each one. The Z-score is finally computed as:
z = (s o - <s r >)/σ r
Z-scores > 2.0 are taken as indicative of significant association between the samples in the cluster and the set of pathways uncovered.
We should emphasize that in spite of the apparent intricacy of the computational procedure, the computational complexity is similar to other biclustering methods, and operates within a highly constrained parameter space: in the factor analysis part of the program only the percentage of variance employed should be set, yielding a reduced number of dimensions or latent variables, usually below 5; the number of clusters is automatically determined in this space from the c-index, and has no free parameters, and the selection of genes relevant for each cluster only depends on the cutoff employed in the q-value.
We used RT-PCR with either TaqMan probes or by SYBRGreen incorporation to determine the expression levels of selected genes on samples unrelated to the original study by Welsh et al. In each instance, the tumor sample and its matching normal counterpart were obtained from the same case, upon removal by radical prostatectomy. Serial sections from all normal counterparts to the tumor tissues were stained and analyzed to confirm that normal prostate glands and epithelial cells were present in near-normal patterns, and that they contained less than 1% of cells or structures with carcinomatous appearance. In addition, samples were chosen such that the tumor and normal counterparts in each case had approximately equal representations of the epithelial compartment, as assessed microscopically. RNA was isolated from corresponding frozen serial sections, and controlled for quality on a 2100 BioAnalyzer instrument (Agilent, Palo Alto, CA). For each sample, 0.5 μg of total RNA was reverse transcribed by priming with random hexamers at 42°C for 50 minutes, followed by treatment with RNase at 37°C for 20 min. The resulting cDNAs were used as templates in PCR reactions with gene-specific primers. Real-time PCR was performed on ABI PRISM 7700 (Applied Biosystems, Foster City, CA) or DNA Engine Opticon (MJ Research, Waltham, MA) instruments. TaqMan probes and their corresponding primer sets were obtained from Applied Biosystems. Thermal cycler conditions were 95°C for 10 min and 40 cycles of 95°C for 15 sec and 60°C for 1 minute for TaqMan assays. In the case of SYBRGreen reactions, the conditions were 95°C for 15 min, and 40 cycles of 95°C for 15 sec, 55°C for 30 sec and 72°C for 30 sec. All determinations were performed in triplicate and in at least two independent experiments. Since the relative amplification efficiencies of target and reference samples were found to be approximately equal, the ΔΔCt method was applied to estimate relative transcript levels. Levels of ribosomal S14r amplification were used as an endogenous reference to normalize each sample value of Ct (threshold cycle) and normal tissues were used as calibrators for their tumoral counterparts in each case. The final results, expressed as n-fold differences in target gene expression were calculated as follows:
nTARGET = 2-[(Ct target - Ct reference)TUMORAL - (Ct target - Ct reference)NORMAL]
Laser capture microdissection
Prostate tissues were obtained by punch sections of radical prostatectomies and snap-frozen in isopentane at -50°C embedded in OCT-containing cryomolds. 8 μM cryosections were mounted onto plastic membrane-covered glass slides (PALM Mikrolaser Technology, Bernried, Germany), fixed for 3 minutes in 70% ethanol, stained with Mayer's hematoxilin, dehydrated, air-dried for 10 minutes and stored at -80°C until used. Laser catapulting microdissection was performed with a PALM MicroBeam Systems instrument. 2 to 5 × 104 normal or carcinomatous epithelial cells were collected and estimated to be >99% homogeneous by microscopic visualization.
Total RNA from microdissected samples was isolated using the PicoPure RNA Kit (Arcturus Engineering, Santa Clara, CA), with an additional DNase I digestion step (Qiagen, Valencia, CA).
Sixteen paraffin embedded prostate samples were evaluated for HER3 expression by immunohistochemistry on a tissue microarray. The cases were represented in duplicated 1-mm diameter cores and always included normal prostatic glands adjacent to neoplastic foci in at least one of the cores. Three μM sections of the microarray were deparaffinized, rehydrated and subjected to antigen retrieval in a pressure cooker with citrate buffer at pH 6.0 for 5 min. Slides were cooled for 15 min, washed in water and incubated overnight at 4°C with anti-HER3 mouse monoclonal antibody (Upstate Biotechnology, Lake Placid, New York). Endogenous peroxidase was quenched and slides were incubated for 30 minutes with secondary antibody (Envision, DAKO, Gostrup, Denmark). Reactions were detected after development with diaminobencidine and H2O2 for 3 min. Slides were counterstained with Harri's hematoxilin, dehydrated and mounted. As a negative control, the primary antibody was substituted for isotype-matched mouse IgG.
Access to the program
The complete procedure has been coded in a Fortran-77 program, called FADA. Remote access to the program has been enabled by setting up a web-server where the program can be executed .
Jiang D, Tang C, Zhang A: Cluster Analysis for Gene Expression Data: A Survey. IEEE Transactions on Knowledge and Data Engineering. 2004, 16: 1370-1386. 10.1109/TKDE.2004.68.
Xing EP, Karp RM: CLIFF: clustering of high-dimensional microarray data via iterative feature filtering using normalized cuts. Bioinformatics. 2001, 17 Suppl 1: S306-15.
Yoshida R, Higuchi T, Imoto S: A Mixed Factors Model for Dimension Reduction and Extraction of a Group Structure in Gene Expression Data. 2004
Johnson RA, Wichern DW: Applied Multivariate Statistical Analysis. 1992, Upper Saddle River, NJ, Prentice Hall
Sheng Q, Moreau Y, De Moor B: Biclustering microarray data by Gibbs sampling. Bioinformatics. 2003, 19 Suppl 2: II196-II205.
Kluger Y, Basri R, Chang JT, Gerstein M: Spectral biclustering of microarray data: coclustering genes and conditions. Genome Res. 2003, 13: 703-716. 10.1101/gr.648603.
Tanay A, Sharan R, Shamir R: Discovering statistically significant biclusters in gene expression data. Bioinformatics. 2002, 18 Suppl 1: S136-44.
Cheng Y, Church GM: Biclustering of expression data. Proc Int Conf Intell Syst Mol Biol. 2000, 8: 93-103.
Reyment RJ, Joreskog KG: Applied Factor Analysis in the Natural Sciences. 1996, Cambridge, Cambridge University Press
Bar-Joseph Z, Gerber GK, Lee TI, Rinaldi NJ, Yoo JY, Robert F, Gordon DB, Fraenkel E, Jaakkola TS, Young RA, Gifford DK: Computational discovery of gene modules and regulatory networks. Nat Biotechnol. 2003, 21: 1337-1342. 10.1038/nbt890.
Hartigan JA: Clustering algorithms. 1975, New York, NY, Wiley & Sons
Consortium GO: The Gene Ontology (GO) database and informatics resource. Nucl Acids Res. 2004, 32: D258-D261. 10.1093/nar/gkh036.
Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO: Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell. 2000, 11: 4241-4257.
Welsh JB, Sapinoso LM, Sn AI, Kern SG, Wang-Rodriguez J, Moskaluk CA, Frierson HFJ, Hampton GM: Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Res. 2001, 61: 5974-5978.
Causton HC, Ren B, Koh SS, Harbison CT, Kanin E, Jennings EG, Lee TI, True HL, Lander ES, Young RA: Remodeling of yeast genome expression in response to environmental changes. Mol Biol Cell. 2001, 12: 323-337.
Chen D, Toone WM, Mata J, Lyne R, Burns G, Kivinen K, Brazma A, Jones N, Bahler J: Global transcriptional responses of fission yeast to environmental stress. Mol Biol Cell. 2003, 14: 214-229. 10.1091/mbc.E02-08-0499.
Moskvina E, Schuller C, Maurer CT, Mager WH, Ruis H: A search in the genome of Saccharomyces cerevisiae for genes regulated via stress response elements. Yeast. 1998, 14: 1041-1050. 10.1002/(SICI)1097-0061(199808)14:11<1041::AID-YEA296>3.0.CO;2-4.
Treger JM, Schmitt AP, Simon JR, McEntee K: Transcriptional factor mutations reveal regulatory complexities of heat shock and newly identified stress genes in Saccharomyces cerevisiae. J Biol Chem. 1998, 273: 26875-26879. 10.1074/jbc.273.41.26875.
Quinn J, Findlay VJ, Dawson K, Millar JB, Jones N, Morgan BA, Toone WM: Distinct regulatory proteins control the graded transcriptional response to increasing H(2)O(2) levels in fission yeast Schizosaccharomyces pombe. Mol Biol Cell. 2002, 13: 805-816. 10.1091/mbc.01-06-0288.
Kuge S, Jones N, Nomoto A: Regulation of yAP-1 nuclear localization in response to oxidative stress. Embo J. 1997, 16: 1710-1720. 10.1093/emboj/16.7.1710.
Delaunay A, Pflieger D, Barrault MB, Vinh J, Toledano MB: A thiol peroxidase is an H2O2 receptor and redox-transducer in gene activation. Cell. 2002, 111: 471-481. 10.1016/S0092-8674(02)01048-6.
Delaunay A, Isnard AD, Toledano MB: H2O2 sensing through oxidation of the Yap1 transcription factor. Embo J. 2000, 19: 5157-5166. 10.1093/emboj/19.19.5157.
Dumond H, Danielou N, Pinto M, Bolotin-Fukuhara M: A large-scale study of Yap1p-dependent genes in normal aerobic and H2O2-stress conditions: the role of Yap1p in cell proliferation control in yeast. Mol Microbiol. 2000, 36: 830-845. 10.1046/j.1365-2958.2000.01845.x.
Herskowitz I: Life cycle of the budding yeast Saccharomyces cerevisiae. Microbiol Rev. 1988, 52: 536-553.
Chu S, DeRisi J, Eisen M, Mulholland J, Botstein D, Brown PO, Herskowitz I: The transcriptional program of sporulation in budding yeast. Science. 1998, 282: 699-705. 10.1126/science.282.5389.699.
Primig M, Williams RM, Winzeler EA, Tevzadze GG, Conway AR, Hwang SY, Davis RW, Esposito RE: The core meiotic transcriptome in budding yeasts. Nat Genet. 2000, 26: 415-423. 10.1038/82539.
Mata J, Lyne R, Burns G, Bahler J: The transcriptional program of meiosis and sporulation in fission yeast. Nat Genet. 2002, 32: 143-147. 10.1038/ng951.
Thomas G, Hall MN: TOR signalling and control of cell growth. Curr Opin Cell Biol. 1997, 9: 782-787. 10.1016/S0955-0674(97)80078-6.
Cherkasova VA, Hinnebusch AG: Translational control by TOR and TAP42 through dephosphorylation of eIF2alpha kinase GCN2. Genes Dev. 2003, 17: 859-872. 10.1101/gad.1069003.
Kubota H, Obata T, Ota K, Sasaki T, Ito T: Rapamycin-induced translational derepression of GCN4 mRNA involves a novel mechanism for activation of the eIF2 alpha kinase GCN2. J Biol Chem. 2003, 278: 20457-20460. 10.1074/jbc.C300133200.
Natarajan K, Meyer MR, Jackson BM, Slade D, Roberts C, Hinnebusch AG, Marton MJ: Transcriptional profiling shows that Gcn4p is a master regulator of gene expression during amino acid starvation in yeast. Mol Cell Biol. 2001, 21: 4347-4368. 10.1128/MCB.21.13.4347-4368.2001.
Dahlquist KD, Salomanis N, Vranizan K, Lawlor SC, Conkin BR: GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways. Nature Gen. 2002, 31: 19-20. 10.1038/ng0502-19.
Horoszewicz JS, Leong SS, Chu TM, Wajsman ZL, Friedman M, Papsidero L, Kim U, Chai LS, Kakati S, Arya SK, Sandberg AA: The LNCaP cell line - a new model for studies on human prostatic carcinoma. Prog Clin Biol Res. 1980, 37: 115-132.
Horoszewicz JS, Leong SS, Kawinski E, Karr JP, Rosenthal H, Chu TM, Mirand EA, Murphy GP: LNCaP model of human prostatic carcinoma. Cancer Res. 1983, 43: 1809-1818.
Baron A, Migita T, Tang D, Loda M: Fatty acid synthase: a metabolic oncogene in prostate cancer?. J Cell Biochem. 2004, 91: 47-53. 10.1002/jcb.10708.
DePrimo SE, Diehn M, Nelson JB, Reiter RE, Matese J, Fero M, Tibrishani R, Brown PO, Brooks JD: Transcriptional programs activated by exposure of human prostate cancer cells to androgen. Genome Biol. 2002, 3: research 0032.1-32.12. 10.1186/gb-2002-3-7-research0032.
Briggs MW, Sacks DB: IQGAP proteins are integral components of cytoskeletal regulation. EMBO Rep. 2003, 4: 571-574. 10.1038/sj.embor.embor867.
Foster CS: Pathology of benign prostatic hyperplasia. Prostate Suppl. 2000, 9: 4-14. 10.1002/1097-0045(2000)45:9+<4::AID-PROS3>3.0.CO;2-Q.
Magee JA, Araki T, Patil S, Ehrig T, True L, Humphrey PA, Catalona WJ, Watson MA, Milbrandt J: Expression profiling reveals hepsin overexpression in prostate cancer. Cancer Res. 2001, 61: 5692-5696.
Luo J, Duggan DJ, Chen Y, Sauvageot J, Ewing CM, Bittner ML, Trent JM, Isaacs WB: Human prostate cancer and benign prostatic hyperplasia: molecular dissection by gene expression profiling. Cancer Res. 2001, 61: 4683-4688.
Ernst T, Hergenhahn M, Kenzelmann M, Cohen CD, Bonrouhi M, Weninger A, Klaren R, Grone EF, Wiesel M, Gudemann C, Kuster J, Schott W, Staehler G, Kretzler M, Hollstein M, Grone HJ: Decrease and gain of gene expression are equally discriminatory markers for prostate carcinoma: a gene expression analysis on total and microdissected prostate tissue. Am J Pathol. 2002, 160: 2169-2180.
Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, Tamayo P, Renshaw AA, D'Amico AV, Richie JP, Lander ES, Loda M, Kantoff PW, Golub TR, Sellers WR: Gene expression correlates of clinical prostate cancer behavior. Cancer Cell. 2002, 1: 203-209. 10.1016/S1535-6108(02)00030-2.
LaTulippe E, Satagopan J, Smith A, Scher H, Scardino P, Reuter V, Gerald WL: Comprehensive gene expression analysis of prostate cancer reveals distinct transcriptional programs associated with metastatic disease. Cancer Res. 2002, 62: 4499-4506.
Stuart RO, Wachsman W, Berry CC, Wang-Rodriguez J, Wasserman L, Klacansky I, Masys D, Argen K, Goodison S, McCelland M, Wang Y, Sawyers A, Kalcheva I, Tarin D, Mercola D: In silico dissection of cell-type-associated patterns of gene expression in prostate cancer. Proc Natl Acad Sci USA. 2004, 101: 615-620. 10.1073/pnas.2536479100.
Rhodes DR, Barrette TR, Rubin MA, Ghosh D, Chinnaiyan AM: Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Res. 2002, 62: 4427-4433.
Kanehisa MGSKSOYHM, Goto S, Kawashima S, Okuno Y, Hattori M: The KEGG resource for dephiphering the genome. Nucl Acids Res. 2004, 32: D277-D280. 10.1093/nar/gkh063.
Brawer MK: Prostate-specific antigen. Semin Surg Oncol. 2000, 18: 3-9. 10.1002/(SICI)1098-2388(200001/02)18:1<3::AID-SSU2>3.0.CO;2-I.
Schlessinger J: Ligand-induced, receptor-mediated dimerization and activation of EGF receptor. Cell. 2002, 110: 669-672. 10.1016/S0092-8674(02)00966-2.
Menard S, Tagliabue E, Campiglio M, Pupa SM: Role of HER2 gene overexpression in breast carcinoma. J Cell Physiol. 2000, 182: 150-162. 10.1002/(SICI)1097-4652(200002)182:2<150::AID-JCP3>3.0.CO;2-E.
Agus DB, Akita RW, Fox WD, Lofgren JA, Higgins B, Maiese K, Scher HI, Sliwkowski MX: A potential role for activated HER-2 in prostate cancer. Semin Oncol. 2000, 27: 76-83; discussion 92-100.
Myers RB, Srivastava S, Oelschlager DK, Grizzle WE: Expression of p160erbB-3 and p185erbB-2 in prostatic intraepithelial neoplasia and prostatic adenocarcinoma. J Natl Cancer Inst. 1994, 86: 1140-1145.
Lyne JC, Melhem MF, Finley GG, Wen D, Liu N, Deng DH, Salup R: Tissue expression of neu differentiation factor/heregulin and its receptor complex in prostate cancer and its biologic effects on prostate cancer cells in vitro. Cancer J Sci Am. 1997, 3: 21-30.
Vakar-Lopez F, Cheng CJ, Kim J, Shi GG, Troncoso P, Tu SM, Yu-Lee LY, Lin SH: Up-regulation of MDA-BF-1, a secreted isoform of ErbB3, in metastatic prostate cancer cells and activated osteoblasts in bone marrow. J Pathol. 2004, 203: 688-695. 10.1002/path.1568.
Li S, Wang Q, Chakladar A, Bronson RT, Bernards A: Gastric hyperplasia in mice lacking the putative Cdc42 effector IQGAP1. Mol Cell Biol. 2000, 20: 697-701. 10.1128/MCB.20.2.697-701.2000.
Van Aelst L, Joneson T, Bar-Sagi D: Identification of a novel Rac1-interacting protein involved in membrane ruffling. Embo J. 1996, 15: 3778-3786.
Guy PM, Platko JV, Cantley LC, Cerione RA, Carraway KL: Insect cell-expressed p180erbB3 possesses an impaired tyrosine kinase activity. Proc Natl Acad Sci U S A. 1994, 91: 8132-8136.
Daly JM, Jannot CB, Beerli RR, Graus-Porta D, Maurer FG, Hynes NE: Neu differentiation factor induces ErbB2 down-regulation and apoptosis of ErbB2-overexpressing breast tumor cells. Cancer Res. 1997, 57: 3804-3811.
Naidu R, Yadav M, Nair S, Kutty MK: Expression of c-erbB3 protein in primary breast carcinomas. Br J Cancer. 1998, 78: 1385-1390.
Holbro T, Beerli RR, Maurer F, Koziczak M, Barbas CF, Hynes NE: The ErbB2/ErbB3 heterodimer functions as an oncogenic unit: ErbB2 requires ErbB3 to drive breast tumor cell proliferation. Proc Natl Acad Sci U S A. 2003, 100: 8933-8938. 10.1073/pnas.1537685100.
Yarden Y, Slikowski MX: Untangling the ErbB signalling network. Nature Rev Mol Cell Biol. 2001, 2: 127-137. 10.1038/35052073.
Mellinghoff IK, Vivanco I, Kwon A, Tran C, Wongvipat J, Sawyers CL: HER2/neu kinase-dependent modulation of androgen receptor function through effects on DNA binding and stability. Cancer Cell. 2004, 6: 517-527. 10.1016/j.ccr.2004.09.031.
Feldman BJ, Feldman D: The development of androgen-independent prostate cancer. Nat Rev Cancer. 2001, 1: 34-45. 10.1038/35094009.
Ramaswamy S, Ross KN, Lander ES, Golub TR: A molecular signature of metastasis in primary solid tumors. Nat Genet. 2003, 33: 49-54. 10.1038/ng1060.
Wigelt B, Glas AM, Wessels LFA, Witteveen AT, Peterse JL, van't Veer LJ: Gene expression profiles of primary breast tumors maintained in distant metastases. Proc Natl Acad Sci USA. 2003, 100: 15901-15905. 10.1073/pnas.2634067100.
Signoretti S, Montironi R, Manola J, Altimari A, Tam C, Bubley G, Balk S, Thomas G, Kaplan I, Hlatky L, Hahnfeldt P, Kantoff P, Loda M: Her-2-neu expression and progression toward androgen independence in human prostate cancer. J Natl Cancer Inst. 2000, 92: 1918-1925. 10.1093/jnci/92.23.1918.
Osman I, Scher HI, Drobnjak M, Verbel D, Morris M, Agus D, Ross JS, Cordon-Cardo C: HER-2/neu (p185neu) protein expression in the natural or treated history of prostate cancer. Clin Cancer Res. 2001, 7: 2643-2647.
Savinainen KJ, Saramaki OR, Linja MJ, Bratt O, Tammela TL, Isola JJ, Visakorpi T: Expression and gene copy number analysis of ERBB2 oncogene in prostate cancer. Am J Pathol. 2002, 160: 339-345.
Lara PNJ, Meyers FJ, Gray CR, Edwards RG, Gumerlock PH, Kauderer C, Tichauer G, Twardowski P, Doroshow JH, Gandara DR: HER-2/neu is overexpressed infrequently in patients with prostate carcinoma. Results from the California Cancer Consortium Screening Trial. Cancer. 2002, 94: 2584-2589. 10.1002/cncr.10526.
Bo TH, Dysvik B, Jonassen I: LSimpute: accurate estimation of missing values in microarray data with least squares methods. Nucleic Acids Res. 2004, 32: e34-10.1093/nar/gnh026.
Durbin R, Eddy S, Krogh A, Mitchison G: Biological sequence analysis. Probabilistics models of proteins and nucleic acids. 1998, Cambridge, UK, Cambridge University Press
Felsenstein J: Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol. 1996, 266: 418-427.
Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003, 100: 9440–9445-10.1073/pnas.1530509100.
We thank I. Nayach for procurement and processing of prostate tissue samples and R. Muñoz for the help with the web server. JJL was partly supported by a NATO postdoctoral fellowship. This work has been facilitated by an institutional grant from Fundación Ramón Areces to the CBMSO, by grants GEN2001-4856-C13-10, GEN2001-4856-C13-07 and SAF2001-1969 from MCYT, and by grant PI020231 from FIS.
JJL and DA implemented the software and carried out the analysis; MS and RB performed the RT-PCR and Inmunohistochemistry experiments; PLF obtained the prostate samples and carried out the laser capture microdissections; TMT and ARO coordinated the work and wrote the manuscript.
About this article
Cite this article
Lozano, J.J., Soler, M., Bermudo, R. et al. Dual activation of pathways regulated by steroid receptors and peptide growth factors in primary prostate cancer revealed by Factor Analysis of microarray data. BMC Genomics 6, 109 (2005) doi:10.1186/1471-2164-6-109
- Prostate Cancer
- Androgen Receptor
- Nitrogen Starvation