- Research article
- Open Access
A statistical framework for consolidating "sibling" probe sets for Affymetrix GeneChip data
© Li et al; licensee BioMed Central Ltd. 2008
- Received: 05 September 2007
- Accepted: 24 April 2008
- Published: 24 April 2008
Affymetrix GeneChip arrays typically contain multiple probe sets per gene, defined as sibling probe sets in this study. These probe sets may or may not behave similarly across treatments. The most appropriate way of consolidating sibling probe sets for analysis is an open problem. We propose an Analysis of Variance (ANOVA) framework to decide which sibling probe sets can be consolidated.
The ANOVA model allows us to separate the sibling probe sets into two types: those that behave similarly across treatments and those that behave differently across treatments. We found that consolidating sibling probe sets of the former type results in a large increase in the number of differentially expressed genes under various statistical criteria. The approach to selecting sibling probe sets suitable for consolidation is implemented in the R language and is freely available from http://research.stowers-institute.org/hul/affy/.
Our ANOVA analysis of sibling probe sets provides a statistical framework for selecting sibling probe sets for consolidation. Consolidating sibling probe sets by pooling data from each greatly improves the estimates of a gene expression level and results in identification of more biologically relevant genes. Sibling probe sets that do not qualify for consolidation may represent annotation errors or other artifacts, or may correspond to differentially processed transcripts of the same gene that require further analysis.
- Papillary Thyroid Carcinoma
- Spermatogonial Stem Cell
- Model Treatment Effect
- Chip Definition File
- Papillary Thyroid Carcinoma Sample
According to Affymetrix, there are three primary reasons for designing sibling probe sets for the same gene: first, some cDNAs may be thought to come from different loci at the time of chip design, but later genome annotation maps them to the same gene; second, some probe sets turn out to cross-hybridize in an unpredictable manner, and additional probe sets with better specificity are designed for the same gene; third, probe sets specific to RNA variants, such as products of alternative splicing, highly similar gene families, or transcripts with different polyA sites, have been designed on purpose. Correspondingly, Affymetrix probe set name suffixes try to indicate these design purposes: probe sets with "s" and "x" suffixes are thought to be prone to cross-hybridization, and probe sets with an "a" suffix represent alternative splicing variants. However, two independent studies showed that different expression scores of sibling probe sets are not due to the inclusion of these suboptimal probe sets, and there is a lack of evidence that these suboptimal probe sets perform worse than "better designed" probe sets [10, 11]. Clearly the sibling probe set problem must be tackled when analyzing Affymetrix microarray data, but the existing strategies differ widely.
Naive approaches to sibling probe sets either treat them in the same way as different genes or arbitrarily choose one sibling probe set as the representative of the gene and ignore the other sets [13, 14, 10]. For example, Jordan et al proposed selecting the probe set with the highest expression value among the siblings, whereas Liao and Zhang randomly picked one sibling probe set for their analysis. All these approaches solve the problem by discarding data in an arbitrary manner. There does not seem to be a systematic guideline for consolidating sibling probe sets. In remapping probes to probe sets to create a custom Chip Definition File (CDF), Dai et al mapped each gene to a single probe set to avoid "redundant probe sets" in GeneChip analysis. It has been shown that these updated probe set definitions provide both better precision and better accuracy in probe set expression estimates than the original Affymetrix definition of the hgu133a chip.
Elbez et al studied how well sibling probe sets measure the same gene expression on the Affymetrix hgu133a GeneChip. Using correlation statistics, they defined two groups of probe set pairs: pairs that are highly correlated and pairs that are not. They derived an empirical rule for the Affymetrix hgu133a GeneChip that highly correlated sibling probe sets should be consolidated and others should not be. However, their approach has the following limitations. First, they did not study correlation among more than two probe sets, although about 18% of genes on the mouse chip have 3 or more sibling probe sets (Fig. 1). Second, only informative pairs (probe sets showing changes in transcription among different measurements) are included in their analysis, whereas pairs that show no difference in expression are left aside, which possibly introduces some bias in the results. Recently, Stalteri and Harrison published a case study of the mouse gene "Surf4" and determined that some sibling probe sets on the mouse moe430a array with inconsistent measures either detected alternative splicing (poly(A) sites) or reflected errors.
It seems appropriate to consolidate sibling probe sets that behave similarly, since they are more likely to be hidden replicates of the expression values of the same target gene. In contrast, sibling probe sets showing inconsistent expression values may represent real biological phenomena, or perhaps stem from annotation errors or other artifacts, and should not be consolidated in either case. In this work, we propose a statistical method for consolidating sibling probe sets in the context of detecting differentially expressed genes over two or more physiological/genetic conditions. We cast the automatic determination of the sibling probe set type in the ANOVA framework: the differential expression between sibling probe sets, the differential expression between treatments, and their interaction are inspected simultaneously in a two-way ANOVA model (Eq. 1) or its extension with a block effect (Eq. 2), and we test whether the interaction is significant. An insignificant interaction effect indicates that sibling probe sets are more likely to behave similarly and provides evidence for consolidation. This approach is referred to as the per-gene approach throughout the paper.
We compare our approach to two existing approaches: the per-probeset approach and the custom CDF approach. The per-probeset approach treats all sibling probe sets as distinct genes and is widely used in the literature. The custom CDF approach uses redefined probe sets, assembling all probes that map to the same gene into one probe set based on a genome database. There are usually multiple versions of custom CDFs for one platform because there are multiple genome databases. For example, the UniGene custom CDF maps to the UniGene database. Using three publicly available Affymetrix datasets [18–20], we show that the per-gene approach is able to call more biologically relevant genes than the two other approaches.
The Statistical Framework for Consolidating Sibling Probe Sets
Microarray experiments often involve paired samples; for example, a treatment sample and a control sample may come from the same individual. For these experiments, we add a block factor to the one-way and two-way ANOVA models, giving Eq. 4 and Eq. 2 respectively, to account for the correspondence within each pair.
Example 1: Discriminative Analysis over Treatment and Control
In the first example we compared the per-gene, the per-probeset and the custom CDF approaches by screening differentially expressed genes between Nrl knockout and wild type mice at postnatal day 10 (P10). Nrl is a Maf-family transcription factor and the key regulator of photoreceptor differentiation in mammals. Nrl knockout causes slow but progressive vision loss in mammals. We used RMA to get the expression value for each probe set.
For the 15,632 genes that are represented by a single probe set on the moe4302 GeneChip (Fig. 1), we performed one-way ANOVA under both equal-variance and unequal-variance assumptions. Correspondingly, for the 10,049 genes that are represented by multiple sibling probe sets, we performed two-way ANOVA with interaction between the two fixed effects τ and ψ (Eq. 1). Specifically, we model probe set (ψ) and treatment (τ) (wild type vs. Nrl-ko) as two factors together with their interaction (whether differential expression changes over probe sets or vice versa). There are 62 genes with sibling probe sets whose interaction terms were called significant at a False Discovery Rate (FDR, Benjamini-Hochberg (BH) procedure) no larger than 1%. This means that the differential expression between the wild type and Nrl-ko conditions depends on the sibling probe set, or vice versa. For this reason we treated the 255 probe sets mapping to these 62 genes as individual probe sets and fitted the one-way ANOVA model with treatment effect only (Eq. 3, Fig. 2). Finally, raw P-values of the treatment effect were combined from the full two-way ANOVA and the one-way ANOVA. The number of hypothesis tests was reduced from 45,101 in the per-probeset approach to 25,917 = 15,675 + 10,049 - 62 + 255 in the per-gene approach.
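The bookkeeping behind the reduced number of hypothesis tests can be checked with a few lines of Python. This is only an illustrative sketch: the counts are the ones quoted in the sum above (including the 15,675 single-probe-set genes used there), and the function and variable names are hypothetical, not part of the authors' R implementation.

```python
# Counts quoted in the text for the moe4302 Nrl example.
single_probe_set_genes = 15_675   # genes with one probe set, as used in the sum
sibling_genes = 10_049            # genes with two or more sibling probe sets
split_genes = 62                  # sibling genes with a significant interaction
split_probe_sets = 255            # probe sets that map to those 62 genes
per_probeset_tests = 45_101       # one test per probe set


def per_gene_test_count(single, sibling, split, split_sets):
    """Number of treatment-effect tests under the per-gene approach."""
    consolidated = sibling - split          # one two-way ANOVA test per gene
    return single + consolidated + split_sets


per_gene_tests = per_gene_test_count(
    single_probe_set_genes, sibling_genes, split_genes, split_probe_sets
)
reduction = per_probeset_tests - per_gene_tests
print(per_gene_tests, reduction)
```

Fewer simultaneous tests means a less severe multiple-testing penalty for each remaining test.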
Performance comparisons in terms of numbers of differentially expressed genes (surviving criterion label: RawP cut-off, 6.5e-05).
The analyses under the assumption of equal variance, and those using other multiple-testing correction methods such as Bonferroni, a raw P-value cut-off, and FDR under general dependency (Benjamini-Yekutieli procedure, BY), follow the same trend (Table 1).
Example 2: Cancer Gene Markers Identification using Paired Samples
In the second example, we compared the three approaches by screening differentially expressed genes between paired normal and thyroid cancer tissues as potential molecular markers on the Affymetrix hgu133plus2 array. The data set (GSE3678) contains gene expression profiles of seven Papillary Thyroid Carcinoma (PTC) samples compared to seven paired normal samples. GCRMA was used to normalize and summarize the expression score for each probe set in each tissue sample. Because the data are paired, unlike the mouse chip data, we reported P-values from the extended two-way ANOVA model with patient as a block effect (Eq. 2) for genes whose sibling probe sets are consolidated (insignificant interaction effect between probe set and treatment). For independent probe sets and single probe sets, we reported P-values from the extended one-way ANOVA model with patient as a block effect (Eq. 4). Note that the latter analysis corresponds to the familiar paired t-test of the treatment effect.
Controlling FDR at the level of 0.01 using the BH procedure, the per-gene approach and the per-probeset approach call 402 and 32 differentially expressed genes between normal and PTC samples respectively, while the UniGene custom CDF approach made 24 significant calls and the ensEMBL gene custom CDF approach made 25 significant calls. This again shows that the per-gene approach dominates the per-probeset approach, in that 31 out of 32 probe sets (Fig. 4b) called by the per-probeset approach were also called by the per-gene approach. In addition, 23 out of 24 genes identified by the UniGene custom CDF approach and 22 out of 25 genes identified by the ensEMBL gene custom CDF approach are also identified by the per-gene approach. Results using other multiple-testing correction procedures follow the same trend (see Additional File 2).
Comparison in terms of cancer functional categories (surviving category label: Blood Vessel Development).
Example 3: Spermatogonial Stem Cell Self-Renewal Gene Markers Identification
To determine whether the per-gene approach consistently outperforms the per-probeset and the custom CDF approaches under varied experimental conditions, such as multiple treatments and different normalization and summarization methods, we further compared the three approaches on a third data set. The third microarray data set (GSE4799) profiled gene expression over five time points before and after GDNF/GFRα1 replacement, with a total of 15 samples. For this data set, we used the GCRMA, RMA and MBEI pre-processing methods for the Affymetrix CDF and three versions of custom CDFs (UniGene, EntrezGene, and ensEMBL gene). As in our previous analysis, we reported P-values from Eq. 3 or Eq. 1 depending on whether the interaction effect is significant.
Comparison in terms of stem cell self-renewal functional categories (surviving category label: Regulation of Cell Growth).
We have demonstrated the advantages of consolidating sibling probe sets whenever possible in the context of detecting differential expression using the popular Affymetrix moe4302 and hgu133plus2 platforms. Whether sibling probe sets are consolidated is determined automatically through a statistical test of the probe set by treatment interaction effect in the two-way ANOVA model. Consolidation improves the analysis in two ways. First, pooling data from sibling probe sets improves the estimation of the mean and variance of the observed gene expression level, so that the significance of differential expression (P-value) is more accurately estimated. Second, pooling enhances the power of statistical tests, because it reduces the number of simultaneous hypothesis tests by consolidating the redundant sibling probe sets into one probe set. Like all the other approaches, the per-gene approach is also susceptible to errors in gene annotation. In cases where the Affymetrix annotation links distinct genes that happen to have a similar expression pattern in the given experiment, this approach will fail to separate these genes.
Formulating a rule for consolidating sibling probe sets is still an open problem. By analyzing the so-called "bad pairs", Elbez et al attributed the problems of current Affymetrix probe set mapping to inaccurate genome annotation, and Dai et al derived a consolidating rule externally using customized CDFs in a bottom-up fashion, i.e., using the most up-to-date genome annotation from diverse databases to redefine the mapping of probes to probe sets so as to consolidate sibling probe sets. These post hoc assembled solutions are useful and have been shown to provide better estimation of gene expression.
We addressed the same issue using a data-driven approach: our approach does not rely on any database, but rather formulates a consolidating rule internally using the expression data of sibling probe sets.
We want to emphasize that we do not intend to give a universal recommendation to always consolidate particular sibling probe sets. On the contrary, our approach provides a method to consolidate sibling probe sets whenever applicable, and consolidation is based only on the observed data in a particular experiment. We do not intend to predict the consolidation rule for a new data set from the rule derived from previously analyzed data sets. As illustrated in our Additional File 3 and by data from Elbez et al, expression values of sibling probe sets might show a high correlation in one experiment but not in another. However, the reasons why probe set pairs show a high correlation in one data set but a low correlation in another are not well studied.
Our framework may affect subsequent analyses such as clustering and network inference. For example, in both gene clustering and network analysis, the focus is often on a small subset of differentially expressed genes. Without consolidating sibling probe sets, the per-probeset approach often retains redundant probe sets of the same gene, which is not only problematic for network and clustering visualization and interpretation, but also substantially lowers the statistical power of biological discovery. In gene set enrichment analysis using an enrichment score, the expression value of a gene could be represented by the mean of the expression values of the multiple probe sets that map to that gene, provided these probe sets are consolidated based on statistical tests.
Another important feature of using the per-gene approach to rank differentially expressed genes is that well-characterized genes (whose functions may still remain elusive) are more enriched in the top ranked list produced by the per-gene approach than in the list produced by the per-probeset approach. One possible explanation is that Affymetrix designs sibling probe sets mostly for well-characterized genes. Consolidating these sibling probe sets wherever applicable substantially increases the sample size, allowing more reliable detection of differential expression for these genes. The per-gene approach is particularly useful for less well-annotated genomes, for which the enrichment of well-characterized genes in the top ranked list would markedly facilitate our understanding of the underlying biological process.
The first Affymetrix data set we used was generated by Akimoto et al using the Affymetrix mouse moe4302 chip. The data were downloaded from the Gene Expression Omnibus (GEO) database using accession number GSE4051. We focused on identifying differentially expressed genes at developmental stage P10, with 4 replicates in both the wild type and Nrl-ko conditions. We chose to compare wild type and knockout at developmental stage P10 because this reflects the popular experimental design of comparing two conditions in microarray analysis. P10 was chosen because it is the starting point of the mature state of photoreceptor differentiation.
The second Affymetrix data set we used was generated by Reyes et al using the Affymetrix human hgu133plus2 chip. The data were downloaded from the GEO database using accession number GSE3678. The experiment profiles gene expression in 7 paired PTC patient samples and normal samples.
The third Affymetrix data set we used was generated by Oatley et al using the Affymetrix mouse moe4302 chip. GDNF-regulated gene expression was studied in cultures of actively self-renewing spermatogonial stem cells established from 6-day-old male mice. GDNF is the essential growth factor regulating mouse spermatogonial stem cell self-renewal. Gene expression was measured prior to withdrawal, after withdrawal, and at 2, 4 and 8 hours of GDNF/GFRα replacement, with 3 replicates for each time point. The data were downloaded from the Gene Expression Omnibus (GEO) database using accession number GSE4799.
For genes with sibling probe sets, we fit the full two-way ANOVA model (Eq. 1) with probe set by treatment interaction to the pooled data. If the interaction effect τψ is insignificant after multiple-test correction (we used FDR ≤ 0.01, Benjamini-Hochberg procedure), we report P-values of the treatment effect τ; otherwise we consider the sibling probe sets as independent probe sets. For genes corresponding to a single probe set and for these independent probe sets, we fit the one-way ANOVA model (Eq. 3), in which only the treatment effect is included.
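The decision rule in the preceding paragraph can be sketched in Python as follows. This is a minimal illustration rather than the authors' R code: it assumes the per-gene interaction P-values have already been computed, and the function names, gene labels and P-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values are monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify_genes(interaction_p, fdr=0.01):
    """Label each gene 'independent' (significant interaction) or 'consolidate'."""
    genes = list(interaction_p)
    adjusted = bh_adjust([interaction_p[g] for g in genes])
    return {
        gene: "independent" if q <= fdr else "consolidate"
        for gene, q in zip(genes, adjusted)
    }


# Hypothetical interaction P-values for four genes with sibling probe sets.
example = {"geneA": 0.20, "geneB": 0.00001, "geneC": 0.50, "geneD": 0.03}
decisions = classify_genes(example)
print(decisions)
```

Genes labelled "consolidate" would then be reported with the treatment P-value from Eq. 1, while the probe sets of "independent" genes would each be refitted with Eq. 3.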
Two-way ANOVA model
Let $y_{ijk}$ be the normalized and summarized probe set intensity score for the $i$th treatment, $j$th probe set and $k$th replicate. We model the treatment effect ($\tau_i$), the probe set effect ($\psi_j$) and their interaction effect $(\tau\psi)_{ij}$ as two factors with interaction having $I$ and $J$ levels, $i = 1, 2, \ldots, I$, $j = 1, 2, \ldots, J$, where $I$ represents the number of conditions to compare and $J$ represents the number of sibling probe sets for one gene:
$$y_{ijk} = \mu + \tau_i + \psi_j + (\tau\psi)_{ij} + \varepsilon_{ijk} \tag{1}$$
Let $\beta$ represent the block factor, where $k$ represents the block size, $k = 1, 2$. The two-way ANOVA model with block effect is:
$$y_{ijk} = \mu + \tau_i + \psi_j + \beta_k + (\tau\psi)_{ij} + \varepsilon_{ijk} \tag{2}$$
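To make the interaction test in Eq. 1 concrete, the sketch below computes the interaction F statistic for a single gene by hand. It assumes a balanced design (the same number of replicates in every treatment by probe set cell) and stops at the F statistic; the P-value would come from the F distribution with the returned degrees of freedom, for example as reported by an ANOVA table in R. The function name and the toy data are hypothetical.

```python
def interaction_f(cells):
    """Interaction F statistic for a balanced two-way layout.

    cells[i][j] is the list of replicate scores for treatment i and
    probe set j; every cell must contain the same number of replicates.
    Returns (F, df_interaction, df_error).
    """
    n_trt, n_ps = len(cells), len(cells[0])
    n_rep = len(cells[0][0])
    cell_mean = [[sum(c) / n_rep for c in row] for row in cells]
    grand = sum(sum(row) for row in cell_mean) / (n_trt * n_ps)
    trt_mean = [sum(row) / n_ps for row in cell_mean]
    ps_mean = [sum(cell_mean[i][j] for i in range(n_trt)) / n_trt
               for j in range(n_ps)]
    # Interaction sum of squares: what is left of the cell means after
    # removing the additive treatment and probe set effects.
    ss_int = n_rep * sum(
        (cell_mean[i][j] - trt_mean[i] - ps_mean[j] + grand) ** 2
        for i in range(n_trt) for j in range(n_ps)
    )
    # Error sum of squares: replicate scatter within each cell.
    ss_err = sum(
        (y - cell_mean[i][j]) ** 2
        for i in range(n_trt) for j in range(n_ps) for y in cells[i][j]
    )
    df_int = (n_trt - 1) * (n_ps - 1)
    df_err = n_trt * n_ps * (n_rep - 1)
    return (ss_int / df_int) / (ss_err / df_err), df_int, df_err


# Two treatments x two probe sets x two replicates (toy numbers).
parallel = [[[1, 3], [2, 4]], [[5, 7], [6, 8]]]   # siblings move together
crossing = [[[1, 3], [6, 8]], [[5, 7], [2, 4]]]   # siblings move oppositely
f_parallel = interaction_f(parallel)
f_crossing = interaction_f(crossing)
print(f_parallel, f_crossing)
```

In the first toy gene both probe sets shift by the same amount between treatments, so the interaction F is zero and the siblings would be candidates for consolidation; in the second the probe sets respond in opposite directions and the interaction F is large.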
One-way ANOVA model
Let $y_{jk}$ be the normalized and summarized probe set intensity score for the $j$th treatment and $k$th replicate. We model the treatment effect ($\tau_j$) as a fixed effect having $I$ levels, $j = 1, 2, \ldots, I$:
$$y_{jk} = \mu + \tau_j + \varepsilon_{jk} \tag{3}$$
Similarly, the one-way ANOVA model with block effect is:
$$y_{jk} = \mu + \tau_j + \beta_k + \varepsilon_{jk}, \tag{4}$$
where $k = 1, 2$.
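For two treatments measured on paired samples, the treatment F statistic from the blocked one-way model (Eq. 4) equals the square of the paired t statistic. The sketch below checks this numerically. It assumes that each block is one paired individual and uses hypothetical expression values; the function names are illustrative only.

```python
import math


def paired_t(control, treated):
    """Paired t statistic for the mean within-pair difference."""
    diffs = [t - c for c, t in zip(control, treated)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)


def block_anova_f(control, treated):
    """Treatment F statistic from the blocked one-way model with two treatments."""
    n = len(control)
    rows = [control, treated]
    grand = (sum(control) + sum(treated)) / (2 * n)
    trt_mean = [sum(r) / n for r in rows]
    blk_mean = [(control[k] + treated[k]) / 2 for k in range(n)]
    ss_trt = n * sum((m - grand) ** 2 for m in trt_mean)
    # Residual after removing both the treatment and the block (pair) effects.
    ss_err = sum(
        (rows[j][k] - trt_mean[j] - blk_mean[k] + grand) ** 2
        for j in range(2) for k in range(n)
    )
    df_err = (2 - 1) * (n - 1)
    return ss_trt / (ss_err / df_err)   # treatment has 1 degree of freedom


# Hypothetical log-scale scores for four paired samples.
control = [5.1, 6.0, 5.5, 7.2]
treated = [6.3, 6.8, 6.9, 7.9]
t_stat = paired_t(control, treated)
f_stat = block_anova_f(control, treated)
print(t_stat ** 2, f_stat)
```

Because the two statistics agree and share the same error degrees of freedom, the two tests give the same P-value.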
R function lm() was used to fit one-way and two-way ANOVA models.
The Custom CDF Approach
Custom CDF files (version 8) for the hgu133plus2 and moe4302 platforms were downloaded from the Brainarray custom CDF site (Dai, listed in the references). Probe set definitions mapped to the UniGene database, the EntrezGene database and the ensEMBL gene database were considered in this work. The probe set expression was calculated using one or all of the three normalization methods (MBEI, RMA, GCRMA). The differentially expressed genes were identified using the model in Eq. 3, as was done for the per-probeset approach.
GO Enrichment Analysis
For gene lists generated by the per-gene or per-probeset approaches, we used the Bioconductor package "GOstats" to perform GO enrichment analysis. For gene lists generated by the custom CDF approach, we retrieved counts of the GO terms associated with the differentially expressed gene list and with the whole genome list by querying Ensembl databases, and then performed a hypergeometric test using the R function phyper.
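The hypergeometric enrichment test can be illustrated with the Python standard library. This sketch computes the upper-tail probability P(X ≥ k), which is the same quantity R returns from `phyper(k - 1, K, N - K, n, lower.tail = FALSE)`. The function name, argument names and counts are hypothetical and are not taken from the paper's data.

```python
from math import comb


def hypergeom_upper_tail(k, K, N, n):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the whole genome list
    K: genes in that list annotated with the GO term
    n: genes in the differentially expressed list
    k: genes in the differentially expressed list with the GO term
    """
    total = comb(N, n)
    upper = min(K, n)
    return sum(
        comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1)
    ) / total


# Toy example: 10 genes, 4 annotated, 3 selected, 2 of them annotated.
p_enrich = hypergeom_upper_tail(2, 4, 10, 3)
print(p_enrich)
```

A small upper-tail probability indicates that the GO term appears in the differentially expressed list more often than expected by chance.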
We would like to thank Drs. Arcady Mushegian and Manisha Goel for critically reading the manuscript.
- Kerr M, GA C: Statistical design and the analysis of gene expression microarray data. Genet Res. 2001, 77 (2): 123-128. 10.1017/S0016672301005055.
- Cui X, Hwang J, Qiu J, Blades N, Churchill G: Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics. 2005, 6: 59-75. 10.1093/biostatistics/kxh018.
- Hoeschele I, Li H: A note on joint versus gene-specific mixed model analysis of microarray gene expression data. Biostatistics. 2005, 6 (2): 183-186. 10.1093/biostatistics/kxi001.
- Hero A, Fleury G, Mears A, Swaroop A: Multicriteria Gene Screening for Analysis of Differential Expression with DNA Microarrays. EURASIP Journal of Applied Signal Processing. 2004, 1: 43-52. 10.1155/S1110865704310036.
- Golub T, Slonim D, Tamayo P, Huard C, Gaasenbeek M, Mesirov J, Coller H, Loh M, Downing J, Caligiuri M, Bloomfield C, Lander E: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999, 286 (5439): 531-537. 10.1126/science.286.5439.531.
- Eisen M, Spellman P, Brown P, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998, 95 (25): 14863-14868. 10.1073/pnas.95.25.14863.
- Qin Z: Clustering microarray gene expression data using weighted Chinese restaurant process. Bioinformatics. 2006, 22 (16): 1988-1997. 10.1093/bioinformatics/btl284.
- Zhu D, Li Y, Li H: Multivariate correlation estimator for inferring functional relationships from replicated genome-wide data. Bioinformatics. 2007, 23 (17): 2298-2305. 10.1093/bioinformatics/btm328.
- Zhu D, Hero A, Cheng H, Khanna R, Swaroop A: Network constrained clustering for gene microarray data. Bioinformatics. 2005, 21 (21): 4014-4020. 10.1093/bioinformatics/bti655.
- Liao B, Zhang J: Evolutionary conservation of expression profiles between human and mouse orthologous genes. Mol Biol Evol. 2006, 23 (3): 530-540. 10.1093/molbev/msj054.
- Elbez Y, Farkash-Amar S, Simon I: An analysis of intra array repeat: the good, the bad and the noninformative. BMC Genomics. 2006, 7 (136).
- Bourquin J, Subramanian A, Langebrake C, Reinhardt D, Bernard O, Ballerini P, Baruchel A, Cave H, Dastugue N, Hasle H, Kaspers G, Lessard M, Michaux L, Vyas P, Wering E, Zwaan C, Golub T, Orkinar S: Identification of distinct molecular phenotypes in acute megakaryoblastic leukemia by gene expression profiling. Proc Natl Acad Sci USA. 2006, 103 (9): 3339-3344. 10.1073/pnas.0511150103.
- Yanai I, Graur D, Ophir R: Incongruent expression profiles between human and mouse orthologous genes suggest widespread neutral evolution of transcription control. OMICS. 2004, 8: 15-24. 10.1089/153623104773547462.
- Jordan I, Marino-Ramirez L, Koonin E: Evolutionary significance of gene expression divergence. Gene. 2005, 345: 119-126. 10.1016/j.gene.2004.11.034.
- Dai M, Wang P, Boyd A, Kostov G, Athey B, Jones E, Bunney W, Myers R, Speed T, Akil H, Watson S, Meng F: Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res. 2006, 33 (20): e175. 10.1093/nar/gni179.
- Sandberg R, Larsson O: Improved precision and accuracy for microarrays using updated probe set definitions. BMC Bioinformatics. 2007, 8 (48).
- Stalteri M, Harrison A: Interpretation of multiple probe sets mapping to the same gene in Affymetrix GeneChips. BMC Bioinformatics. 2007, 8 (13).
- Akimoto M, Cheng H, Zhu D, Brzezinski J, Khanna R, Filippova E, Oh E, Jing Y, Linares J, Brooks M, Zareparsi S, Mears A, Hero A, Glaser T, Swaroop A: Targeting of GFP to newborn rods by Nrl promoter and temporal expression profiling of flow-sorted photoreceptors. Proc Natl Acad Sci USA. 2006, 103 (10): 3890-3895. 10.1073/pnas.0508214103.
- Reyes I, Geliebter J: [http://0-www.ncbi.nlm.nih.gov.brum.beds.ac.uk/geo/query/acc.cgi?acc=GSE3678]
- Oatley J, Avarbock M, Telaranta A, Fearon D, Brinster R: Identifying genes important for spermatogonial stem cell self-renewal and survival. Proc Natl Acad Sci USA. 2006, 103 (25): 9524-9529. 10.1073/pnas.0603332103.
- Irizarry R, Bolstad B, Collin F, Cope L, Hobbs B, Speed T: Summaries of Affymetrix GeneChip Probe Level Data. Nucleic Acids Research. 2003, 31 (4): e15. 10.1093/nar/gng015.
- Wu J, Irizarry R, Gentleman R, Murillo F, Spencer F: A model-based background adjustment for oligonucleotide expression arrays. JASA. 2004, 99 (468): 909-917.
- Li C, Wong W: Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc Natl Acad Sci USA. 2001, 98: 31-36. 10.1073/pnas.011404098.
- Mears A, Kondo M, Swain P, Takada Y, Bush R, Saunders T, Sieving P, Swaroop A: Nrl is required for rod photoreceptor development. Nat Genet. 2001, 29 (4): 447-452. 10.1038/ng774.
- Benjamini Y, Hochberg Y: Controlling the false discovery rate-A practical and powerful approach to multiple testing. J Roy Stat Soc B Met. 1995, 57: 289-300.
- Lee R, Ting T, Lieberman B, Tobias D, Ho Y: Regulation of retinal cGMP cascade by phosducin in bovine rod photoreceptor cells. Interaction of phosducin and transducin. J Biol Chem. 1992, 267 (35): 25104-25112.
- Pittler S, Zhang Y, Chen S, Mears A, Zack D, Ren Z, Swain P, Yao S, Swaroop A, White J: Functional analysis of the rod photoreceptor cGMP phosphodiesterase alpha-subunit gene promoter: Nrl and Crx are required for full transcriptional activity. J Biol Chem. 2004, 279 (19): 19800-19807. 10.1074/jbc.M401864200.
- Cheng H, Aleman T, Cideciyan A, Khanna R, Jacobson S, Swaroop A: In vivo function of the orphan nuclear receptor NR2E3 in establishing photoreceptor identity during mammalian retinal development. Human Molecular Genetics. 2006, 15 (17): 2588-2602. 10.1093/hmg/ddl185.
- Benjamini Y, Yekutieli D: The control of the false discovery rate under dependency. Ann Stat. 2001, 29 (48): 1165-1188.
- Huang Y, Prasad M, Lemon W, Hampel H, Wright F, Kornacker K, LiVolsi K, Frankel W, Kloos R, Eng C, Pellegata N, Chapelle A: Gene expression in papillary thyroid carcinoma reveals highly consistent profiles. Proc Natl Acad Sci USA. 2001, 98 (26): 15044-15049. 10.1073/pnas.251547398.
- A S, Tamayo P, Mootha V, Mukherjee S, Ebert B, Gillette M, Paulovich A, Pomeroy S, Golub T, Lander E, Mesirov J: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005, 102 (43): 15545-15550. 10.1073/pnas.0506580102.
- Dai M: [http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download_v8.asp]
- Falcon S, Gentleman R: Using GOstats to test gene lists for GO term association. Bioinformatics. 2007, 23 (2): 257-258. 10.1093/bioinformatics/btl567.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.