- Research article
- Open Access
Further understanding human disease genes by comparing with housekeeping genes and other genes
© Tu et al; licensee BioMed Central Ltd. 2006
- Received: 04 October 2005
- Accepted: 21 February 2006
- Published: 21 February 2006
Several studies have compared various features of heritable disease genes with those of so-called non-disease genes, but they have yielded some conflicting results. A potential problem in those studies is that the non-disease genes contained a large number of essential genes, that is, genes that are indispensable for humans to survive and reproduce. Since a functional disruption of an essential gene has fatal consequences, it is more reasonable to regard essential genes as extremely severe "disease" genes. Here we perform a comparative study of the features of human essential, disease, and other genes.
In the absence of a set of well-defined human essential genes, we consider a set of 1,789 ubiquitously expressed human genes (UEHGs), also known as housekeeping genes, as an approximation. We demonstrate that UEHGs are very likely to contain a large proportion of essential genes. We show that UEHGs, disease genes and other genes differ in their evolutionary conservation rates, DNA coding lengths, gene functions, and other features. Our findings systematically confirm that disease genes have an intermediate essentiality: lower than that of housekeeping genes but higher than that of other human genes.
The human genome may contain thousands of essential genes whose features differ significantly from those of disease and other genes. We propose classifying them as a distinct group when comparing disease genes with non-disease genes. This new way of classification and comparison gives us a clearer understanding of disease genes.
- Cystic Fibrosis Transmembrane Conductance Regulator
- Disease Gene
- Essential Gene
- Conservation Score
- Human Disease Gene
Identification of novel genes associated with human diseases is among the most critical tasks in medical research. Towards this goal, various features have been compared between heritable disease genes and non-disease genes [1–4]. Although most findings were consistent with each other, a few conflicted. For example, Smith et al. [3] found that disease genes evolved with higher nonsynonymous/synonymous substitution rate ratios (Ka/Ks) than non-disease genes, but Huang et al. [4] found no such significant difference. One common problem with these studies is that human essential genes were ignored and simply grouped together with other non-disease genes. Essential genes are genes whose functions are necessary for the organism to survive and reproduce. Since disrupting the function of an essential gene has fatal consequences, such genes should be regarded as the most severe "disease" genes. Therefore, comparing disease genes to a mixture of essential and non-disease genes blurs the signals of disease-related features and may even lead to erroneous findings. It is thus beneficial to separate human essential genes from other non-disease genes before making comparisons.
Thousands of genes have been identified as essential in model organisms such as Saccharomyces cerevisiae, Caenorhabditis elegans, and Mus musculus [5–7]. Although the human genome almost certainly also contains hundreds to thousands of essential genes, it is impractical to determine them experimentally as has been done in S. cerevisiae or C. elegans. The absence of a well-defined set of human essential genes makes them difficult to study and calls for alternative approaches.
The human genome has an extremely complex tissue expression profile. Some genes are expressed only in certain tissues at specific times, while others are constitutively and ubiquitously expressed [8, 9]. The latter are presumed to be necessary for the most fundamental cellular physiological processes and are referred to as housekeeping genes . Housekeeping genes have been studied by many researchers, and several interesting observations have been reported. For example, Zhang and Li found that housekeeping genes evolved more slowly than tissue-specific genes [10]. Eisenberg and Levanon found that housekeeping genes were compact in their coding lengths [11], which could be the result of higher selective pressure. Based on these distinctive properties of ubiquitously expressed human genes (UEHGs), we believe that they are suitable candidates for essential genes. Although this hypothesis is intuitive and sounds reasonable, it needs to be supported by systematic evidence.
In this study, we consider a set of 1,789 ubiquitously expressed human genes (UEHGs) as an approximation for essential genes. We demonstrate that UEHGs are very likely to contain a large proportion of essential genes and thus can approximate human essential genes. By performing a three-way feature comparison of UEHGs (presumed essential genes), disease genes, and the rest of the human genes (referred to as other genes), we show that they differ in many aspects, such as evolutionary conservation rates, DNA coding lengths, and gene functions.
Instead of dividing the human genome into disease vs. non-disease genes, we choose a three-way classification, namely UEHGs (presumed "essential"), disease, and other genes. We first validate that the set of UEHGs contains a large fraction of essential genes. Then, by comparing the three groups of genes, we examine how disease genes can be distinguished from essential and other genes. If UEHGs really contain a much greater fraction of essential genes than non-UEHGs (i.e. disease and other genes), we expect to observe the following. First, as essential genes are functionally extremely important, the selective pressure on them is much higher than on non-essential genes, so UEHGs should evolve more slowly than both disease and other genes [12, 13]. Second, since most Mendelian diseases are caused by deleterious amino acid substitutions, conservation at the amino acid level should show different patterns for UEHGs, disease and other genes. Third, when UEHGs are mapped to another species, their homologs should be more likely to be essential in that species, provided the species is evolutionarily close enough to humans. Fourth, since essential proteins tend to be hub proteins (highly connected) in the protein-protein interaction network [14], UEHGs should have a higher average physical interaction degree than non-UEHGs. Fifth, the functions of UEHGs should be fundamentally important. To test these hypotheses, we compile the lists of UEHGs, disease genes and other human genes, collect various features, and compare them among the three gene groups.
Comparison of evolutionary features
Comparison of evolutionary rate among three groups of genes
Cross-species comparison of gene deletion phenotypes
Cross-species comparison of gene essentiality between human and S. cerevisiae.

| Yeast gene group | Yeast genes | Essential yeast genes |
|---|---|---|
| Homologs of UEHGs | 384 (20.2% of UEHGs)* | 138 (35.9%) (7.3% of UEHGs)* |
| Homologs of disease genes | 196 (9.8% of disease genes)* | 51 (26.0%) (2.5% of disease genes)* |
| Homologs of other genes | 1005 (3.5% of other genes)* | 379 (37.7%) (1.3% of other genes)* |
| Total yeast genes | | |
Comparison of other features
All the results above support the view that UEHGs form a group distinct from disease genes. The results also support the notion that UEHGs may contain a large proportion of functionally essential genes. Although we try to show that UEHGs are good candidates for human essential genes, we do not claim that they are the only or the best gene set for representing human essential genes. Because a gene must be ubiquitously expressed to be considered a UEHG, essential genes with low or somewhat tissue-specific expression are excluded. Also, since the tissue samples were collected mainly from adult individuals, genes that are essential for early development may be missed as well. As revealed by the cross-species comparison, UEHGs may fail to cover many essential genes, which are then classified as other genes. We also studied a different set of genes, namely genes conserved across yeast, C. elegans and human. The results indicate that this set may also contain a large fraction of essential genes (results not shown). However, as pointed out by Chervitz et al. [27], such a set may miss many human essential genes that have no homologs in yeast and C. elegans. In contrast, UEHGs are a less biased sample of all essential genes. A combination of UEHGs and conserved genes might yield a more complete set of candidates for human essential genes. We also recognize that the disease genes in our study are mainly genes associated with Mendelian diseases, while complex disease genes are under-represented.
Unlike previous studies on human housekeeping genes, we define UEHGs as genes expressed in "almost all" (not "exactly all") of the tissues examined. Because of fluctuations in gene expression and errors in expression measurement, the more tissues are examined, the fewer genes will be observed as expressed in all of them. We relax the criterion to allow missing expression in a small fraction of tissues so that the size of the UEHG set is less sensitive to the number of tissues examined. We also adopt a different expression-level cutoff. To verify that our results are not sensitive to the specific criteria used to define UEHGs, we prepare an alternative UEHG set defined as genes expressed at more than 300 standard units in all 79 tissues. This yields 2,038 UEHGs, of which 1,509 are also in the original set of 1,789 genes. The evolutionary rates are compared among the new set of UEHGs, disease genes and other genes. The results are nearly identical to the previous ones except for slight changes in P-values (see details in supplementary materials). This indicates that our findings are not sensitive to the criteria used to define UEHGs.
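Why the "almost all tissues" criterion is less sensitive to the number of tissues can be illustrated with a simple probability model. The sketch below rests on a hypothetical assumption that is not part of the study: a truly ubiquitous gene passes the expression cutoff in each tissue independently with the same probability q, so the number of passing tissues is binomially distributed. The function name `prob_pass` and the value q = 0.98 are illustrative choices only.

```python
import math


def prob_pass(q: float, n_tissues: int, min_tissues: int) -> float:
    """P(gene passes the cutoff in at least `min_tissues` of `n_tissues` tissues).

    Hypothetical model: each tissue independently detects the gene with
    probability q, so the number of passing tissues is Binomial(n_tissues, q).
    """
    return sum(
        math.comb(n_tissues, k) * q**k * (1 - q) ** (n_tissues - k)
        for k in range(min_tissues, n_tissues + 1)
    )


if __name__ == "__main__":
    q = 0.98  # illustrative per-tissue detection probability, not estimated from data
    for n in (20, 40, 79):
        strict = prob_pass(q, n, n)  # expressed in exactly all tissues
        # Allow roughly the same missing fraction as the 73-of-79 rule.
        relaxed = prob_pass(q, n, math.ceil(n * 73 / 79))
        print(f"{n} tissues: all = {strict:.3f}, relaxed = {relaxed:.3f}")
```

Under this assumption the strict probability equals q raised to the number of tissues and decays geometrically (0.98^79 is about 0.20), whereas the relaxed rule keeps the pass probability close to 1 as tissues are added.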
Comparison of the length of various parts of UEHGs, disease genes, and all other genes.

| Feature | UEHGs (n = 1400) | Disease (n = 1773) | Others (n = 10304) |
|---|---|---|---|
| Coding sequence length (bp) | 1501 ± 38 | 2205 ± 73 | 1849 ± 15 |
| Total exon length (bp) | 2545 ± 48 | 3250 ± 78 | 2752 ± 18 |
| Number of exons | 10.7 ± 0.2 | 13.5 ± 0.3 | 9.9 ± 0.1 |
| Total intron length (bp) | 35698 ± 1558 | 60376 ± 2836 | 54881 ± 1139 |
| 5' UTR length (bp) | 546 ± 22 | 582 ± 21 | 560 ± 8 |
| 3' UTR length (bp) | 559 ± 21 | 569 ± 21 | 575 ± 8 |
We also want to point out that, as shown in Fig 6, there are no sharp dividing lines among essential genes, disease genes and other genes. Some diseases are simply lethal, and the associated genes are essential genes by definition. Other diseases have much milder effects, and their associated genes are hard to distinguish from true non-disease genes. Thus, gene essentiality might be better described by a continuous spectrum than by artificially divided groups. The situation becomes even more complicated when different mutation forms are considered. Since different mutations usually lead to phenotypes of different severity [28, 29], a disease gene could be either a non-essential gene or an essential gene carrying a non-lethal mutation. Thus, any simple grouping of the human genome may fail to accurately capture the complex picture of human disease genes.
Our studies suggest that human essential genes are a unique group of genes and should not simply be ignored and lumped together with non-disease genes in studies of disease genes. We also show that disease genes have several properties intermediate between those of essential and other genes. We note that gene essentiality might be better described on a continuous spectrum than by a class label. Nevertheless, the simplicity of the three-way classification suits the purpose of this research, since comparisons can be performed easily.
Extensive knowledge of human essential genes can be critical for understanding human diseases. It has been shown that essential genes may be directly associated with diseases such as cancer [30, 31]. Studying human essential genes might also provide key clues to questions such as how human beings evolved. However, limited attention has been paid to them, and few systematic studies have been done. We showed how the picture of disease genes becomes clearer when essential genes are explicitly considered. We believe the updated global picture of disease genes will enable us to identify them better in the future [32].
Compiling the lists of disease genes and UEHGs
The list of disease genes was obtained from OMIM [29]. Of the 3,962 records listed in the morbidmap (Jun 6, 2005), we retained entries describing a gene with known sequence (OMIM ID marked with *), a gene with known sequence and phenotype (OMIM ID marked with +), or a phenotype description whose molecular basis is known (OMIM ID marked with #). A total of 2,012 genes with unique OMIM IDs were finally collected as human disease genes.
Ubiquitously expressed genes were obtained from a recent large-scale microarray study of human gene expression patterns by Su et al. [33], in which the expression of 33,698 genes was interrogated in 79 tissues. The overall gene expression level was 776.5 standard Affymetrix average difference units, and genes with an expression level greater than 550 standard units in at least 73 of the 79 tissues were selected as UEHGs. (A conservative estimate of the percentage of essential genes in the human genome is about 10%, so the thresholds were set such that roughly 2,000 genes would be classified as UEHGs.) A total of 1,789 such genes were collected. The set of UEHGs overlaps only slightly with the disease genes: 176 genes belong to both classes. The full list of UEHGs can be found in Additional file 2.
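The entry-type filter described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each morbidmap row has already been joined with the symbol that prefixes its OMIM entry, and the `OmimRecord` type and `collect_disease_genes` function are hypothetical names introduced here.

```python
from typing import Iterable, NamedTuple, Set


class OmimRecord(NamedTuple):
    """Hypothetical record: one morbidmap row joined with its OMIM entry symbol."""

    mim_number: int
    prefix: str  # e.g. "*", "+", "#", "%" or "" (assumed available after the join)
    disorder: str


# The three OMIM entry symbols retained in the text.
KEPT_PREFIXES = {"*", "+", "#"}


def collect_disease_genes(records: Iterable[OmimRecord]) -> Set[int]:
    """Unique MIM numbers of records whose entry symbol is retained."""
    return {r.mim_number for r in records if r.prefix in KEPT_PREFIXES}
```

Returning a set deduplicates MIM numbers that appear in several morbidmap rows (one gene linked to several disorders).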
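The UEHG selection rule (expression above a cutoff in at least a minimum number of tissues) can be sketched as below. The 550-unit cutoff and the 73-of-79 requirement come from the text; the input layout (a dictionary mapping a gene identifier to its per-tissue expression values) and the function name `select_uehgs` are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def select_uehgs(
    expression: Dict[str, Sequence[float]],
    level_cutoff: float = 550.0,
    min_tissues: int = 73,
) -> List[str]:
    """Genes whose expression exceeds `level_cutoff` in at least `min_tissues` tissues.

    `expression` maps a gene identifier to its expression values across tissues
    (e.g. 79 values in Affymetrix average difference units); this layout is assumed.
    """
    selected = []
    for gene, values in expression.items():
        # Count the tissues in which the gene is above the cutoff.
        n_expressed = sum(1 for v in values if v > level_cutoff)
        if n_expressed >= min_tissues:
            selected.append(gene)
    return sorted(selected)
```

Calling it with `level_cutoff=300.0` and `min_tissues=79` expresses the stricter "all 79 tissues" definition used in the sensitivity check.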
Collection of gene features
The mouse and rat homologs of a total of 15,726 human genes, together with the corresponding synonymous (Ks) and nonsynonymous (Ka) substitution rates, were downloaded from NCBI HomoloGene [34]. To prevent possible contamination by paralogous genes, we considered only one-to-one orthologous pairs. To test the statistical significance of differences in the Ka, Ks and Ka/Ks distributions among the three groups, the Kolmogorov-Smirnov test was used to calculate p-values as in  so that direct comparisons would be possible. Nucleotide conservation scores were downloaded from the UCSC Genome Browser website [35]. Human sequence variation information was obtained from the Swiss-Prot protein knowledgebase [36]. The original amino acid positions were mapped to nucleotide positions on the corresponding chromosome to obtain conservation scores. To study the correlation between the onset age of a disease and its conservation, we obtained the onset ages of over 900 genes from [25]. Weighted least squares regression was used to assess the correlation between disease onset ages and Ka/Ks ratios [26].
Yeast genes were collected from the NCBI Entrez Gene database [37] and divided into four groups: UEHG homologs, disease gene homologs, homologs of other human genes, and genes without human homologs. Homology was obtained from NCBI HomoloGene as described above. Yeast gene deletion phenotype data were downloaded from the Saccharomyces Genome Database [38]. Similarly, C. elegans genes were collected from the NCBI Entrez Gene database and divided into four groups. RNAi phenotypes of C. elegans genes were retrieved from WormBase [39]. The RNAi phenotypes were divided into four categories: lethal (including both embryonic and larval lethal), wild type, sick (phenotypes other than the above two), and unknown. For genes annotated with more than one phenotype, the most severe one (assuming lethal > sick > wild type) was chosen as the gene's phenotype.
The degrees of genes in the protein physical interaction network were retrieved from the Human Protein Reference Database (HPRD) [24]. To compare the function distributions of genes in different categories, we used Gene Ontology (GO) Biological Process terms for protein function annotation. GO annotations of 12,715 human genes were downloaded from NCBI [40], and the classifications based on biological processes were used. Similar to Zhou et al. [41], a GO node is referred to as informative if it covers more than 500 genes and none of its descendant nodes covers that many genes. Twenty-five informative GO nodes were defined according to this criterion. To test whether UEHGs, disease genes or other genes were over- or under-represented in each of the 25 function categories, we used the hypergeometric distribution to calculate p-values.
Gene length information was retrieved from the UCSC Table Browser [42]. All genes were first mapped to their RefSeq IDs to retrieve length information. To assess the significance of differences in gene length among categories, the Wilcoxon rank-sum test was used to calculate p-values.
While collecting the features, we found that some genes were not annotated in certain databases, so each comparison was limited to genes with the relevant information available. The number of genes included in each comparison can be found in the corresponding tables or figures. For more information on methods and materials, see Additional file 1.
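Three computational steps in the paragraph above, keeping only one-to-one ortholog pairs, comparing distributions with a two-sample Kolmogorov-Smirnov test, and fitting a weighted least squares line, can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the pair format, function names and weights are assumptions (the text does not say which weights were used), and the KS p-value uses the asymptotic Kolmogorov distribution, which may differ from the exact test for small samples.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def one_to_one_pairs(pairs: Sequence[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep (human, rodent) ortholog pairs in which each gene occurs in exactly one pair."""
    human_counts = Counter(h for h, _ in pairs)
    rodent_counts = Counter(r for _, r in pairs)
    return [(h, r) for h, r in pairs if human_counts[h] == 1 and rodent_counts[r] == 1]


def ks_two_sample(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sample KS statistic D and its asymptotic two-sided p-value (non-empty samples)."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Walk both sorted samples, tracking the largest gap between the empirical CDFs.
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:  # the Kolmogorov tail probability is indistinguishable from 1 here
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101))
    return d, min(1.0, max(0.0, p))


def wls_line(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b
```

For example, `ks_two_sample(uehg_kaks, disease_kaks)` would compare the Ka/Ks values of two groups, and `wls_line(onset_ages, kaks, weights)` would fit a weighted trend; all of these argument names are hypothetical.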
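The phenotype rules above (four categories, with the most severe annotation winning) can be sketched as follows. The category names and the ranking lethal > sick > wild type come from the text; the string matching is a simplifying assumption, since real WormBase phenotype vocabularies are richer than the plain terms used here, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Severity ranking from the text: lethal > sick > wild type.
SEVERITY = {"wild type": 0, "sick": 1, "lethal": 2}


def categorize(phenotype: str) -> str:
    """Map one RNAi phenotype term to lethal / wild type / sick (simplified matching)."""
    term = phenotype.strip().lower()
    if "lethal" in term:  # e.g. "embryonic lethal", "larval lethal"
        return "lethal"
    if term == "wild type":
        return "wild type"
    return "sick"  # any other observed phenotype


def gene_phenotype(phenotypes: Iterable[str]) -> str:
    """Most severe category among a gene's annotations, or "unknown" if it has none."""
    categories = [categorize(p) for p in phenotypes]
    if not categories:
        return "unknown"
    return max(categories, key=lambda c: SEVERITY[c])


def lethal_fraction_by_group(
    groups: Dict[str, str], annotations: Dict[str, List[str]]
) -> Dict[str, float]:
    """Fraction of genes in each homology group whose most severe phenotype is lethal."""
    totals: Counter = Counter()
    lethal: Counter = Counter()
    for gene, group in groups.items():
        totals[group] += 1
        if gene_phenotype(annotations.get(gene, [])) == "lethal":
            lethal[group] += 1
    return {g: lethal[g] / totals[g] for g in totals}
```

Genes without any RNAi annotation fall into "unknown" and count toward their group's total but not toward the lethal fraction.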
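Both GO steps above can be sketched in plain Python: finding informative nodes (more than a threshold of genes covered, with no descendant above the threshold) and computing hypergeometric tail probabilities for over- and under-representation. The DAG representation (a child list per node plus directly annotated genes) and all function names are assumptions for illustration; coverage follows the GO true-path rule, so a node covers every gene annotated to it or to any descendant.

```python
import math
from typing import Dict, Iterable, Set, Tuple


def coverage(node: str, children: Dict[str, Iterable[str]],
             direct: Dict[str, Set[str]], memo: Dict[str, Set[str]]) -> Set[str]:
    """Genes annotated to `node` or to any of its descendants (true-path rule)."""
    if node not in memo:
        genes = set(direct.get(node, ()))
        for child in children.get(node, ()):
            genes |= coverage(child, children, direct, memo)
        memo[node] = genes
    return memo[node]


def informative_nodes(children: Dict[str, Iterable[str]], direct: Dict[str, Set[str]],
                      min_genes: int = 500) -> Set[str]:
    """Nodes covering more than `min_genes` genes whose children all cover at most that many.

    Coverage only grows toward the root, so checking the children is
    equivalent to checking every descendant.
    """
    memo: Dict[str, Set[str]] = {}
    nodes = set(children) | set(direct)
    for kids in children.values():
        nodes.update(kids)
    result = set()
    for node in nodes:
        big = len(coverage(node, children, direct, memo)) > min_genes
        if big and all(len(coverage(c, children, direct, memo)) <= min_genes
                       for c in children.get(node, ())):
            result.add(node)
    return result


def hypergeom_tails(N: int, K: int, n: int, k: int) -> Tuple[float, float]:
    """(P(X >= k), P(X <= k)) for X ~ Hypergeometric(population N, K in category, n drawn)."""
    total = math.comb(N, n)

    def pmf(i: int) -> float:
        return math.comb(K, i) * math.comb(N - K, n - i) / total

    upper = sum(pmf(i) for i in range(k, min(K, n) + 1))  # over-representation
    lower = sum(pmf(i) for i in range(0, min(k, n) + 1))  # under-representation
    return upper, lower
```

Here N would be the number of annotated genes, K the genes in one informative category, n the genes in one group (e.g. UEHGs), and k the group's genes falling in that category.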
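A Wilcoxon rank-sum comparison of gene lengths can be sketched without external libraries. This is an assumption-laden illustration rather than the study's exact procedure: the text does not say whether an exact test, a tie correction or a continuity correction was used. The sketch uses the normal approximation with a tie-corrected variance and no continuity correction, and `rank_sum_test` is a hypothetical name.

```python
import math
from typing import Dict, Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Rank-sum W of sample x and a two-sided normal-approximation p-value."""
    values = sorted(list(x) + list(y))
    n1, n2 = len(x), len(y)
    big_n = n1 + n2
    avg_rank: Dict[float, float] = {}
    tie_term = 0.0
    i = 0
    # Tied values share the mean of the ranks they occupy.
    while i < big_n:
        j = i
        while j < big_n and values[j] == values[i]:
            j += 1
        t = j - i  # size of this tie group
        avg_rank[values[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        tie_term += t**3 - t
        i = j
    w = sum(avg_rank[v] for v in x)
    mean = n1 * (big_n + 1) / 2.0
    var = n1 * n2 / 12.0 * ((big_n + 1) - tie_term / (big_n * (big_n - 1)))
    if var <= 0:  # all values tied: no evidence of a difference
        return w, 1.0
    z = (w - mean) / math.sqrt(var)
    return w, math.erfc(abs(z) / math.sqrt(2.0))
```

For large groups such as those in the length comparison, the normal approximation is usually adequate; for very small samples an exact test would be preferable.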
We thank all the reviewers for their helpful suggestions on the revision. We thank Andrew Su and John Hogenesch for answering our question regarding the expression cutoff value for UEHGs. We also thank Kim Fechtel for providing their classification of genes so that we could compare the two gene sets. We thank Matthew Lebo for proofreading the manuscript. This research is supported by NIH/NSF joint mathematical biology initiative DMS-0241102 and by NIH P50 HG 002790.
- Lopez-Bigas N, Ouzounis CA: Genome-wide identification of genes likely to be involved in human genetic disease. Nucleic Acids Res. 2004, 32: 3108-3114. 10.1093/nar/gkh605.
- Bortoluzzi S, Romualdi C, Bisognin A, Danieli GA: Disease genes and intracellular protein networks. Physiol Genomics. 2003, 15: 223-227.
- Smith NGC, Eyre-Walker A: Human disease genes: patterns and predictions. Gene. 2003, 318: 169-175. 10.1016/S0378-1119(03)00772-8.
- Huang H, Winter EE, Wang H, Weinstock KG, Xing H, Goodstadt L, Stenson PD, Cooper DN, Smith D, Alba MM, Ponting CP, Fechtel K: Evolutionary conservation and selection of human disease gene orthologs in the rat and mouse genomes. Genome Biol. 2004, 5: R47. 10.1186/gb-2004-5-7-r47.
- Giaever G, Chu AM, Ni L, Connelly C, Riles L, Veronneau S, Dow S, Lucau-Danila A, Anderson K, Andre B, Arkin AP, Astromoff A, Bakkoury ME, Bangham R, Benito R, Brachat S, Campanaro S, Curtiss M, Davis K, Deutschbauer A, Entian K, Flaherty P, Foury F, Garfinkel DJ, Gerstein M, Gotte D, Guldener U, Hegemann JH, Hempel S, Herman Z, Jaramillo DF, Kelly DE, Kelly SL, Kotter P, LaBonte D, Lamb DC, Lan N, Liang H, Liao H, Liu L, Luo C, Lussier M, Mao R, Menard P, Ooi SL, Revuelta JL, Roberts CJ, Rose M, Ross-Macdonald P, Scherens B, Schimmack G, Shafer B, Shoemaker DD, Sookhai-Mahadeo S, Storms RK, Strathern JN, Valle G, Voet M, Volckaert G, Wang C, Ward TR, Wilhelmy J, Winzeler EA, Yang Y, Yen G, Youngman E, Yu K, Bussey H, Boeke JD, Snyder M, Philippsen P, Davis RW, Johnston M: Functional profiling of the Saccharomyces cerevisiae genome. Nature. 2002, 418: 387-391. 10.1038/nature00935.
- Sonnichsen B, Koski LB, Walsh A, Marschall P, Neumann B, Brehm M, Alleaume A-M, Artelt J, Bettencourt P, Cassin E, Hewitson M, Holz C, Khan M, Lazik S, Martin C, Nitzsche B, Ruer M, Stamford J, Winzi M, Heinkel R, Roder M, Finell J, Hantsch H, Jones SJ, Jones M, Piano F, Gunsalus KC, Oegema K, Gonczy P, Coulson A, Hyman AA, Echeverri CJ: Full-genome RNAi profiling of early embryogenesis in Caenorhabditis elegans. Nature. 2005, 434: 462-469. 10.1038/nature03353.
- Blake JA, Richardson JE, Bult CJ, Kadin JA, Eppig JT: The Mouse Genome Database Group: MGD: The Mouse Genome Database. Nucleic Acids Res. 2003, 31: 193-195. 10.1093/nar/gkg047.
- Warrington JA, Nair A, Mahadevappa M, Tsyganskaya M: Comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes. Physiol Genomics. 2000, 2: 143-147.
- Butte AJ, Dzau V, Glueck SB: Further defining housekeeping, or "maintenance," genes. Focus on "A compendium of gene expression in normal human tissues". Physiol Genomics. 2001, 7: 95-96.
- Zhang L, Li W-H: Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Mol Biol Evol. 2004, 21: 236-239. 10.1093/molbev/msh010.
- Eisenberg E, Levanon EY: Human housekeeping genes are compact. Trends Genet. 2003, 19: 362-365. 10.1016/S0168-9525(03)00140-9.
- Hirsh AE, Fraser HB: Protein dispensability and rate of evolution. Nature. 2001, 411: 1046-1049. 10.1038/35082561.
- Jordan IK, Rogozin IB, Wolf YI, Koonin EV: Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 2002, 12: 962-968. 10.1101/gr.87702.
- Jeong H, Mason SP, Barabasi A-L, Oltvai ZN: Lethality and centrality in protein networks. Nature. 2001, 411: 41-42. 10.1038/35075138.
- Bourque G, Pevzner PA, Tesler G: Reconstructing the genomic architecture of ancestral mammals: lessons from human, mouse, and rat genomes. Genome Res. 2004, 14: 507-516. 10.1101/gr.1975204.
- Wolfe KH, Sharp PM: Mammalian gene evolution: nucleotide sequence divergence between mouse and rat. J Mol Evol. 1993, 37: 441-456. 10.1007/BF00178874.
- Ohta T, Ina Y: Variation in synonymous substitution rates among mammalian genes and the correlation between synonymous and nonsynonymous divergences. J Mol Evol. 1995, 41: 717-720.
- Makalowski W, Boguski MS: Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences. Proc Natl Acad Sci USA. 1998, 95: 9407-9412. 10.1073/pnas.95.16.9407.
- Smith NG, Hurst LD: The effect of tandem substitutions on the correlation between synonymous and nonsynonymous rates in rodents. Genetics. 1999, 153: 1395-1402.
- Castresana J: Estimation of genetic distances from human and mouse introns. Genome Biol. 2002, 3: R28. 10.1186/gb-2002-3-6-research0028.
- Pagani F, Raponi M, Baralle FE: Synonymous mutations in CFTR exon 12 affect splicing and are not neutral in evolution. Proc Natl Acad Sci USA. 2005, 102: 6368-6372. 10.1073/pnas.0502288102.
- Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM, Wilson RK, Gibbs RA, Kent WJ, Miller W, Haussler D: Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005, 15: 1034-1050. 10.1101/gr.3715005.
- Miller MP, Kumar S: Understanding human disease mutations through the use of interspecific genetic variation. Hum Mol Genet. 2001, 10: 2319-2328. 10.1093/hmg/10.21.2319.
- Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TKB, Gronborg M, Ibarrola N, Deshpande N, Shanker K, Shivashankar HN, Rashmi BP, Ramya MA, Zhao Z, Chandrika KN, Padma N, Harsha HC, Yatish AJ, Kavitha MP, Menezes M, Choudhury DR, Suresh S, Ghosh N, Saravana R, Chandran S, Krishna S, Joy M, Anand SK, Madavan V, Joseph A, Wong GW, Schiemann WP, Constantinescu SN, Huang L, Khosravi-Far R, Steen H, Tewari M, Ghaffari S, Blobe GC, Dang CV, Garcia JG, Pevsner J, Jensen ON, Roepstorff P, Deshpande KS, Chinnaiyan AM, Hamosh A, Chakravarti A, Pandey A: Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res. 2003, 13: 2363-2371. 10.1101/gr.1680803.
- Jimenez-Sanchez G, Childs B, Valle D: Human disease genes. Nature. 2001, 409: 853-855. 10.1038/35057050.
- Carroll RJ, Ruppert D: Transformation and Weighting in Regression. 1988, New York: Chapman and Hall.
- Chervitz SA, Aravind L, Sherlock G, Ball CA, Koonin EV, Dwight SS, Harris MA, Dolinski K, Mohr S, Smith T, Weng S, Cherry JM, Botstein D: Comparison of the complete protein sets of worm and yeast: Orthology and Divergence. Science. 1998, 282: 2022-2028. 10.1126/science.282.5396.2022.
- Ng PC, Henikoff S: Accounting for human polymorphisms predicted to affect protein function. Genome Res. 2002, 12: 436-446. 10.1101/gr.212802.
- Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA: Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005, 33: D514-D517. 10.1093/nar/gki033.
- Pickeral OK, Li JZ, Barrow I, Boguski MS, Makałowski W, Zhang J: Classical Oncogenes and Tumor Suppressor Genes: A Comparative Genomics Perspective. Neoplasia. 2000, 2: 280-286. 10.1038/sj.neo.7900090.
- Thomas MA, Weston B, Joseph M, Wu W, Nekrutenko A, Tonellato PJ: Evolutionary dynamics of oncogenes and tumor suppressor genes: higher intensities of purifying selection than other genes. Mol Biol Evol. 2003, 20: 964-968. 10.1093/molbev/msg110.
- Adie EA, Adams RR, Evans KL, Porteous D, Pickard BS: Speeding disease gene discovery by sequence based candidate prioritization. BMC Bioinformatics. 2005, 6: 55. 10.1186/1471-2105-6-55.
- Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB: A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA. 2004, 101: 6062-6067. 10.1073/pnas.0400782101.
- Wheeler DL, Church DM, Edgar R, Federhen S, Helmberg W, Madden TL, Pontius JU, Schuler GD, Schriml LM, Sequeira E, Suzek TO, Tatusova TA, Wagner L: Database resources of the national center for biotechnology information: update. Nucleic Acids Res. 2004, 32: D35-D40. 10.1093/nar/gkh073.
- Conservation score. [http://hgdownload.cse.ucsc.edu/downloads.html#human]
- Human sequence variation. [http://us.expasy.org/sprot/sp-docu.html]
- NCBI Entrez gene. [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene]
- Cherry JM, Adler C, Ball C, Chervitz SA, Dwight SS, Hester ET, Jia Y, Juvik G, Roe T, Schroeder M, Weng S, Botstein D: SGD: Saccharomyces Genome Database. Nucleic Acids Res. 1998, 26: 73-79. 10.1093/nar/26.1.73.
- Chen N, Harris TW, Antoshechkin I, Bastiani C, Bieri T, Blasiar D, Bradnam K, Canaran P, Chan J, Chen CK, Chen WJ, Cunningham F, Davis P, Kenny E, Kishore R, Lawson D, Lee R, Muller HM, Nakamura C, Pai S, Ozersky P, Petcherski A, Rogers A, Sabo A, Schwarz EM, Van Auken K, Wang Q, Durbin R, Spieth J, Sternberg PW, Stein LD: WormBase: a comprehensive data resource for Caenorhabditis biology and genomics. Nucleic Acids Res. 2005, 33: D383-D389. 10.1093/nar/gki066.
- Gene ontology annotation. [ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/]
- Zhou X, Kao MJ, Wong WH: Transitive functional annotation by shortest-path analysis of gene expression data. Proc Natl Acad Sci USA. 2002, 99: 12783-12788. 10.1073/pnas.192159399.
- Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ: The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 2004, 32 (Suppl 1): D493-D496. 10.1093/nar/gkh103.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.