- Open Access
Phenome-based gene discovery provides information about Parkinson’s disease drug targets
© The Author(s) 2016
- Published: 31 August 2016
Parkinson disease (PD) is a severe neurodegenerative disease without curative drugs. The highly complex and heterogeneous disease mechanisms are still unclear. Detecting novel PD associated genes not only contributes in revealing the disease pathogenesis, but also facilitates discovering new targets for drugs.
We propose a phenome-based gene prediction strategy to identify disease-associated genes for PD. We integrated multiple disease phenotype networks, a gene functional relationship network, and known PD genes to predict novel candidate genes. Then we investigated the translational potential of the predicted genes in drug discovery.
In a cross validation analysis, the average rank for 15 known PD genes is within top 0.8 %. We also tested the algorithm with an independent validation set of 669 PD-associated genes detected by genome-wide association studies. The top ranked genes predicted by our approach are enriched for these validation genes. In addition, our approach prioritized the target genes for FDA-approved PD drugs and the drugs that have been tested for PD in clinical trials. Pathway analysis shows that the prioritized drug target genes are closely associated with PD pathogenesis. The result provides empirical evidence that our computational gene prediction approach identifies novel candidate genes for PD, and has the potential to lead to rapid drug discovery.
- Parkinson’s disease
- Disease gene prediction
- Network analysis
- Drug discovery
Parkinson’s disease (PD) is the second most common neurodegenerative disorder with a significantly increasing prevalence . It involves pathological factors for cell death, such as mitochondrial dysfunction and oxidative stress [2, 3]. However, the highly complex and heterogeneous disease mechanisms are still inconclusive . Current pharmacological treatment shows limited efficacy in reversing progressive neuronal loss and controlling nondopamineric symptoms, such as dementia and sensory disturbances [5, 6], which have become a major source of patient disability. Detecting novel genetic basis for PD not only reveals the disease pathogenesis, but also facilitates identifying novel drug targets [1–3, 7].
Overlapping disease phenotypes may indicate common genetic basis of the diseases . Studying disease phenotypes of PD have the potential to uncover its underlying genetic factors [9, 10]. Previous studies have systematically analyzed disease networks based on phenotypic similarities to predict disease genes [11–14]. Currently, disease phenotype data sources remain largely incomplete. One disease phenotypic network is based on human phenotype ontology (HPO)  and has many applications [16–18]. Recently, we explored a new data source of human disease phenotype in biomedical ontologies and constructed the disease manifestation network (DMN). We showed that DMN contains new phenotypic knowledge and is useful in disease gene prediction . In this study, we propose to combine DMN and HPO, and detect novel candidate disease-associated genes for PD using a network-based gene prediction strategy.
Several recent studies showed that matching the traits of genes in Online Mendelian Inheritance in Man (OMIM)  and genome-wide association study (GWAS) [22, 23] with the drug targets may lead to the discovery of new drug treatments. In a recent study, we proved that the disease-associated genes predicted by computational approaches also have the potential to guide drug discovery . Here, we demonstrate that the candidate genes predicted for PD by our approach can provide information for PD drug targets. We evaluated the ranks of drug target genes for FDA-approved PD drugs and potential PD drugs that have been tested in clinical trials. We also performed pathway analysis for the top ranked drug target genes. The result provides empirical evidence that our gene prediction approach has the translation potential to lead to rapid drug discovery.
Predict genes for PD using a network-based approach
We downloaded the disease phenotype networks of HPO from http://human-phenotype-ontology.org and DMN from nlp.case.edu/public/data/DMN/. HPO contains 7395 nodes and 17,981,413 weighted edges. The disease phenotypic similarities are based on phenotype annotations extracted from OMIM, and were calculated as the semantic similarities in the phenotype ontology hierarchy . DMN contains 2312 nodes and 408,029 weighted edges. The disease phenotype annotations were based on semantic network in the Unified Medical Language System (UMLS), and disease similarities were calculated as the cosine similarities between phenotype feature vectors between diseases . Then we extracted 1,971,371 gene functional relationships from STRING  and constructed a gene network with 17,831 nodes. All data sources in STRING were used, including the protein interaction databases, pathway databases and gene coexpression data.
We constructed three bipartite networks to connect HPO, DMN and the gene network. We first extracted 4021 and 1872 disease-gene associations from OMIM to connect the disease nodes in HPO and DMN to the the gene nodes in the gene network, respectively. The disease nodes in HPO and DMN were represented by OMIM identifier and UMLS concept unique identifiers. Then, a total of 2250 maps between the two kinds of identifiers based on UMLS metathesaurus were used to connect HPO and DMN.
Predict candidate genes for PD
We first selected the seeds in the algorithm as the disease nodes of PD and their associated genes. PD has two forms: familial and sporadic. A major proportion of the patients have sporadic PD, and the associated genes in OMIM are for familial PD. However, extensive researches show that familial and sporadic PD are likely to share the same genetic pathways [27, 28]. Here, we extracted 15 PD genes from OMIM, and combined them with the PD disease nodes in both HPO and DMN to form a set of seeds.
In the above equation, P 1, P 2 and G represent DMN, HPO and the genetic network, respectively. The diagonal sub-matrices M i (i∈G,P 1,P 2) were calculated through normalizing the adjacency matrix of P 1, P 2 and G, and the off-diagonal sub-matrices M ij (i,j∈G,P 1,P 2) were calculated through normalizing the bipartite network connecting P 1, P 2 and G. The normalization was performed following the method in .
Validate the gene prediction for PD
Before using this approach to predict new PD genes, we performed a cross validation analysis to test if the approach can identify the known disease-gene associations. For each of the 15 seed genes, we removed its connections to the PD nodes in HPO and DMN, and excluded it from the seed list. Then we used the rest seeds to rank all the genes. The procedure was repeated for 15 times, the ranks of the 15 genes were examined.
In the second validation experiment, we constructed an independent validation set containing 888 genes as a proxy of the novel PD genes. These genes were obtained through GWAS and downloaded from http://pdgene.org [29, 30]. We retained 669 genes, which have zero overlap with seeds and appear in our scope of gene ranking. We counted the number of validation genes in every 500 genes in our rank from the top to the bottom, and evaluated if the top ranked genes are enriched for the validation genes. We also generated the precision-recall curve to show the performance in ranking the validation genes.
Evaluate the potential of the predicted genes in PD drug discovery
Investigate the ranks of drug target genes
Currently, only a subset of the human genome is druggable . We investigated whether our approach can provide information about the drug target genes for PD. The ranking of two gene sets are tested: the first set contains target genes for FDA-approved PD drugs, and the second set contains target genes for potential PD drugs that have been tested in clinical trials. The drugs extracted from clinical trials are not necessarily successful PD therapies, but have been investigated by researchers for good reasons, thus are considered at least more promising than random drugs. We evaluate the ranking of target genes for both approved and potential PD drugs to approximate the ability of our approach in prioritizing PD drug targets. A total of 42 target genes for 22 FDA-approved PD drugs were extracted from DrugBank , which is a drug-target database. We also obtained 197 genes targeted by 81 PD drugs in http://clinicaltrials.gov (FDA-approved PD drugs were not included). Both sets of target genes have zero overlap with the seeds. We investigated their distributions among all genes.
Analyze pathways associated with top ranked genes
We included all the known PD-associated genes (including the genes identified by GWAS) into the seed list and predicted novel genes for PD. Then we analyzed the pathways associated with top-ranked candidate genes to detect their underlying commonalities. For each of the 1320 canonical pathways extracted from MSigDB , a score was calculated as the number of genes ranked within top 100 divided by the total number of genes in this pathway. The pathways with the highest scores offer insights into the functions of the predicted genes. In addition, we used the same method to analyze the pathways that are associated with the top 100 drug target genes.
Network-based approach allowed prioritizing known PD-associated genes
Result of the leave-one-out cross validation for 15 PD-associated genes from OMIM
The top 500 genes in the ranking contains 99 validation genes (5.3 fold-enrichment comparing with random rankings, p<e −4), and this number decreases rapidly as the rank changes from the top to the bottom. In Fig. 2 a, the precision-recall curve also shows that the top-ranked genes are enriched for the PD genes detected by GWAS. The results demonstrate that the genes prioritized by our approach are likely to be associated with the pathogenesis of PD.
Predicted genes have the translational potential in drug discovery
Pathways underlying the top-ranked genes are associated with PD pathogenesis and provide information of potential PD treatments
Pathways that are enriched for the top ranked candidate genes for PD
Nerve growth factor receptor signaling pathway
Regulation of cellular proliferation and differentiation
Tumor Suppressor that inhibits ribosomal biogenesis
Involving age related diseases like neurodegenerative disease
Nerve growth factor pathway
Mediated signaling of EGFR
Mediate the survival of B cells
Downregulate EGF receptors
Induction of cell cycle arrest and apoptosis
Pathways that are enriched for the top ranked drug target genes
Stimulates cell growth and blocks apoptosis
Regulation of glucose levels
Nerve growth factor pathway
Nerve growth factor receptor signaling pathway
Activation of the AP-1 family of transcription factors
Involving age related diseases
Downregulate EGF receptors
Cellular proliferation and survival
In summary, the pathway analysis detected the commonalities underlying the predicted PD genes. The prioritized pathways not only reflect PD genetic mechanisms, but also may lead to the discovery of targets for novel PD drug therapies.
In this study, we propose a disease gene discovery strategy for PD, which integrates multiple disease phenotypic networks with gene functional relationships and known disease-gene associations. We validated our gene ranking with a cross validation analysis and an independent validation set. We demonstrated that the gene prediction approach provides information for the PD drug targets. The top ranked genes are enriched for targets for both approved and potential PD drugs, and provide unique opportunities for PD drug discovery.
Our approach can be further improved as more human disease phenotype data become available. For example, other kinds of disease phenotype data, such as disease co-morbidities [38, 39] and gene expression profiles, may reflect different aspects of genetic mechanisms and lead to the identification of novel candidate drug targets for PD. In the future, we will develop new approaches to rationally integrate heterogeneous human phenotype data.
In addition, we will systematically predict candidate drugs for PD using the gene prioritization result. Many existing drug discovery approaches compare the genetic and genomic features between diseases and drugs to identify candidate drug therapies . Recent studies show that the phenotypic annotations for mouse gene mutations provide causal relationships between genes and phenotypes, and have great potential in drug repositioning [41, 42]. In our previous work, we designed a drug repositioning approach to combine the human disease genetics with the mouse phenotype data, and predict drugs for a given disease through comparing the phenotype profiles . In the furture, we will incorporate the result obtained in this study into the drug repositioning approach, and improved the approach by combining other data, such as the drug actions and drug structural similarity.
In this study, we evaluated the ranking of genes and drug targets that are known to be associated with PD to approximate the performance of the computational disease-associated gene prediction approach. The ultimate goal of this approach is to identify novel genes and drug targets for PD. In the future, we plan to validate the newly predicted disease-associated genes and candidate drug targets through collaborative biomedical experiments and animal model studies.
DMN, disease manifestation network; FDA, Food and Drug Administration; GWAS, genome-wide association study; HPO, human phenotype ontology; IGF-1, insulin-like growth factor 1; OMIM, Online Mendelian Inheritance in Man; PD, Parkinson’s disease; UMLS, Unified Medical Language System
This manuscript is extended from a previously published abstract (http://0-link.springer.com.brum.beds.ac.uk/book/10.1007\%2F978-3-319-19048-8). YC and RX are funded by the Eunice Kennedy Shriver National Institute Of Child Health & Human Development of the National Institutes of Health under the NIH Director’s New Innovator Award number DP2HD084068. We would like to thank our funding and the reviewers for their invaluable comments and suggestions.
This article has been published as part of BMC Genomics Volume 17 Supplement 5, 2016. Selected articles from the 11th International Symposium on Bioinformatics Research and Applications (ISBRA ’15): genomics. The full contents of the supplement are available onlineer https://0-bmcgenomics-biomedcentral-com.brum.beds.ac.uk/articles/supplements/volume-17-supplement-5.
The publication costs for this article were funded by the corresponding author.
Availability of data and materials
Data is available by contacting Rong Xu at firstname.lastname@example.org.
RX conceived the study. YC designed the methods, performed the experiments and wrote the manuscript. Both authors have participated study discussion and manuscript preparation. Both authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Olanow CW, Stern MB, Sethi K. The scientific and clinical basis for the treatment of Parkinson disease. Neurology. 2009; 72(21 suppl 4):S1–S136.View ArticlePubMedGoogle Scholar
- Jenner P, Olanow CW. The pathogenesis of cell death in Parkinson’s disease. Neurology. 2006; 66(10 suppl 4):S24–S36.View ArticlePubMedGoogle Scholar
- Dawson TM, Dawson VL. Molecular pathways of neurodegeneration in Parkinson’s disease. Science. 2003; 302(5646):819–22.View ArticlePubMedGoogle Scholar
- Shulman JM, De Jager PL, Feany MB. Parkinson’s disease: genetics and pathogenesis. Annu Rev Pathol Mech Dis. 2011; 6:193–222.View ArticleGoogle Scholar
- LeWitt PA. Levodopa for the treatment of Parkinson’s disease. N Engl J Med. 2008; 359(23):2468–76.View ArticlePubMedGoogle Scholar
- Connolly BS, Lang AE. Pharmacological treatment of Parkinson disease: a review. Jama. 2014; 311(16):1670–83.View ArticlePubMedGoogle Scholar
- Gupta A, Dawson VL, Dawson TM. What causes cell death in Parkinson’s disease?Ann Neurol. 2008; 64(S2):S3–S15.View ArticlePubMedPubMed CentralGoogle Scholar
- Brunner HG, Van Driel MA. From syndrome families to functional genomics. Nat Rev Genet. 2004; 5(7):545–51.View ArticlePubMedGoogle Scholar
- Dexter DT, Jenner P. Parkinson disease: from pathology to molecular disease mechanisms. Free Radic Biol Med. 2013; 62:132–44.View ArticlePubMedGoogle Scholar
- Klein C, Schlossmacher MG. The genetics of Parkinson disease: implications for neurological care. Nat Clin Pract Neurol. 2006; 2(3):136–46.View ArticlePubMedGoogle Scholar
- Lage K, Karlberg EO, Strøling ZM, Olason PI, Pedersen AG, Rigina O, et al.A human phenome-interactome network of protein complexes implicated in genetic disorders. Nat Biotechnol. 2007; 25(3):309–16.View ArticlePubMedGoogle Scholar
- Li Y, Patra JC. Genome-wide inferring genephenotype relationship by walking on the heterogeneous network. Bioinformatics. 2010; 26(9):1219–24.View ArticlePubMedGoogle Scholar
- Wu X, Liu Q, Jiang R. Align human interactome with phenome to identify causative genes and networks underlying disease families. Bioinformatics. 2009; 25(1):98–104.View ArticlePubMedGoogle Scholar
- Vanunu O, Magger O, Ruppin E, Shlomi T, Sharan R. Associating genes and protein complexes with disease via network propagation. PLoS Comput Biol. 2010; 6(1):e1000641.View ArticlePubMedPubMed CentralGoogle Scholar
- Köhler S, Doelken SC, Mungall CJ, Bauer S, Firth HV, Bailleul-Forestier I, et al.The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2013; 42(D1):D966–74.View ArticlePubMedPubMed CentralGoogle Scholar
- Hoehndorf R, Schofield PN, Gkoutos GV. PhenomeNET: a whole-phenome approach to disease gene discovery. Nucleic Acids Res. 2011; 39(18):e119.View ArticlePubMedPubMed CentralGoogle Scholar
- Singleton MV, Guthery SL, Voelkerding KV, Chen K, Kennedy B, Margraf RL, et al.Phevor combines multiple biomedical ontologies for accurate identification of disease-causing alleles in single individuals and small nuclear families. Am J Hum Genet. 2014; 94(4):599–610.View ArticlePubMedPubMed CentralGoogle Scholar
- Köhler S, Schulz MH, Krawitz P, Bauer S, Dlken S, Ott CE, et al.Clinical diagnostics in human genetics with semantic similarity searches in ontologies. Am J Hum Genet. 2009; 85(4):457–64.View ArticlePubMedPubMed CentralGoogle Scholar
- Chen Y, Zhang X, Zhang GQ, Xu R. Comparative analysis of a novel disease phenotype network based on clinical manifestations. J Biomed Inform. 2014; 53:113–20.View ArticlePubMedPubMed CentralGoogle Scholar
- Chen Y, Li L, Zhang GQ, Xu R. Phenome-driven disease genetics prediction toward drug discovery. Bioinformatics. 2015; 31(12):i276–83.View ArticlePubMedPubMed CentralGoogle Scholar
- Wang ZY, Zhang HY. Rational drug repositioning by medical genetics. Nat Biotechnol. 2013; 31(12):1080–2.View ArticlePubMedGoogle Scholar
- Sanseau P, Agarwal P, Barnes MR, Pastinen T, Richards JB, Cardon LR, Mooser V. Use of genome-wide association studies for drug repositioning. Nat Biotechnol. 2012; 30(4):317–20.View ArticlePubMedGoogle Scholar
- Nelson MR, Tipney H, Painter JL, et al.The support of human genetic evidence for approved drug indications. Nat Genet. 2015. doi:10.1038/ng.3314.
- Chen Y, Xu R. Network-based gene prediction for plasmodium falciparum Malaria towards genetics-based drug discovery. BMC Genomics. 2015; 16(Suppl 7):S9.View ArticlePubMedPubMed CentralGoogle Scholar
- Robinson PN, Köhler S, Bauer S, Seelow D, Horn D, Mundlos S. The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am J Hum Genet. 2008; 83(5):610–5.View ArticlePubMedPubMed CentralGoogle Scholar
- Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, et al.STRING v9. 1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 2013; 41(D1):D808–D815.View ArticlePubMedGoogle Scholar
- Lesage S, Brice A. Parkinson’s disease: from monogenic forms to genetic susceptibility factors. Hum Mol Genet. 2009; 18(R1):R48–R59.View ArticlePubMedGoogle Scholar
- Lesage S, Brice A. Role of Mendelian genes in “sporadic" Parkinson’s disease. Parkinsonism Relat Disord. 2012; 18:S66–S70.View ArticlePubMedGoogle Scholar
- Nalls MA, Pankratz N, Lill CM, Do CB, Hernandez DG, Saad M, et al.Large-scale meta-analysis of genome-wide association data identifies six new risk loci for Parkinson’s disease. Nat Genet. 2014; 46(9):989–93.View ArticlePubMedPubMed CentralGoogle Scholar
- Lill CM, Roehr JT, McQueen MB, Kavvoura FK, Bagade S, Schjeide BMM, et al.Comprehensive research synopsis and systematic meta-analyses in Parkinson’s disease genetics: The PDGene database. PLoS Genet. 2012; 8(3):e1002548.View ArticlePubMedPubMed CentralGoogle Scholar
- Hopkins AL, Groom CR. The druggable genome. Nat Rev Drug Discov. 2002; 1(9):727–30.View ArticlePubMedGoogle Scholar
- Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, et al.DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 2014; 42(D1):D1091—7.View ArticlePubMedGoogle Scholar
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al.Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005; 102(43):15545–50.View ArticlePubMedPubMed CentralGoogle Scholar
- Mosley RL, Hutter-Saunders JA, Stone DK, Gendelman HE. Inflammation and adaptive immunity in Parkinson’s disease. Cold Spring Harb Perspect Med. 2012; 2(1):a009381.View ArticlePubMedPubMed CentralGoogle Scholar
- Quesada A, Lee BY, Micevych PE. PI3 kinase/Akt activation mediates estrogen and IGF1 nigral DA neuronal neuroprotection against a unilateral rat model of Parkinson’s disease. Dev Neurobiol. 2008; 68(5):632–44.View ArticlePubMedPubMed CentralGoogle Scholar
- Godau J, Herfurth M, Kattner B, Gasser T, Berg D. Increased serum insulin-like growth factor 1 in early idiopathic Parkinson’s disease. J Neurol Neurosurg Psychiatry. 2010; 81(5):536–8.View ArticlePubMedGoogle Scholar
- Picillo M, Erro R, Santangelo G, Pivonello R, Longo K, Pivonello C, et al.Insulin-like growth factor-1 and progression of motor symptoms in early, drug-naïve Parkinson’s disease. J Neurol. 2013; 260(7):1724–30.View ArticlePubMedGoogle Scholar
- Chen Y, Li L, Xu R. Disease Comorbidity network guides the detection of molecular evidence for the link between colorectal cancer and obesity. AMIA Summits Transl Sci Proc. 2015; 2015:201.PubMedPubMed CentralGoogle Scholar
- Chen Y, Xu R. Mining cancer-specific disease comorbidities from a large observational health database. Cancer Informat. 2014; (Suppl. 1):37.Google Scholar
- Dudley JT, Sirota M, Shenoy M, Pai RK, Roedder S, Chiang AP, Morgan AA, Sarwal MM, Pasricha PJ, Butte AJ. Computational repositioning of the anticonvulsant topiramate for inflammatory bowel disease. Sci Transl Med. 2011; 3(96):96ra76.View ArticlePubMedPubMed CentralGoogle Scholar
- Hoehndorf R, Hiebert T, Hardy NW, Schofield PN, Gkoutos GV, Dumontier M. Mouse model phenotypes provide information about human drug targets. Bioinformatics. 2014; 30(5):719–25.View ArticlePubMedGoogle Scholar
- Hoehndorf R, Oellrich A, Rebholz-Schuhmann D, Schofield PN, Gkoutos GV. Linking PharmGKB to phenotype studies and animal models of disease for drug repurposing. In Pac Symp Biocomput. 2012;:388–99.Google Scholar
- Chen Y, Xu R. Combining Human Disease Genetics and Mouse Model Phenotypes towards Drug Repositioning for Parkinson’s disease. AMIA Annual Symposium. 2015; 2015:1851.Google Scholar