AphanoDB: a genomic resource for Aphanomyces pathogens
BMC Genomics volume 8, Article number: 471 (2007)
The Oomycete genus Aphanomyces comprises devastating plant and animal pathogens. However, little is known about the molecular mechanisms underlying pathogenicity of Aphanomyces species. In this study, we report on the development of a public database called AphanoDB which is dedicated to Aphanomyces genomic data. As a first step, a large collection of Expressed Sequence Tags was obtained from the legume pathogen A. euteiches, which was then processed and collected into AphanoDB.
Two cDNA libraries of A. euteiches were created: one from mycelium grown on synthetic medium and one from mycelium grown in contact with root tissues of the model legume Medicago truncatula. From these libraries, 18,684 expressed sequence tags were obtained and assembled into 7,977 unigenes, which were compared to public databases for annotation. Queries on AphanoDB allow users to retrieve information for each unigene, including similarity to known protein sequences, protein domains and Gene Ontology classification. A statistical analysis of EST frequency in the two growth conditions was also added to the database.
AphanoDB is a public database with a user-friendly web interface. The sequence report pages are the main web interface which provides all annotation details for each unigene. These interactive sequence report pages are easily available through text, BLAST, Gene Ontology and expression profile search utilities. AphanoDB is available from URL: http://www.polebio.scsv.ups-tlse.fr/aphano/.
Oomycetes form a phylogenetically distinct group of eukaryotic microorganisms that includes plant and animal pathogens causing widespread damage with high economic [1–3] and ecological impact. Pathogenic oomycete species are found mainly in three orders: the Pythiales, the Peronosporales and the Saprolegniales. Recent studies on the phylogenetic relationships within oomycetes suggest that the ability to infect plants appeared at least twice in the oomycete lineage, first in an ancient lineage which evolved into the Pythiales (including Phytophthora and Pythium) and Peronosporales, and secondly in the Saprolegniales lineage, which includes destructive animal pathogens such as the fish pathogens Saprolegnia parasitica and Aphanomyces piscicida, and plant pathogens such as A. euteiches and A. cochlioides. Among the Oomycetes, Phytophthora is the best-studied genus and genomic resources are available for several species (cDNA libraries and/or complete genome sequences) [7–11]. In contrast, few nucleic acid sequences are available from species classified in the Saprolegniales, the main dataset being a collection of 1,510 ESTs obtained from the fish pathogen S. parasitica.
Recently, we have developed a pathosystem which consists of the model legume Medicago truncatula and the legume pathogen A. euteiches. To date, no chemical compound protects efficiently against A. euteiches, and infested fields cannot be used for legume production for many years. In order to gain a better knowledge of the A. euteiches-legume interaction and to identify molecular targets for drug design, a genomic approach was used. Two unidirectional cDNA libraries from A. euteiches strain ATCC201684 were prepared: one from mycelium grown in a liquid medium containing yeast extract and glucose (MYC library), and one from mycelium in contact with M. truncatula root tissues (INT library). The latter situation is a simplified model for growth during pathogenesis. A total of 18,684 expressed sequence tags (9,224 from the MYC library and 9,460 from the INT library) were submitted to the EBI databank for accession number assignment and were assembled into 7,977 unigenes. Here we present a database named AphanoDB in which all the data were structured. Users can retrieve information using text searches or BLAST analyses. AphanoDB is currently the most extensive resource of Aphanomyces sequences and related annotations.
Construction and content
Preparation of cDNA libraries and sequencing
The MYC library (saprophytic library) was made with total RNA isolated from mycelium after 5, 7 and 9 days of growth in liquid YG medium (2.5% yeast extract, 5% glucose, w/v) at 23°C in the dark. Mycelia were frozen in liquid nitrogen and total RNA was extracted. RNAs from each sample were mixed in equal amounts. mRNAs were purified using an Oligotex mRNA purification kit (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions. A unidirectional library was constructed in the pSport1 plasmid using the Superscript Plasmid System for cDNA Synthesis and Cloning (Invitrogen, USA). Plasmid ligations were transferred by electroporation into E. coli DH5α cells following the manufacturer's protocol (Invitrogen Life Technologies, CA, USA). A library of approximately 5 × 10⁵ colony forming units was obtained.
The INT library (interaction library) was made with total mRNAs purified from mycelium grown for 1 or 2 days on M. truncatula roots. Roots of two-week-old plants were laid onto a 2-day-old mycelium, and culture was continued for 1 and 2 days. Before the mycelium was harvested, plants were gently removed from the Petri dish to avoid any contamination of A. euteiches RNA with plant RNA. A. euteiches mRNA extracts were pooled and cDNA was generated as described for the MYC library. A library of approximately 5 × 10⁵ colony forming units was obtained. About 10,000 clones from each library were sequenced from the 5' end using the T7 primer on ABI3730xl DNA Sequencers.
EST quality and assembling
The two libraries were processed together. All reads were base-called using the Phred program. Only reads with a Phred quality value above 20 over 80% of the sequence and a length of more than 100 bp were selected. MultiFASTA sequence and quality files were generated and cleaned by vector and adaptor trimming using crossmatch (Figure 1). After these steps, 18,684 high-quality sequences were obtained, corresponding to 93% of the initial sequence set. These sequences were assembled using the CAP3 program. Minimum overlap length was set to 100 bp, minimum percent identity to 97% and maximum gap length to 30 bp (-o 100 -p 97 -f 30). ESTs from the two libraries were assembled into 7,977 unigenes, composed of 2,843 contigs and 5,134 singletons (Table 1). Consensus sequences were renamed with a unique identifier made up of the prefix 'Ae' for the species, 3 digits for the number of ESTs included in the contig, the 2 letters 'AL' for the strain and 5 digits as an individual number.
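The read selection and renaming rules described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used for AphanoDB: the function names are hypothetical, the quality rule is read here as "at least 80% of bases have a Phred score above 20", and zero-padding of the numeric fields in the identifier is an assumption (the text gives only the number of digits).

```python
def passes_quality(quals, min_q=20, min_fraction=0.80, min_length=100):
    """Return True if a read meets the thresholds quoted in the text.

    quals is a list of per-base Phred scores for one read.
    Assumption: the 80% rule means that at least 80% of the bases
    must have a quality score strictly above min_q.
    """
    if len(quals) <= min_length:  # the read must be longer than 100 bp
        return False
    good = sum(1 for q in quals if q > min_q)
    return good / len(quals) >= min_fraction


def make_unigene_id(n_ests, serial, species="Ae", strain="AL"):
    """Build an identifier: species prefix, 3-digit EST count,
    strain letters, 5-digit individual number.

    Zero-padding is an assumption made for this sketch.
    """
    if not (0 < n_ests <= 999 and 0 <= serial <= 99999):
        raise ValueError("value does not fit the fixed-width fields")
    return f"{species}{n_ests:03d}{strain}{serial:05d}"


if __name__ == "__main__":
    print(passes_quality([30] * 150))   # long, high-quality read
    print(make_unigene_id(12, 345))     # contig built from 12 ESTs
```

Under these assumptions a contig of 12 ESTs with individual number 345 would be named Ae012AL00345, and a singleton would carry 001 in the EST-count field.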
EST analysis and functional annotation
Assignment of putative functions was performed by running the BLASTX algorithm (release 2.2.14) against a local NCBI non-redundant (nr) protein database (version of 17 May 2007). 61% of the sequences showed similarity to a protein sequence with an E-value ≤ 1e-10. To estimate the level of contamination of A. euteiches ESTs with M. truncatula cDNA sequences, which might occur in the INT library, the unigene sequences were compared to 270,000 M. truncatula ESTs deposited in the GenBank database using the BLASTN algorithm. Only 11 unigenes showed a high similarity (E value < 1e-100) to M. truncatula ESTs, and only two of these were composed of ESTs from the INT library. However, these two sequences displayed a higher similarity to Phytophthora sequences than to M. truncatula sequences. From this analysis, it can be concluded that contamination of the INT library with M. truncatula cDNAs is very low, if present at all. Sequences were also compared to proteome data from seven fully sequenced organisms using the BLASTX algorithm, since parts of the proteomes of these organisms are not present in the nr database. To facilitate comparative analyses between oomycete species, A. euteiches sequences were compared to the P. sojae and P. ramorum proteomes. Since it has been shown that oomycetes are phylogenetically related to diatoms, A. euteiches sequences were compared to the Thalassiosira pseudonana proteome. In order to find genes putatively involved in plant pathogenesis, A. euteiches sequences were compared to the proteome of the fungal pathogen Nectria haematococca. The Toxoplasma gondii and Plasmodium falciparum proteomes were selected since it has been suggested recently that apicomplexan parasites and oomycetes share common infection strategies. Finally, the proteome of Arabidopsis thaliana was also included in the analysis.
Domain searches using the InterProScan program (release 4.2) were performed locally with HMM searches against the Pfam protein database (16.0). 45% of sequences showed a known Pfam domain with an E-value ≤ 1e-5 (Table 1).
To classify the sequences according to the Gene Ontology classification scheme, Pfam domains with an InterPro accession number were linked to GO molecular function, biological process and cellular component terms using the interpro2go file. Finally, 42% of the sequences were assigned to a GO molecular function term, 43% to a biological process and 24% to a cellular component (Table 2).
The Gene Ontology database in SQL format was downloaded from the GO web site and added to AphanoDB. Links between GO terms and Enzyme Nomenclature terms were established using the ec2go mapping, and KEGG metabolic pathway map links were also added to the database.
Repeat sequences were detected using RepeatMasker with default parameters in order to identify microsatellite markers, and the result outputs were added to the database.
To estimate differences in gene expression levels between the saprophytic and the pathogenic growth conditions, a statistical analysis was performed based on EST frequency in each library. We used the test described by Susko and Roger to calculate p-values, and applied false discovery rate (FDR) control for multiple-test correction (the FDR is controlled at α = 0.05).
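As an illustration of how a figure such as the proportion of sequences with a hit at E-value ≤ 1e-10 can be derived, the following Python sketch reduces a list of (query, E-value) pairs to the best hit per unigene and counts those passing the cut-off. The input layout and function names are hypothetical; the text does not describe how the BLASTX output was parsed.

```python
def best_evalue_per_query(rows):
    """Collapse (query_id, evalue) pairs to the lowest E-value per query."""
    best = {}
    for query, evalue in rows:
        if query not in best or evalue < best[query]:
            best[query] = evalue
    return best


def annotated_fraction(best_evalues, n_unigenes, max_evalue=1e-10):
    """Fraction of unigenes whose best hit has an E-value <= max_evalue.

    best_evalues maps unigene id -> lowest E-value (hypothetical input);
    unigenes without any hit are simply absent from the mapping.
    """
    if n_unigenes <= 0:
        raise ValueError("n_unigenes must be positive")
    hits = sum(1 for e in best_evalues.values() if e <= max_evalue)
    return hits / n_unigenes


if __name__ == "__main__":
    # Toy data, not taken from AphanoDB.
    rows = [("u1", 1e-50), ("u1", 1e-5), ("u2", 1e-3), ("u3", 1e-10)]
    best = best_evalue_per_query(rows)
    print(annotated_fraction(best, n_unigenes=4))
```

The same counting, with a stricter threshold and a different subject database, would apply to the contamination check against M. truncatula ESTs.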
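The linking step can be pictured as a simple lookup from InterPro accession to GO identifiers. The sketch below assumes the usual external2go line layout ("InterPro:ACCESSION description > GO:term name ; GO:id", with comment lines starting with "!"); the function name is hypothetical and the sample lines in the usage section only illustrate the layout.

```python
from collections import defaultdict


def parse_interpro2go(lines):
    """Map InterPro accession -> list of GO ids from interpro2go-style lines.

    Assumed line layout:
        InterPro:IPR000003 Some description > GO:term name ; GO:0003677
    Lines starting with '!' are treated as comments; malformed lines are skipped.
    """
    mapping = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        left, sep, right = line.partition(" > ")
        if not sep or not left.startswith("InterPro:") or ";" not in right:
            continue
        accession = left.split()[0].split(":", 1)[1]
        go_id = right.rsplit(";", 1)[1].strip()
        mapping[accession].append(go_id)
    return dict(mapping)


if __name__ == "__main__":
    sample = [
        "!example header line",
        "InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:DNA binding ; GO:0003677",
        "InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:nucleus ; GO:0005634",
    ]
    print(parse_interpro2go(sample))
```

A unigene carrying a Pfam domain with a given InterPro accession would then inherit every GO identifier listed for that accession.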
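For readers unfamiliar with the multiple-test correction, the Benjamini and Hochberg step-up procedure can be sketched as follows. This is a generic textbook implementation in Python, not the code used for AphanoDB, and it takes the p-values as given (the EST-frequency test itself is not reproduced here).

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Step-up procedure: sort the m p-values, find the largest rank k such
    that p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Toy p-values, not taken from AphanoDB.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042,
             0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals))
```

Because the procedure is step-up, a p-value that fails its own rank threshold is still rejected if a larger p-value further down the sorted list passes its threshold.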
Utility and discussion
AphanoDB provides a complete summary sheet containing all annotation results for each predicted unigene (Figure 2). The BLASTX results against the nr database, HMM search results against Pfam database and the Gene Ontology assignments are displayed on this summary sheet. BLASTX alignment details are also provided and will be updated yearly. The summary sheet also contains the sequence of the unigene, the link to the ESTs alignment, to the Ace file and another link to a multiFASTA file of the contig components.
Users can query the database for a given sequence by providing its accession number, EST name or gene ID. Searches on the Gene Ontology annotation are possible through graphical bars representing GO subfamilies (Figure 2A). For the GO catalytic activity subfamilies, EC links and KEGG metabolic pathway map links are provided. Function searches can be made by querying InterPro or Pfam domain names or accession numbers. BLASTX results stored in the database can be queried using organism name or putative function (Figure 2B–C). Queries on repeat sequence types and sizes are also available, as are queries on the expression profile in a specific growth condition. For expression profiles, the output shows only significant results with a P-value lower than the Benjamini and Hochberg cut-off. Users can submit their own sequences for BLAST searches against the A. euteiches sequences using BLASTN, TBLASTN and TBLASTX. The BLAST output page displays one summary sheet for each hit on the A. euteiches sequences.
Utilities and extensions
AphanoDB provides molecular data about A. euteiches transcripts. The database contains 18,684 high quality sequences and allows direct applications for functional and comparative genomic approaches. AphanoDB is constructed in such a way that it can absorb a large number of additional sequences from other oomycete species, such as S. parasitica, for which large scale cDNA sequencing projects are under way.
AphanoDB represents a major contribution to genomic studies on Oomycetes and related organisms such as diatoms and brown algae. Addition of new sequences from other Aphanomyces species and other Saprolegniales is planned in the near future. AphanoDB will facilitate gene prediction and annotation for the future whole-genome sequencing of Saprolegniales species. AphanoDB contains cleaned, assembled and annotated ESTs which will serve the oomycete research community. The database provides comprehensive tools for comparative approaches that might lead, for example, to the identification of pathogenicity factors.
Availability and requirements
AphanoDB is freely available to academic and non-academic users at http://www.polebio.scsv.ups-tlse.fr/aphano/. A browser supporting frames must be used (Firefox, Netscape 2.0, Internet Explorer 3.0 or higher). cDNA clones can be obtained upon request at http://cnrgv.toulouse.inra.fr/ENG/. All ESTs can be downloaded from AphanoDB. They have also been deposited in dbEST (accession numbers CU357053 to CU361296).
EST: expressed sequence tag
KEGG: Kyoto Encyclopedia of Genes and Genomes
HMM: Hidden Markov Model
Fry W, Goodwin S: Re-emergence of potato and tomato late blight in the United States. Plant Dis. 1997, 81: 1349-1357. 10.1094/PDIS.1997.81.12.1349.
Fry W, Smart C: The return of Phytophthora infestans, a potato pathogen that just won't quit. Potato Res. 1999, 42: 279-282. 10.1007/BF02357858.
van West P: Saprolegnia parasitica, an oomycete pathogen with a fishy appetite: new challenges for an old problem. Mycologist. 2006, 20: 99-104. 10.1016/j.mycol.2006.06.004.
Jönsson-Belyazio U, Rosengren U: Can Phytophthora quercina have a negative impact on mature pedunculate oaks under field conditions?. Ann For Sci. 2006, 63: 661-672. 10.1051/forest:2006047.
Dick M: In Straminipilous Fungi. 2001, KA
Kamoun S: Nonhost resistance to Phytophthora: novel prospects for a classical problem. Curr Opin Plant Biol. 2001, 4: 295-300. 10.1016/S1369-5266(00)00176-X.
Tyler BM, Tripathy S, Zhang X, Dehal P, Jiang RH, Aerts A, Arredondo FD, Baxter L, Bensasson D, Beynon JL, Chapman J, Damasceno CM, Dorrance AE, Dou D, Dickerman AW, Dubchak IL, Garbelotto M, Gijzen M, Gordon SG, Govers F, Grunwald NJ, Huang W, Ivors KL, Jones RW, Kamoun S, Krampis K, Lamour KH, Lee MK, McDonald WH, Medina M, Meijer HJ, Nordberg EK, Maclean DJ, Ospina-Giraldo MD, Morris PF, Phuntumart V, Putnam NH, Rash S, Rose JK, Sakihama Y, Salamov AA, Savidor A, Scheuring CF, Smith BM, Sobral BW, Terry A, Torto-Alalibo TA, Win J, Xu Z, Zhang H, Grigoriev IV, Rokhsar DS, Boore JL: Phytophthora genome sequences uncover evolutionary origins and mechanisms of pathogenesis. Science. 2006, 313: 1261-1266. 10.1126/science.1128796.
Govers F, Gijzen M: Phytophthora genomics: the plant destroyers' genome decoded. Mol Plant Microbe Interact. 2006, 19: 1295-1301. 10.1094/MPMI-19-1295.
Gajendran K, Gonzales MD, Farmer A, Archuleta E, Win J, Waugh ME, Kamoun S: Phytophthora functional genomics database (PFGD): functional genomics of Phytophthora-plant interactions. Nucleic Acids Res. 2006, 34: D465-70. 10.1093/nar/gkj119.
Torto-Alalibo TA, Tripathy S, Smith BM, Arredondo FD, Zhou L, Li H, Chibucos MC, Qutob D, Gijzen M, Mao C, Sobral BW, Waugh ME, Mitchell TK, Dean RA, Tyler BM: Expressed sequence tags from Phytophthora sojae reveal genes specific to development and infection. Mol Plant Microbe Interact. 2007, 20: 781-793. 10.1094/MPMI-20-7-0781.
Randall TA, Dwyer RA, Huitema E, Beyer K, Cvitanich C, Kelkar H, Fong AM, Gates K, Roberts S, Yatzkan E, Gaffney T, Law M, Testa A, Torto-Alalibo T, Zhang M, Zheng L, Mueller E, Windass J, Binder A, Birch PR, Gisi U, Govers F, Gow NA, Mauch F, van West P, Waugh ME, Yu J, Boller T, Kamoun S, Lam ST, Judelson HS: Large-scale gene discovery in the oomycete Phytophthora infestans reveals likely components of phytopathogenicity shared with true fungi. Mol Plant Microbe Interact. 2005, 18: 229-243. 10.1094/MPMI-18-0229.
Torto-Alalibo T, Tian M, Gajendran K, Waugh ME, van West P, Kamoun S: Expressed sequence tags from the oomycete fish pathogen Saprolegnia parasitica reveal putative virulence factors. BMC Microbiol. 2005, 5: 46-58. 10.1186/1471-2180-5-46.
Gaulin E, Jacquet C, Bottin A, Dumas B: Root rot disease of legumes caused by Aphanomyces euteiches. Molecular Plant Pathol. 2007, 8: 539-548. 10.1111/j.1364-3703.2007.00413.x.
Malvick D, Grau C: Characteristics and frequency of Aphanomyces euteiches races 1 and 2 associated with alfalfa in the Midwestern United States. Plant dis. 2001, 85: 740-744. 10.1094/PDIS.2001.85.7.740.
Petersen A: Use of polyclonal antibodies to detect oospores of Aphanomyces. Mycol Res. 1996, 100: 495-499.
Ewing B, Green P: Base-calling of automated sequencer traces using Phred. II. Error probabilities. Genome Res. 1998, 8: 186-194.
Huang X, Madan A: CAP3: a DNA sequence assembly program. Genome Res. 1999, 9: 868-877. 10.1101/gr.9.9.868.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
Armbrust EV, Berges JA, Bowler C, Green BR, Martinez D, Putnam NH, Zhou S, Allen AE, Apt KE, Bechner M, Brzezinski MA, Chaal BK, Chiovitti A, Davis AK, Demarest MS, Detter JC, Glavina T, Goodstein D, Hadi MZ, Hellsten U, Hildebrand M, Jenkins BD, Jurka J, Kapitonov VV, Kroger N, Lau WW, Lane TW, Larimer FW, Lippmeier JC, Lucas S, Medina M, Montsant A, Obornik M, Parker MS, Palenik B, Pazour GJ, Richardson PM, Rynearson TA, Saito MA, Schwartz DC, Thamatrakoln K, Valentin K, Vardi A, Wilkerson FP, Rokhsar DS: The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science. 2004, 306: 79-86. 10.1126/science.1101156.
Kissinger JC, Gajria B, Li L, Paulsen IT, Roos DS: ToxoDB: accessing the Toxoplasma gondii genome. Nucleic Acids Res. 2003, 31: 234-236. 10.1093/nar/gkg072.
Bahl A, Brunk B, Crabtree J, Fraunholz MJ, Gajria B, Grant GR, Ginsburg H, Gupta D, Kissinger JC, Labo P, Li L, Mailman MD, Milgram AJ, Pearson DS, Roos DS, Schug J, Stoeckert CJ, Whetzel P: PlasmoDB: the Plasmodium genome resource. A database integrating experimental and computational data. Nucleic Acids Res. 2003, 31: 212-215. 10.1093/nar/gkg081.
Bhattacharjee S, Hiller NL, Liolios K, Win J, Kanneganti TD, Young C, Kamoun S, Haldar K: The malarial host-targeting signal is conserved in the Irish potato famine pathogen. PLoS Pathog. 2006, 2: e50-10.1371/journal.ppat.0020050.
Zdobnov EM, Apweiler R: InterProScan - an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001, 17: 847-848. 10.1093/bioinformatics/17.9.847.
Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR: The Pfam protein families database. Nucleic Acids Res. 2004, 32: D138-41. 10.1093/nar/gkh121.
Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, Richter J, Rubin GM, Blake JA, Bult C, Dolan M, Drabkin H, Eppig JT, Hill DP, Ni L, Ringwald M, Balakrishnan R, Cherry JM, Christie KR, Costanzo MC, Dwight SS, Engel S, Fisk DG, Hirschman JE, Hong EL, Nash RS, Sethuraman A, Theesfeld CL, Botstein D, Dolinski K, Feierbach B, Berardini T, Mundodi S, Rhee SY, Apweiler R, Barrell D, Camon E, Dimmer E, Lee V, Chisholm R, Gaudet P, Kibbe W, Kishore R, Schwarz EM, Sternberg P, Gwinn M, Hannick L, Wortman J, Berriman M, Wood V, de la Cruz N, Tonellato P, Jaiswal P, Seigfried T, White R: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004, 32: D258-61. 10.1093/nar/gkh066.
interpro2go file. [http://www.geneontology.org/external2go/interpro2go]
Gene Ontology SQL database. [http://www.geneontology.org/GO.downloads.shtml]
Nomenclature committee of the international union of biochemistry and molecular biology (NC-IUBMB), Enzyme Supplement 5 (1999). Eur J Biochem. 1999, 264: 610-650. 10.1046/j.1432-1327.1999.nomen.x.
ec2go file. [http://www.geneontology.org/external2go/ec2go]
Kanehisa M: The KEGG database. Novartis Found Symp. 2002, 247: 91-101; discussion 101-3, 119-28, 244-52.
Bedell JA, Korf I, Gish W: MaskerAid: a performance enhancement to RepeatMasker. Bioinformatics. 2000, 16: 1040-1041. 10.1093/bioinformatics/16.11.1040.
Susko E, Roger AJ: Estimating and comparing the rates of gene discovery and expressed sequence tag (EST) frequencies in EST surveys. Bioinformatics. 2004, 20: 2279-2287. 10.1093/bioinformatics/bth239.
Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Statist Soc. 1995, 57: 289-300.
We thank Arnaud Bottin and Martina Rickauer (SP2 INP-ENSAT, Toulouse) for helpful discussions and for critically reading the manuscript. We also thank Didier Laborie, David Allouche, Jean-Marc Larre and all members of the Bioinformatic Platform of the Genopole (Toulouse, France) for computing technical support. Part of this work was supported by the FP6 Grain Legume Integrated Project. MAM received a pre-doctoral scholarship from the CNRS. Sequencing was supported by CNRG.
MAM designed the database and the web interface and wrote the pipeline programs. EG constructed the Aphanomyces euteiches cDNA libraries and coordinated the sequencing project. CM and HSC cooperated in structuring the database and designed the pipeline draft. AC and PW managed the sequencing of the cDNA clones. BD led the AphanoDB project, helped to draft the manuscript and participated in the design of the database. All authors tested the database and revised the manuscript.
Cite this article
Madoui, M., Gaulin, E., Mathé, C. et al. AphanoDB: a genomic resource for Aphanomyces pathogens. BMC Genomics 8, 471 (2007) doi:10.1186/1471-2164-8-471