- Research article
- Open Access
Cestode strobilation: prediction of developmental genes and pathways
BMC Genomics volume 21, Article number: 487 (2020)
Cestoda is a class of endoparasitic worms in the flatworm phylum (Platyhelminthes). During the course of their evolution cestodes have evolved some interesting aspects, such as their increased reproductive capacity. In this sense, they have serial repetition of their reproductive organs in the adult stage, which is often associated with external segmentation in a developmental process called strobilation. However, the molecular basis of strobilation is poorly understood. To assess this issue, an evolutionary comparative study among strobilated and non-strobilated flatworm species was conducted to identify genes and proteins related to the strobilation process.
We compared the genomic content of 10 parasitic platyhelminth species; five from cestode species, representing strobilated parasitic platyhelminths, and five from trematode species, representing non-strobilated parasitic platyhelminths. This dataset was used to identify 1813 genes with orthologues that are present in all cestode (strobilated) species, but absent from at least one trematode (non-strobilated) species. Development-related genes, along with genes of unknown function (UF), were then selected based on their transcriptional profiles, resulting in a total of 34 genes that were differentially expressed between the larval (pre-strobilation) and adult (strobilated) stages in at least one cestode species. These 34 genes were then assumed to be strobilation related; they included 12 encoding proteins of known function, with 6 related to the Wnt, TGF-β/BMP, or G-protein coupled receptor signaling pathways; and 22 encoding UF proteins. In order to assign function to at least some of the UF genes/proteins, a global gene co-expression analysis was performed for the cestode species Echinococcus multilocularis. This resulted in eight UF genes/proteins being predicted as related to developmental, reproductive, vesicle transport, or signaling processes.
Overall, the described in silico data provided evidence of the involvement of 34 genes/proteins and at least 3 developmental pathways in the cestode strobilation process. These results highlight on the molecular mechanisms and evolution of the cestode strobilation process, and point to several interesting proteins as potential developmental markers and/or targets for the development of novel antihelminthic drugs.
The phylum Platyhelminthes comprise a diverse array of species that occur in all continental masses, seas, rivers, and lakes. Platyhelminths (flatworms) are dorsoventrally flattened bilaterian acoelomates that lack an anus, possess a low level of cephalization, and are usually hermaphroditic . They are divided into four main classes: Turbellaria (free-living planarians), Monogenea (mostly aquatic ectoparasites), Trematoda (flukes), and Cestoda (tapeworms) .
Parasitic flatworms (monogeneans, flukes, and tapeworms) form a monophyletic group known as Neodermata , which constitutes one of the three largest groups of metazoan parasites that infect vertebrates (the others being nematodes and arthropods). Flukes and tapeworms form a derivate monophyletic group of endoparasitic species, some of which are of great medical and veterinary importance. For instance, the World Health Organization’s list of neglected tropical diseases (http://www.who.int/neglected_diseases/diseases/en/) includes several caused by endoparasitic flatworms, namely foodborne trematodiases (caused by Clonorchis spp. and Fasciola spp., among other flukes), schistosomiases (caused by Schistosoma spp.), echinococcoses (caused by Echinococcus spp.), and taeniases/cysticercosis (caused by Taenia spp.).
Tapeworms are obligate endoparasites of vertebrates that display complex life cycles in which morphologically and physiologically distinct forms alternate, adapted to the survival and development of different intermediate host species [4, 5]. The Cestoda class is divided into two subclasses: Eucestoda and Cestodaria. The Eucestoda subclass (‘true’ tapeworms) includes Echinococcus spp. and Taenia spp., which are the main species of medical and veterinary interest. Eucestodes have increased fertility due to metamerism (the serial repetition of body structures, or metameres) . In eucestodes, metamerism is represented by the internal serial repetition of their reproductive organs, called proglottization [4, 6]. In the majority of eucestode evolutionary lineages, proglottization is associated with the external delimitation (segmentation) of proglottids, in a developmental process called strobilation [4, 5]. Eucestode strobilation occurs during the transition from the larval to adult stage and involves the repetitive generation of new proglottids in the base of the head (scolex), in the so-called neck region. Strobilation persists during the whole life of the adult worm, with each proglottid moving toward the posterior end as a new one is generated in the neck region. Strobilation allows adult cestodes to have larger numbers of hermaphroditic sexual organ sets, promoting both cross and self-fertilization and the frequency of egg-release by the progressive excision of gravid proglottids [4, 6].
In cestode evolution, the eucestodes constitute the most recently evolved subclass, and includes orders with different degrees of proglottization/strobilation . The most ancestral eucestodes (those from the order Caryophyllidea) do not undergo proglottization and are non-segmented, similar to the other ancestral cestodes (i.e. those from the Cestodaria sublass). On the other hand, eucestodes of the order Spathebothriidea have proglottization without body segmentation, whereas others from more recently evolved orders (e.g. members of the order Cyclophyllidea, including the most relevant cestodes from an epidemiological point of view) undergo full strobilation (i.e. proglottization along external segmentation).
Thus, the evolution of strobilated tapeworms from non-segmented ancestors could be explained by two hypotheses: the initial loss of segmentation from an ancestral segmented lophotrochozoan, followed by the re-emergence of this process in more recent eucestode lineages in the form of proglottization/strobilation; or the independent evolution of proglottization/strobilation within the eucestode lineage, which gave rise to the present proglottized or fully strobilated orders. Therefore, cestodes are interesting subjects for evolutionary developmental studies , especially for those aiming to elucidate the evolutionary origins of developmental novelties related to strobilation. To achieve this, it is important to comparatively analyze cestode genomes and identify candidate genes related to these developmental processes.
So far little is known about strobilation and other developmental processes in cestodes at the molecular level, despite the relevance of such knowledge for both basic evolutionary developmental studies  and the identification of targets for new alternative drugs against cestode parasites [8,9,10,11]. To better understand cestode molecular biology, several species have been targeted by “omics” studies, including genomic, transcriptomic, and proteomic surveys. Echinococcus granulosus, Echinococcus multilocularis, Hymenolepis microstoma, and Taenia solium were the first cestodes to have their genomes completely sequenced . Later, the 50 Helminth Genomes Project provided the draft genome sequence of another 14 cestode species, along with those of many other helminths . Furthermore, transcriptomic and proteomic data are available for different life-cycle stages of cestode species, such as E. granulosus, E. multilocularis, H. microstoma, and Mesocestoides corti [12, 14,15,16,17,18]. In some of these studies, the differential expression of transcripts and proteins between larval (non-strobilated) and adult (strobilated) forms has been assessed. For instance, tetrathyridia (larvae) and adult segmented worms of the M. corti species were compared with regard to their miRNA , mRNA [15, 16] and protein  repertoires, providing some clues about the gene products differentially expressed during the transition between these stages. The initial steps of strobilation were addressed by two proteomic studies; one comparing M. corti bona fide tetrathyridia with tetrathyridia after 24 h of strobilation induction  and the other identifying proteins newly synthesized in E. granulosus pre-adult forms (protoscoleces) upon strobilation induction .
Here, a data mining approach, integrating genomic and transcriptomic data, was carried out to identify cestode developmental genes and pathways, and to demonstrate the molecular mechanisms involved in the strobilation process. Eighteen species from the Protostomia clade of Bilateria metazoans were assessed; 10 strobilated and non-strobilated species of flatworms and 8 outgroup species. Their genome sequences and transcriptional profiles were compared to identify tapeworm developmental genes associated with strobilation and, among these genes, those that had differential expression in the cestode pre-strobilated and strobilated stages. Genes associated with the strobilation process had their evolutionary histories investigated through phylogenetic and positive selection analyses. Moreover, functional enrichment provided further information on annotated gene products, and co-expression network analyses provided further information on the products of hypothetical genes. Overall, 34 proteins associated with strobilation were identified, providing evidence of the involvement of both conserved and novel cellular pathways in cestode strobilation.
Phylogenomic analyses of strobilated and non-strobilated Platyhelminthes cestode species
A phylogenomic analysis was carried out with 10 neodermatan genomes, comprising 5 tapeworm species to represent strobilated platyhelmiths (E. granulosus, E. multilocularis, H. microstoma, M. corti, and T. solium); and by 5 fluke species to represent non-strobilated platyhelmithes (Clonorchis sinensis, Schistosoma haematobium, Schistosoma japonicum, Schistosoma mansoni, and Opisthorchis viverrini). An outgroup set of 8 genomes was used in the analysis, comprising 6 genomes from nematodes to represent non-segmented helminths (Caenorhabditis elegans, Globodera pallida, Haemonchus contortus, Onchocerca volvulus, Strongyloides ratti, and Trichuris muris); 1 genome from an annelid to represent a segmented protostome (Helobdella robusta); and 1 genome from a mollusk to represent a non-segmented protostome (Lottia gigantea).
Overall, 11,300 orthogroups of deduced protein sequences were identified, of which 285 have orthologous genes in all 18 analyzed species. These 285 protein sets were aligned, and the alignments were concatenated in a supermatrix for phylogenomic inference via Bayesian analysis. In the resulting tree (Fig. 1), two endoparasitic flatworm monophyletic groups were highly supported, with posterior probability of 100, one corresponding to flukes (trematodes) and the other to tapeworms (cestodes). The platyhelminthes were clearly divided into two clades, one with external body segmentation (with full strobilation) and the other with only internal segmentation (proglottization). Regarding protostome relationships, the tree supports the monophyly of the Ecdysozoa, Lophotrochozoa, Platyhelminthes, Cestoda, and Trematoda clades.
Identification of strobilation-related proteins
To identify proteins related to the strobilation process, orthogroups were selected based on their presence in tapeworm species or absence from fluke species (Fig. 2). From the 11,300 identified orthogroups of deduced protein sequences, 6964 (61.63%) were found in at least one tapeworm species, whereas 6985 (61.81%) were found in at least one fluke species (Fig. 2a). From this subtotal, a set of 3365 orthogroups were shared by all 5 tapeworm species, and a set of 2809 orthogroups were shared by all 5 fluke species (supplementary Figure S1). It was assumed that proteins essential for tapeworm development would be found in all strobilated species but may be absent from non-strobilated species. Based on this criterion, 1813 tapeworm strobilation-related orthogroups were initially selected (Fig. 2a; supplementary Table S1). From the 1813 selected orthogroups, 910 were found in all tapeworms that were absent from 1 to 4 fluke species, whereas 903 orthogroups were found in all tapeworms that were absent from all 5 fluke species.
As tapeworm strobilation is a developmental process, we performed a functional enrichment of the initial set of 1813 tapeworm strobilation-related orthogroups. The functional assignment of these orthogroups is shown in supplementary Table S1; the biological process assignment is summarized in Fig. 2b, and the assigned molecular functions and cellular components are summarized in supplementary Figure S2. Overall, 152 orthogroups were assigned to developmental processes (highlighted in yellow in supplementary Table S1) and then selected for further analyses. A total of 304 orthogroups of UF proteins (highlighted in blue in supplementary Table S1) were also selected, as at least some of these may also be related to developmental processes.
Furthermore, considering that the strobilation process occurs only in the adult tapeworm and not in the larval stage(s), we used available transcriptomic data of three tapeworm species (E. multilocularis, H. microstoma, and M. corti) to identify proteins from the selected 326 orthogroups (152 development-related orthogroups and 304 UF proteins) whose genes have differential expression in the larval (pre-strobilation) and adult (strobilated) stages. The differentially expressed orthologous genes in the larval and adult stages of the non strobilated flukes S. haematobium  and S. mansoni  were excluded to avoid genes not related to strobilation. This resulted in 12 development-related and 22 UF proteins (from now on identified as UF 1–22) being selected as strobilation-related proteins (Table 1, supplementary Table S2).
The selected set of 12 proteins previously associated with development in other organisms was mapped into cellular pathways based on KEGG data (Fig. 3). Among these proteins, Groucho, MARK2, and TFC/LCF were mapped as components of the Wnt pathway; BMP2 and Smad4 were mapped as components of the TGF-β/BMP pathway; and NPR1 was mapped as component of the G-protein coupled receptor signaling pathway. This provided evidence of the involvement of some well-known developmental pathways in cestode strobilation.
To evaluate and confirm the homology of the proteins in the selected orthogroups, we performed functional domain predictions and comparisons (Fig. 4, supplementary Table S3). In all cases, the proteins within the orthogroup showed the same profile of predicted domains, further confirming their orthologies. In the set of 12 developmental proteins, several functional domains were identified, including a signal peptide and homeobox (Hox B4a and LHX1), RNA recognition motif, and PDZ domains. In the set of 22 UF proteins, 8 had transmembrane domains, two of them in association with another functional domain (a family A G protein-coupled receptor-like in UF 3 and an uncharacterized PTHR12242 domain in UF 15). Two calcium dependent phosphotriesterase domains were also identified in another UF protein (UF 1). No functional domains were found in the remaining 13 UF proteins.
Evolutionary analyses of strobilation-related proteins
A comprehensive search for orthologs was carried out for all the identified strobilation-related proteins, defining 34 orthogroups. The resulting data is summarized in the supplementary Figure S3. It was verified that half (17) of these proteins (MAGI2, Mark2, RBMS, Ser:Thr kinase, TCF/LCF, UF 3, UF 5–14, and UF 22) had orthologs present only among cestodes. Only eight of the strobilation-related proteins had orthologs in non-flatworm lophotrocozoans, namely BMP-2, GAK, Groucho, Hox B4a, LHX1, NPR1, SMAD4, and UF 15. The remaining nine assessed targets (UF 1–2, UF 4, and UF 16–21) showed orthologs only among flatworms. It is interesting to note that considering this last set of 17 proteins were not cestode-exclusive, homologies among cestode orthologues (47.01–81.24%) were considerably higher than their homologies to orthologs from the other assessed taxa (15.82–48.45%) (supplementary Table S4).
Phylogenetic analyses were then performed to describe the evolutionary history of the 34 putative strobilation-related orthogroups (supplementary Table S5). The resulting trees (supplementary Figure S4–S37) agreed with the monophyly of tapeworms previously established by the phylogenomic analysis. The cyclophyllidean species from the Hymenolepididae and Mesocestoididae families (H. microstoma and M. corti, respectively) alternated as the most basal, whereas members of the Taeniidae family were more derived.
To address whether the evolution of these putative strobilation-related orthogroups could be under selective pressure, we performed searches to detect codons under positive selection. All codon sequences were aligned and the best phylogenetic tree for each orthogroup was selected (supplementary Table S6) for use in the positive selection analysis. The results suggested that the assessed orthogroups have not been under positive selection, except for a single site in the UF 16 orthogroup (supplementary Table S7).
Functional analyses of strobilation-related UF proteins
Most of the proteins selected as being strobilation-related do not have any previously described biological function (UF 1–22). Therefore, to help determine their functional roles in tapeworm development, we performed a gene co-expression network analysis using the available transcriptomic data for the E. multilocularis pre-adult stage. Data from a pre-adult stage was used because of the lack of RNA-seq data from adult cestodes, in the amount and quality, required to allow the generation of a network based on gene co-expression.
Network construction was based on eight E. multilocularis RNA-seq samples, for which no sample outliers were identified (supplementary Figure S38). The resulting network consisted of 2957 nodes and 2,140,516 connections. Gene co-expressions were detected for 15 proteins in the set of 34 selected as strobilation-related, namely Hox B4a, LHX1, MAGI2, and NPR1 and UFs 2, 3, 5, 6, 10, 11, 13, 15, 16, 19, and 20. Only eight of them, namely UFs 2, 6, 10, 11, 15, 16, 19, and 20, were considered hubs (supplementary Table S8) and selected for further analyses. In Fig. 5, the modules to which the corresponding genes of these hub proteins belong to are indicated. With the aim of enhancing the information robustness of these selected network modules, we used the hub genes from each selected module as input for protein-protein interaction (PPI) analyses. Assuming that interacting proteins of a given module are usually involved in the same biological functions , we performed a functional enrichment for each of the eight selected modules (supplementary Figure S39 and supplementary Tables S9–15) to evaluate possible functions or pathways involving the strobilation-related UF proteins (Fig. 5). Based on the functional enrichment, the UFs 2, 6, 10, 11, 15, 16, 19, and 20 hub proteins were assigned to Wnt or G protein-coupled signaling pathways, and/or to biological functions such as vesicle-mediated transport, inductive cell migration, cell adhesion, apoptotic processes, and cellular response to interleukin-1, linking these pathways and functions to tapeworm strobilation.
Cestode developmental processes, including those involved in the transition from larvae to adult segmented worms (strobilation), are still poorly understood and a truly relevant theme in flatworm biology. Some previous proteomic and transcriptomic studies have started to unravel at least some of the molecular events underpinning cestode strobilation [12, 14,15,16,17,18,19], but we are still far from understanding all the mechanisms involved in this complex developmental process. In this study, we generated novel and complementary information regarding cestode strobilation using an integrative data mining approach that compared the genomic data of five tapeworm species and five fluke species, and compared the transcriptomic data of different developmental stages of three tapeworms and one fluke.
The draft genomes selected for analyses were chosen because of their advanced stages of sequencing, assembly, and annotation, and allowed the correct identification of genes/proteins and a precise definition of orthogroups. With that, a dependable assessment of the presence/absence of orthologues in the genomes of closely related tapeworm and fluke species was achieved. Overall, 34 proteins (12 developmental proteins and 22 UF proteins) were selected based on the following criteria (i) presence in all fully strobilated species; (ii) absence in at least one non-strobilated species; (iii) annotation as a developmental gene/protein or with unknown function; and (iv) differential expression in strobilated developmental stages.
The use of such an approach and criteria resulted in the identification of a set of strobilation-related genes that can be regarded as complementary to other sets of genes previously associated with this developmental process based solely on their differential expression at the transcriptional level [12, 14,15,16] and/or at the protein level [17,18,19]. As each of these in silico or wet lab approaches rely on different technologies and present different advantages and limitations, they complement each other in the assessment of complex biological processes, such as those involved in development or disease [23,24,25]. Therefore, the ongoing and progressive assessment of cestode strobilation by different in silico and wet lab approaches is expected to generate a more comprehensive picture of the molecules, mechanisms, and developmental pathways underlying this complex biological process.
An interesting feature of the novel set of 34 proteins identified as being related to strobilation was that half (17) of them were exclusive from tapeworms. Moreover, the remaining 17 proteins, although not exclusive, were considerably more conserved among cestodes (with identities 20.2–44.6% higher) than with their orthologs from other taxa, suggesting some degree of specialization of these proteins within the Cestoda class. As expected, these “non-exclusive” proteins are monophyletic for tapeworms, and nine of them had orthologs identified only for flatworm species, including tapeworms and flukes. Therefore, within the set of 34 proteins selected as being strobilation-related, 17 are assumed to be related to cestode-exclusive functions, whereas 9 are assumed to be associated with functions exclusive to the phylum Platyhelminthes. Further studies are necessary to determine the possible specialized functions of these proteins in tapeworm or flatworm biology.
In this set of 34 strobilation-related proteins, only 8 had orthologs found in non-Platyhelminth lophotrochozoans. The other 26 proteins were conserved only among platyhelminths (with 17 of them being conserved only among cestodes). This suggests that any ancestral lophotrochozoan segmentation mechanism  may have changed or gained complexity with the addition of gene/proteins during platyhelminth or cestode evolution. The phylogenetic analyses performed for these 34 strobilation-related proteins showed that their evolutionary history agrees with the currently accepted monophyletic origin of tapeworms .
It was interesting to evaluate the synonymous versus non-synonymous substitution rates regarding the set of strobilation-related proteins, as low variability is assumed for developmental genes due to the strong functional constraints they are submitted to . According to the positive selection analyses, the assessed sequences did not show statistically significant high ω values, indicating that they are probably under functional constraints. This is consistent with what is expected for genes involved in development and other vital processes [28, 29] (supplementary Table S7).
We also investigated whether any of the strobilation-related proteins were be under positive selection, considering that at least some of them could be potential targets for the development of novel drugs against flatworm parasites. A stronger positive selection may determine amino acid variations in ortholog proteins, even among closely related species. Therefore, this type of selection would be undesirable in proteins that are potential drug targets . Variations in orthologs of related pathogenic species (for example, among tapeworm or fluke species) could indicate that some of these species would be responsive to the drug, whereas others would be less, or even not at all, responsive . Among the analyzed proteins, there was only one site under positive selection in one protein (UF16). Therefore, almost all the proteins analyzed would be interesting potential targets for broad spectrum drugs in cestodiasis.
As most of these strobilation-related proteins had no functional annotation (22 UF proteins), we performed a systems biology analysis to identify functional modules with the aim of providing additional evidence of their involvement in developmental processes. For cestodes, an important limitation for this type of approach is the scarcity of adequate RNA-seq data, considering the need for large numbers of replicates with low variation among them. Fortunately, RNA-seq data available for the E. multilocularis pre-adult (protoscolex) stage matched all technical requirements related to the replicate number and data quality. This allowed us to allocate eight of the UF proteins, which are so far not characterized, as central nodes (hubs) from seven functional network modules. These seven modules were associated with biological functions such as cell signaling, apoptosis, cell adhesion, and transcriptional regulation. Unfortunately, functional assignments were not possible for the remaining 14 UF proteins with the available transcriptomic data. A possible explanation for this could be the absence, or low expression, of these strobilation-related genes/proteins in bona fide protoscoleces, a non-strobilated stage.
Domain analyses of UF proteins allowed the identification of the UF 3 protein as a putative G protein-coupled receptor and the identification of putative transmembrane regions for eight UF proteins, suggesting that these proteins may have strobilation-related functions on the cell surface. Interestingly, transmembrane domains were found in UF 6 (related to the G-protein coupled receptor, SREBP and Wnt signaling pathways), UF 15 (related to the Wnt signaling pathway), UF 19 and UF 20 (related to the G-protein coupled signaling pathway). Additionally, co-expression and PPI analyses allowed the association of several UF proteins with important cell signaling pathways, namely Wnt (UF 6 and UF 15), G-protein coupled receptor signaling (UF 2, UF 6, UF 19, and UF 20), Notch (UF 10), SREBP (UF 6), and TORC1 (UF 10). These data allow us to infer that these UF proteins play strobilation-related functions in cell-cell interactions mediated by one of these signaling pathways. The involvement of selected UF proteins with strobilation is suggested based on evidence from a more recent proteomic survey carried out by our group with M. corti (Camargo de Lima, J., Floriani, M.A., Debarba, J.A., Monteiro, K.M., Moura, H., Barr, J.R., Zaha, A., Ferreira, H.B., personal communication). This study showed that UF 16 is among the proteins that are newly synthesized within 24 h of strobilation induction.
Among the strobilation-related proteins with functional annotation, Groucho, Mark2, and TCF/LCF are components of the Wnt signaling pathway. As discussed above, two UF proteins, UF 6 and UF 15, were also assigned to the same PPI network that includes this pathway. The Wnt pathway is well conserved among metazoans  and has already been associated with several different developmental events in diverse organisms. Among flatworms, Wnt signaling was initially involved in planarian anterior-posterior axis (head/tail) specification during regeneration . Moreover, in E. multilocularis, Wnt signaling inhibition has been related to protoscolex generation and specification of the primary (antero-posterior) body axis during larval metamorphosis [5, 33]. Our data now provide evidence that this pathway is also required for cestode strobilation.
Evidence for the association of the TGF-β/BMP signaling pathway with strobilation arose from the identification of two of its components, BMP2 and SMAD4, among the selected set of strobilation-related proteins. In metazoans, TGF-β/BMP signaling, including the BMP protein signal and the Smad family of transcription factors, regulates a wide variety of cellular processes, such as proliferation, differentiation, adhesion, migration, and apoptosis  and developmental events, such as body axis formation and regeneration . Moreover, in E. granulosus, Smad4 transcription factor is expressed in the metacestode (cystic larval stage) and in protoscoleces in a tissue-specific manner, with the highest transcript levels being found in activated (strobilation-induced) protoscoleces . Taken together, all this evidence corroborates the contribution of the TGF-β/BMP signaling pathway in cestode strobilation.
The G-protein coupled signaling pathway also seems to be related to strobilation, based on the initial selection of the NPR1 receptor. Moreover, the UF 3 protein also exhibited the classical domains found in receptors of this pathway. In line with this, the G-protein coupled receptor signaling pathway has been shown to be involved in invertebrate developmental processes , including movement, development, reproduction, and plasticity of the whole worm and neuronal development in E. granulosus [38, 39]. It is also notable that serotoninergic G-protein coupled receptors have great high potential as drug targets for antiparasitic intervention . However, in E. granulosus and other cestodes, many proteins of the G-protein coupled pathway remain unknown because of the considerable sequence divergence among the invertebrate orthologs . Interestingly, based on the transcript co-expression and PPI analyses, we were able to relate UF 2, UF 6, UF 19, and UF 20 proteins to this pathway. Also, according to our differential expression analyses, NPR1, UF 2, UF 6, and UF19 are upregulated in the larval stage. In contrast, UF 3 and UF 20 are up-regulated in the adult stage, suggesting the involvement of the G-protein coupled pathway in different processes related to these developmental stages of cestodes.
Finally, transcript co-expression and systems biology analyses associated UF 2, UF 6, UF10, UF 11, UF 15, UF 19, and UF 20 proteins with vesicle-mediated, or Golgi vesicle, transport. In addition, the UF 15 protein seems to be part of the Golgi organization and has already been identified in a proteomic survey of E. granulosus extracellular vesicles (da Silva, E.D., Battistella, M.E., dos Santos, G.B., Monteiro, K.M., Cancela, M., Ferreira, H.B., Zaha, A., personal communication). Additionally, the selected GAK protein is involved in clathrin-dependent trafficking from the trans Golgi network . Vesicle transport has been related to cell signaling during development, as it would facilitate the long-distance trafficking of signaling molecules . Overall, these results suggest that vesicle transport may also be important in signaling pathways involved in cestode strobilation.
Since the the vital importance of developmental processes for parasite biology and reproduction, the identified proteins can be considered as targets for the development of novel therapeutic strategies in cestodiasis control. For instance, proteins such as enolase, histone deacetylases, pyruvate kinase, and serotoninergic G-protein coupled receptors have been proposed as targets for cestodiasis therapeutic interventions [9, 38, 42, 43]. Here, we have provided details of at least 12 developmental proteins with the potential to be targets for drug repositioning and the design of novel anthelmintic drugs. For instance, Mark2 and MAGI2 already have orthologs with available structural data that could be used for structural modeling, binding site prediction, and drug-protein molecular docking studies . Likewise, the strobilation-related proteins identified as being cestode-exclusive could also constitute an interesting set of proteins for functional studies and for design-specific therapeutic approaches to cestodiases. Moreover, all proteins identified here as being related to strobilation have the potential to be molecular markers of cestode development, provided they are further characterized regarding their spatial and temporal expression patterns in a suitable model system, such as M. corti .
In summary, we performed comprehensive evolutionary and functional analyses using plathyhelminth genomic and transcriptomic data to determine the relationships between segmented and non-segmented species. Such analyses allowed the successful identification of a set of 34 evolutionary conserved cestode proteins, with known (12 proteins) or unknown (22 proteins) function, as possible components of developmental pathways required for strobilation. Several of these proteins could be functionally assigned to cell signaling pathways like Wnt, TGF-β/BMP, and G-protein coupled receptor pathways, linking them to strobilation. Moreover, most of the strobilation-related UF proteins were exclusive of platyhelminths or cestodes, and they may act in specialized segmentation mechanisms that evolved and operate in these organisms. It was also found that virtually none of the 34 strobilation-related genes are under positive selection. These results provide further information on the molecular mechanisms and evolution of the cestode strobilation process. Our data also highlighted several proteins of interest for future functional studies as potential developmental markers and/or targets for the development of novel antihelminthic drugs.
Genome data used for the identification of orthologous genes were obtained from the following public databases: National Center for Biotechnology Information (http://0-www.ncbi.nlm.nih.gov.brum.beds.ac.uk/), Sanger Institute (http://www.sanger.ac.uk/), SchistoDB (http://schistodb.net/schisto/), WormBase (http://www.wormbase.org/#012-34-5), and WormBase ParaSite (https://parasite.wormbase.org/species.html). The species used in this study were the tapeworms E. granulosus, E. multilocularis, H. microstoma, M. corti, and T. solium; the flukes C. sinensis, S. haematobium, S. japonicum, S. mansoni, and O. viverrini; the nematodes C. elegans, G. pallida, H. contortus, O. volvulus, S. ratti, and T. muris; the annelid H. robusta; and the mollusk L. gigantea. Database accession numbers and references of the respective genome sequences are available in the supplementary Table S16. The OrthoMCL algorithm v2.0.8  was used with default parameters to identify groups of orthologs and paralogs, hereinafter called orthogroups, among the whole deduced proteomes of all 18 organisms listed above.
A Python (https://www.python.org/) script was developed to select, from the OrthoMCL output (section above), all orthogroups with at least a representative sequence for each of the 18 assessed organisms (supplementary data 1). If any given organism was found to have paralogous sequences, only the longest sequence was used for further analysis, keeping only one orthologous sequence for any given protein for each of the 18 organisms. The resulting multi-FASTA ortholog files for each protein sequence were used as the input for multiple alignments, performed using the CLUSTAL Omega algorithm  with default parameters. Subsequently, the SCaFos software  was used to concatenate the aligned files of amino acid sequences. The selection of supermatrix best-fit model of protein evolution was performed using ProtTest 3 . The MrBayes v3.2.2 software  was used to construct the phylogenomic tree, with two runs, four chains in parallel, 25,000,000 generations, sampling every 100 generations, with a burn-in of 25%, and a stopval of 0.01 for the control of topological convergence.
Identification of proteins associated with strobilated species or life-cycle stages
The orthogroups that had orthologous proteins in all five of the assessed cestode (strobilated) species, but lacked orthologs in at least one of the assessed trematode (non-strobilated) species, were selected using the Python script shown in supplementary data 2. Next, the resulting set of orthogroups associated with strobilated species were categorized and functionally enriched based on the BLAST sequence homologies and gene ontology (GO) annotations using the Blast2GO software . The orthogroups were separated into two subsets: one with proteins of known functions and one with proteins of unknown functions (UF). Among the proteins in the known functions subset, we selected only the development-associated orthogroups.
Published transcriptomic data for E. multilocularis, H. microstoma , and M. corti  were used to identify proteins from selected orthogroups that had genes with different expressions between larval (pre-strobilated) and adult (strobilated) stages. Only bona fide larvae and adult worms were compared, as at present there is no RNA-Seq data available for specimens undergoing strobilation. To avoid differentially expressed genes that were not related to strobilation, those orthologs with differential expression between larval and adult stages of the flukes S. haematobium  and S. mansoni  were excluded from further analyses.
Considering the hypothesis that the segmentation mechanism underlying strobilation came from an ancestral lophotrochozoan mechanism , whether the selected strobilation-related proteins have orthologs in other species of the Lophotrocozoa superphylum was investigated. To achieve this, the proteins identified as being differentially expressed in the cestode larval and adult stages were used as query sequences in searches for additional orthologous sequences in other lophotrocozoan species. Searches were performed in the non-redundant database of NCBI-Genbank (https://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/refseq/) using the blastp suite (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins) and in the UniProtKB database (http://www.uniprot.org/) using the HMMER phmmer tool (https://www.ebi.ac.uk/Tools/hmmer/search/phmmer). Only orthologous sequences with identities and coverages greater than 30 and 70%, respectively, were selected. Functional domain annotation of orthologous proteins was performed using InterProScan 5 version 57.0 , and only those orthologous proteins in accordance with the functional domain profile of the orthogroup they were assigned to were used in further analyses.
Phylogenetic and positive selection analyses
Multiple coding DNA sequences (CDSs) or deduced protein sequences were aligned using the CLUSTAL Omega algorithm  and the PRANK program . CLUSTAL alignments were performed with default parameters, guided by an external hidden Markov model (HMM). PRANK alignments were performed using two algorithm variants, one based on an amino acid model (PRANKAA) and the other based on an empirical codon model (PRANKC). Nucleotide alignments were obtained using the PAL2NAL program . Low quality regions of the generated sequence alignments were individually inspected, and manually adjusted, when possible, or removed, when necessary. The final nucleotide and amino acid alignments for all orthogroups were used in the phylogenetic analyses. Selection of the best-fit models of nucleotide and amino acid evolution was performed using the MEGA X software suite .
Phylogenetic trees were generated for each orthogroup by distance and probabilistic methods using the MEGA X software suite. For distance methods, the Neighbor-Joining algorithm was used, with pairwise deletion of gaps applied using the p-distance and Poisson evolutionary models for the amino acid sequence evaluation and the p-distance and Jukes-Cantor models for the nucleotide sequence evaluation. For the probabilistic method, the maximum likelihood algorithm with pairwise deletion of gaps was applied. The bootstrap tests of distance and probabilistic phylogenies were performed using 2000 repetitions for all analyses. Phylogenetic trees were also generated for each orthogroup by the bayesian method using the MrBayes v3.2.2 software, with two runs, four chains in parallel, 25,000,000 generations, sampling every 100 generations, with a burn-in of 25%, and a stopval of 0.01 for the control of topological convergence. The TreeView program  was used to visualize and edit the results of all generated phylogenies. The best phylogenetic tree estimated for each gene was selected based on its statistical support and on the agreement with the expected evolution for these species, as indicated by the phylogenomic tree.
Positive selection analyses were performed using the codeml program in the PAML 4 software . The site-specific model analysis was implemented using the M0, M1a, M2a, M3, M7, and M8 nested models. For all models, a Bayes empirical Bayes (BEB) approach was employed to detect codons with a posterior probability of positive selection > 99% . Positive selection is detected when ω > 1, which means non-synonymous substitutions rates (dN) are higher than synonymous substitutions rates (dS).
Gene co-expression network analysis
Two RNA-seq datasets of the E. multilocularis pre-adult stage were used as inputs for the gene co-expression analyses. The first one was recovered from the ArrayExpress database https://www.ebi.ac.uk/arrayexpress/, accession number E-ERAD-50), and the second one was recovered from the Gene Expression Omnibus (GEO) database (https://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/geo/, accession number GSE59173). Low quality reads were filtered using Trimmomatic v0.36  and mapped on the E. multilocularis reference genome version 3 (ftp://ftp.sanger.ac.uk/pub/project/pathogens/Echinococcus/multilocularis/genome/), using the STAR 2.5.3a software . The HTSeq package Version 0.8.0 , with intersection-nonempty counting parameter, was used to generate the gene count estimations. Expression data were imported into the R environment and normalized by a variance-stabilizing transformation method, available in the DESeq2 v1.18.1 package . Finally, batch effect correction was performed utilizing the ComBat R package , implemented by the SampleNetwork R function created by Oldham et al. .
Gene co-expression calculation data was performed using the WGCNA R package, with the biweight midcorrelation method . To provide a co-expression network topology close to a scale-free network, typical of many biological networks , a soft thresholding measurement was performed by raising co-expression values to a β power. The β power that generated a graph with the closest free-scale topology was selected. Next, a hierarchical clustering method, followed by a dynamical branch cutting algorithm, was chosen to identify modules. On each cluster, gene expression values were summarized into module eigengenes (ME). Intramodular hub genes (named as kME) were calculated by correlating the gene expression and the corresponding ME. Genes with kME values higher than 0.85 were considered as hub genes and were used as input to generate PPI networks via STRING (http://string-db.org) searches using the medium confidence interaction score (0.40) and excluding text mining source. For each module of interest, the PPI network generated by STRING was joined to the WGCNA co-expression network and only connections with a value > 0.4 were selected. The resulting co-expression and PPI networks were visualized using the Cytoscape software (v3.6.0) . The sets of proteins in these networks were then used for the functional enrichment analyses using the Blast2GO software .
Availability of data and materials
All data and materials used in this research are publicly available. Genome web links are available in Supplementary Table S16. Transcriptomic data are available from https://www.ebi.ac.uk/arrayexpress/experiments/E-ERAD-50/ and https://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/geo/query/acc.cgi?acc=GSE59173.
Bayes empirical Bayes
Coding DNA sequence
Hidden Markov model
Littlewood DTJ. The evolution of parasitism in flatworms. In: Parasitic flatworms: molecular biology, biochemistry, immunology and physiology.Wallingford: CABI; 2006. p. 1–36. https://0-doi-org.brum.beds.ac.uk/10.1079/9780851990279.0001.
Kenny NJ, Noreña C, Damborenea C, Grande C. Probing recalcitrant problems in polyclad evolution and systematics with novel mitochondrial genome resources. Genomics. 2019;111:343–55. https://0-doi-org.brum.beds.ac.uk/10.1016/j.ygeno.2018.02.009.
Lockyer AE, Olson PD, Littlewood DTJ. Utility of complete large and small subunit rRNA genes in resolving the phylogeny of the Neodermata (Platyhelminthes): implications and a review of the cercomer theory. Biol J Linn Soc. 2003;78:155–71. https://0-doi-org.brum.beds.ac.uk/10.1046/j.1095-8312.2003.00141.x.
Olson PD, Timothy D, Littlewood J, Bray RA, Mariaux J. Interrelationships and evolution of the tapeworms (Platyhelminthes: Cestoda). Mol Phylogenet Evol. 2001;19:443–67.
Koziol U, Jarero F, Olson P, Brehm K. Comparative analysis of Wnt expression identifies a highly conserved developmental transition in flatworms. BMC bilogy. 2016;14:10. https://0-doi-org.brum.beds.ac.uk/10.1186/s12915-016-0233-x.
Couso JP. Segmentation, metamerism and the Cambrian explosion. Int J Dev Biol. 2009;53:8–10.
Koziol U. Evolutionary developmental biology (evo-devo) of cestodes. Exp Parasitol. 2017;180:84–100. https://0-doi-org.brum.beds.ac.uk/10.1016/j.exppara.2016.12.004.
Cheng Z, Liu F, Li X, Dai M, Wu J, Guo X, et al. EGF-mediated EGFR/ERK signaling pathway promotes germinative cell proliferation in Echinococcus multilocularis that contributes to larval growth and development. PLoS Negl Trop Dis. 2017;11:e0005418. https://0-doi-org.brum.beds.ac.uk/10.1371/journal.pntd.0005418.
Vaca HR, Celentano AM, Macchiaroli N, Kamenetzky L, Camicia F, Rosenzvit MC. Histone deacetylase enzymes as potential drug targets of neglected tropical diseases caused by cestodes. Int J Parasitol Drugs Drug Resist. 2019;9:120–32. https://0-doi-org.brum.beds.ac.uk/10.1016/j.ijpddr.2019.02.003.
Hemphill A, Stadelmann B, Rufener R, Spiliotis M, Boubaker G, Müller J, et al. Treatment of echinococcosis: albendazole and mebendazole – what else? Parasite. 2014;21:70. https://0-doi-org.brum.beds.ac.uk/10.1051/parasite/2014073.
Koziol U, Brehm K. Recent advances in Echinococcus genomics and stem cell research. Vet Parasitol. 2015;213:92–102. https://0-doi-org.brum.beds.ac.uk/10.1016/j.vetpar.2015.07.031.
Tsai IJ, Zarowiecki M, Holroyd N, Garciarrubio A, Sanchez-Flores A, Brooks KL, et al. The genomes of four tapeworm species reveal adaptations to parasitism. Nature. 2013;496:57–63. https://0-doi-org.brum.beds.ac.uk/10.1038/nature12031.
Howe KL, Bolt BJ, Cain S, Chan J, Chen WJ, Davis P, et al. WormBase 2016: expanding to enable helminth genomic research. Nucleic Acids Res. 2016;44:D774–80. https://0-doi-org.brum.beds.ac.uk/10.1093/nar/gkv1217.
Basika T, Macchiaroli N, Cucher M, Espínola S, Kamenetzky L, Zaha A, et al. Identification and profiling of microRNAs in two developmental stages of the model cestode parasite Mesocestoides corti. Mol Biochem Parasitol. 2016. https://0-doi-org.brum.beds.ac.uk/10.1016/j.molbiopara.2016.08.004.
Bizarro CV, Bengtson MH, Ricachenevsky FK, Zaha A, Sogayar MC, Ferreira HB. Differentially expressed sequences from a cestode parasite reveals conserved developmental genes in platyhelminthes. Mol Biochem Parasitol. 2005;144:114–8. https://0-doi-org.brum.beds.ac.uk/10.1016/j.molbiopara.2005.07.002.
Basika T, Paludo GP, Araujo FM, Salim AC, Pais F, Maldonado L, et al. Transcriptomic profile of two developmental stages of the cestode parasite Mesocestoides corti. Mol Biochem Parasitol. 2019;229:35–46. https://0-doi-org.brum.beds.ac.uk/10.1016/j.molbiopara.2019.02.006.
Camargo de Lima J, Monteiro KM, Basika Cabrera TN, Paludo GP, Moura H, Barr JR, et al. Comparative proteomics of the larval and adult stages of the model cestode parasite Mesocestoides corti. J Proteomics. 2018;175:127–35. https://0-doi-org.brum.beds.ac.uk/10.1016/j.jprot.2017.12.022.
Laschuk A, Monteiro KM, Vidal NM, Pinto PM, Duran R, Cerveñanski C, et al. Proteomic survey of the cestode Mesocestoides corti during the first 24 hours of strobilar development. Parasitol Res. 2011;108:645–56.
Debarba JA, Monteiro KM, Moura H, Barr JR, Ferreira HB, Zaha A. Identification of newly synthesized proteins by Echinococcus granulosus Protoscoleces upon induction of strobilation. PLoS Negl Trop Dis. 2015;9:e0004085. https://0-doi-org.brum.beds.ac.uk/10.1371/journal.pntd.0004085.
Young ND, Jex AR, Li B, Liu S, Yang L, Xiong Z, et al. Whole-genome sequence of Schistosoma haematobium. Nat Genet. 2012;44:221–5. https://0-doi-org.brum.beds.ac.uk/10.1038/ng.1065.
Protasio AV, Tsai IJ, Babbage A, Nichol S, Hunt M, Aslett MA, et al. A systematically improved high quality genome and Transcriptome of the human blood fluke Schistosoma mansoni. PLoS Negl Trop Dis. 2012;6:e1455. https://0-doi-org.brum.beds.ac.uk/10.1371/journal.pntd.0001455.
van Dam S, Võsa U, van der Graaf A, Franke L, de Magalhães JP. Gene co-expression analysis for functional classification and gene-disease predictions. Brief Bioinform. 2018;19:575–92. https://0-doi-org.brum.beds.ac.uk/10.1093/bib/bbw139.
Manzoni C, Kia DA, Vandrovcova J, Hardy J, Wood NW, Lewis PA, et al. Genome, transcriptome and proteome: the rise of omics data and their integration in biomedical sciences. Brief Bioinform. 2018;19:286–302. https://0-doi-org.brum.beds.ac.uk/10.1093/bib/bbw114.
Steinberg J, Ritchie GRS, Roumeliotis TI, Jayasuriya RL, Clark MJ, Brooks RA, et al. Integrative epigenomics, transcriptomics and proteomics of patient chondrocytes reveal genes and pathways involved in osteoarthritis. Sci Rep. 2017;7:8935. https://0-doi-org.brum.beds.ac.uk/10.1038/s41598-017-09335-6.
Du Y, Clair GC, Al Alam D, Danopoulos S, Schnell D, Kitzmiller JA, et al. Integration of transcriptomic and proteomic data identifies biological functions in cell populations from human infant lung. Am J Physiol Cell Mol Physiol. 2019;317:L347–60. https://0-doi-org.brum.beds.ac.uk/10.1152/ajplung.00475.2018.
International Helminth Genomes Consortium. Comparative genomics of the major parasitic worms. Nat Genet. 2019;51:163–74. https://0-doi-org.brum.beds.ac.uk/10.1038/s41588-018-0262-1.
Dall’Olio G, Laayouni H, Luisi P, Sikora M, Montanucci L, Bertranpetit J. Distribution of events of positive selection and population differentiation in a metabolic pathway: the case of asparagine N-glycosylation. BMC Evol Biol. 2012;12:98. https://0-doi-org.brum.beds.ac.uk/10.1186/1471-2148-12-98.
Georgi B, Voight BF, Bućan M. From mouse to human: evolutionary genomics analysis of human Orthologs of essential genes. PLoS Genet. 2013;9:e1003484. https://0-doi-org.brum.beds.ac.uk/10.1371/journal.pgen.1003484.
Jordan IK, Rogozin IB, Wolf YI, Koonin EV. Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 2002;12:962–8. https://0-doi-org.brum.beds.ac.uk/10.1101/gr.87702.
Vamathevan JJ, Hasan S, Emes RD, Amrine-Madsen H, Rajagopalan D, Topp SD, et al. The role of positive selection in determining the molecular cause of species differences in disease. BMC Evol Biol. 2008;8:273. https://0-doi-org.brum.beds.ac.uk/10.1186/1471-2148-8-273.
Croce JC, McClay DR. Evolution of the Wnt pathways. Methods Mol Biol. 2008;469:3–18. https://0-doi-org.brum.beds.ac.uk/10.1007/978-1-60327-469-2_1.
Lin AYT, Pearson BJ. Planarian yorkie/YAP functions to integrate adult stem cell proliferation, organ homeostasis and maintenance of axial patterning. Development. 2014;141:1197–208. https://0-doi-org.brum.beds.ac.uk/10.1242/dev.101915.
Montagne J, Preza M, Castillo E, Brehm K, Koziol U. Divergent Axin and GSK-3 paralogs in the beta-catenin destruction complexes of tapeworms. Dev Genes Evol. 2019;229:89–102. https://0-doi-org.brum.beds.ac.uk/10.1007/s00427-019-00632-w.
Zavala-Góngora R, Kroner A, Bernthaler P, Knaus P, Brehm K. A member of the transforming growth factor-beta receptor family from Echinococcus multilocularis is activated by human bone morphogenetic protein 2. Mol Biochem Parasitol. 2006;146:265–71. https://0-doi-org.brum.beds.ac.uk/10.1016/j.molbiopara.2005.12.011.
Epping K, Brehm K. Echinococcus multilocularis: molecular characterization of EmSmadE, a novel BR-Smad involved in TGF-β and BMP signaling. Exp Parasitol. 2011;129:85–94. https://0-doi-org.brum.beds.ac.uk/10.1016/j.exppara.2011.07.013.
Zhang C, Wang L, Wang H, Pu H, Yang L, Li J, et al. Identification and characterization of functional Smad8 and Smad4 homologues from Echinococcus granulosus. Parasitol Res. 2014;113:3745–57. https://0-doi-org.brum.beds.ac.uk/10.1007/s00436-014-4040-4.
Meeusen T, Mertens I, De Loof A, Schoofs L. G protein-coupled receptors in invertebrates: a state of the art. Int Rev Cytol. 2003;230:189–261 http://0-www.ncbi.nlm.nih.gov.brum.beds.ac.uk/pubmed/14692683.
Camicia F, Herz M, Prada LC, Kamenetzky L, Simonetta SH, Cucher MA, et al. The nervous and prenervous roles of serotonin in Echinococcus spp. Int J Parasitol. 2013;43:647–59. https://0-doi-org.brum.beds.ac.uk/10.1016/j.ijpara.2013.03.006.
Camicia F, Celentano AM, Johns ME, Chan JD, Maldonado L, Vaca H, et al. Unique pharmacological properties of serotoninergic G-protein coupled receptors from cestodes. PLoS Negl Trop Dis. 2018;12:e0006267. https://0-doi-org.brum.beds.ac.uk/10.1371/journal.pntd.0006267.
Zhang CX, Engqvist-Goldstein AEY, Carreno S, Owen DJ, Smythe E, Drubin DG. Multiple roles for cyclin G-associated kinase in clathrin-mediated sorting events. Traffic. 2005;6:1103–13. https://0-doi-org.brum.beds.ac.uk/10.1111/j.1600-0854.2005.00346.x.
Gradilla A-C, Simon E, Aguilar G, Guerrero I. From intra- to extracellular vesicles: extracellular vesicles in developmental signalling. Essays Biochem. 2018;62:215–23. https://0-doi-org.brum.beds.ac.uk/10.1042/EBC20180001.
Coghlan A, Tyagi R, Cotton JA, Holroyd N, Rosa BA, Tsai IJ, et al. Comparative genomics of the major parasitic worms. Nat Genet. 2019;51:163–74.
Sulima A, Savijoki K, Bień J, Näreaho A, Sałamatin R, Conn DB, et al. Comparative proteomic analysis of Hymenolepis diminuta cysticercoid and adult stages. Front Microbiol. 2017;8:2672. https://0-doi-org.brum.beds.ac.uk/10.3389/fmicb.2017.02672.
Govindaraj RG, Naderi M, Singha M, Lemoine J, Brylinski M. Large-scale computational drug repositioning to find treatments for rare diseases. NPJ Syst Biol Appl. 2018;4:13. https://0-doi-org.brum.beds.ac.uk/10.1038/s41540-018-0050-7.
Costa CB, Monteiro KM, Teichmann A, da Silva ED, Lorenzatto KR, Cancela M, et al. Expression of the histone chaperone SET/TAF-Iβ during the strobilation process of Mesocestoides corti (Platyhelminthes, Cestoda). Parasitology. 2015;142:1171–82. https://0-doi-org.brum.beds.ac.uk/10.1017/S003118201500030X.
Li L, Stoeckert CJ, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13:2178–89. https://0-doi-org.brum.beds.ac.uk/10.1101/gr.1224503.
Sievers F, Higgins DG. Clustal omega, accurate alignment of very large numbers of sequences. Methods Mol Biol. 2014;1079:105–16. https://0-doi-org.brum.beds.ac.uk/10.1007/978-1-62703-646-7_6.
Roure B, Rodriguez-Ezpeleta N, Philippe H. SCaFoS: a tool for Selection, Concatenation and Fusion of Sequences for phylogenomics. BMC Evol Biol. 2007;7 Suppl 1:S2. https://0-doi-org.brum.beds.ac.uk/10.1186/1471-2148-7-S1-S2.
Darriba D, Taboada GL, Doallo R, Posada D. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics. 2011;27:1164–5. https://0-doi-org.brum.beds.ac.uk/10.1093/bioinformatics/btr088.
Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 2012;61:539–42. https://0-doi-org.brum.beds.ac.uk/10.1093/sysbio/sys029.
Conesa A, Götz S. Blast2GO: a comprehensive suite for functional analysis in plant genomics. Int J Plant Genomics. 2008;2008:619832. https://0-doi-org.brum.beds.ac.uk/10.1155/2008/619832.
Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30:1236–40. https://0-doi-org.brum.beds.ac.uk/10.1093/bioinformatics/btu031.
Löytynoja A, Goldman N. webPRANK: a phylogeny-aware multiple sequence aligner with interactive alignment browser. BMC Bioinformatics. 2010;11:–579. https://0-doi-org.brum.beds.ac.uk/10.1186/1471-2105-11-579.
Suyama M, Torrents D, Bork P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006;34 Web Server:W609–12. https://0-doi-org.brum.beds.ac.uk/10.1093/nar/gkl315.
Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: Molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol. 2018;35:1547–9. https://0-doi-org.brum.beds.ac.uk/10.1093/molbev/msy096.
Page RDM. Visualizing phylogenetic trees using TreeView. In: current protocols in bioinformatics. Hoboken: John Wiley & Sons, Inc.; 2002. https://0-doi-org.brum.beds.ac.uk/10.1002/0471250953.bi0602s01.
Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24:1586–91. https://0-doi-org.brum.beds.ac.uk/10.1093/molbev/msm088.
Murrell B, Wertheim JO, Moola S, Weighill T, Scheffler K, Kosakovsky Pond SL. Detecting individual sites subject to episodic diversifying selection. PLoS Genet. 2012;8:e1002764. https://0-doi-org.brum.beds.ac.uk/10.1371/journal.pgen.1002764.
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–20. https://0-doi-org.brum.beds.ac.uk/10.1093/bioinformatics/btu170.
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. https://0-doi-org.brum.beds.ac.uk/10.1093/bioinformatics/bts635.
Anders S, Pyl PT, Huber W. HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–9. https://0-doi-org.brum.beds.ac.uk/10.1093/bioinformatics/btu638.
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. https://0-doi-org.brum.beds.ac.uk/10.1186/s13059-014-0550-8.
Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8:118–27. https://0-doi-org.brum.beds.ac.uk/10.1093/biostatistics/kxj037.
Oldham MC, Langfelder P, Horvath S. Network methods for describing sample relationships in genomic datasets: application to Huntington’s disease. BMC Syst Biol. 2012;6:63. https://0-doi-org.brum.beds.ac.uk/10.1186/1752-0509-6-63.
Song L, Langfelder P, Horvath S. Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics. 2012;13:328. https://0-doi-org.brum.beds.ac.uk/10.1186/1471-2105-13-328.
Barabási A-L, Oltvai ZN. Network biology: understanding the cell’s functional organization. Nat Rev Genet. 2004;5:101–13. https://0-doi-org.brum.beds.ac.uk/10.1038/nrg1272.
Demchak B, Hull T, Reich M, Liefeld T, Smoot M, Ideker T, et al. Cytoscape: the network visualization tool for GenomeSpace workflows. F1000Res. 2014;3:151. https://0-doi-org.brum.beds.ac.uk/10.12688/f1000research.4492.2.
The authors are thankful to Dr. Magdalena Zarowiecki and the Wellcome Trust Sanger Institute (UK) for providing the access to M. corti genome data. The access to high-performance computing facilities granted by the Laboratório Nacional de Computação Científica (LNCC, Brazil) is also gratefully acknowledged.
This work was supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico – CNPq (grant number 428305/2018–0) and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – CAPES (grant number 611 23038.010043/2013–02). GPP is a recipient of a CAPES Ph.D. fellowship. The funders had no role in the design of the study, in the collection, analysis, and interpretation of data, in the decision to publish and in writing the manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no conflicts of interests (financial/non-financial) concerning this article.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Paludo, G.P., Thompson, C.E., Miyamoto, K.N. et al. Cestode strobilation: prediction of developmental genes and pathways. BMC Genomics 21, 487 (2020). https://0-doi-org.brum.beds.ac.uk/10.1186/s12864-020-06878-3
- Comparative genomics
- Co-expression network