Identification of new members of the MAPK gene family in plants shows diverse conserved domains and novel activation loop variants

Mohanta, Tapan Kumar; Arora, Pankaj Kumar; Mohanta, Nibedita; Parida, Pratap; Bae, Hanhong

doi:10.1186/s12864-015-1244-7

Research article
Open access
Published: 06 February 2015

Tapan Kumar Mohanta¹,
Pankaj Kumar Arora¹,
Nibedita Mohanta²,
Pratap Parida³ &
…
Hanhong Bae¹

BMC Genomics volume 16, Article number: 58 (2015)

Abstract

Background

Mitogen Activated Protein Kinase (MAPK) signaling is of critical importance in plants and other eukaryotic organisms. The MAPK cascade plays an indispensable role in the growth and development of plants, as well as in biotic and abiotic stress responses. The MAPKs constitute the most downstream module of the three-tier MAPK cascade and are phosphorylated by upstream MAP kinase kinases (MAPKKs), which are in turn phosphorylated by MAP kinase kinase kinases (MAPKKKs). The MAPKs play pivotal roles in the regulation of many cytoplasmic and nuclear substrates, thus regulating several biological processes.

Results

A total of 589 MAPK genes were identified from the genome-wide analysis of 40 species. Sequence analysis revealed the presence of several N- and C-terminal conserved domains. The MAPKs were previously believed to be characterized by the presence of TEY/TDY activation loop motifs. The present study showed that, in addition to the TEY/TDY activation loop motifs, MAPKs also contain MEY, TEM, TQM, TRM, TVY, TSY, TEC and TQY activation loop motifs. Phylogenetic analysis clustered all predicted MAPKs into six different groups (group A, B, C, D, E and F), and all predicted MAPKs were assigned specific names based on their orthology-based evolutionary relationships with Arabidopsis or Oryza MAPKs.

Conclusion

We conducted a global analysis of the MAPK gene family of plants from lower eukaryotes to higher eukaryotes and analyzed their genomic and evolutionary aspects. Our study showed the presence of several new activation loop motifs and diverse conserved domains in MAPKs. Further study of the newly identified activation loop motifs can provide more information regarding the downstream signaling cascades activated in response to a wide array of stress conditions, as well as during plant growth and development.

Background

During evolution, plants have developed complex arrays of defense mechanisms to cope with diverse, often adverse and ever-changing environmental conditions. Perception of variations in environmental as well as internal developmental cues, transduction and amplification of signals, and activation of the response to stimuli are crucial for survival, optimal growth and development. Protein kinases are important signaling molecules that perceive various signals and transduce them for active responses. These enzymes carry out diverse phosphorylation processes at the transcriptional, translational and post-translational levels by catalyzing the addition of phosphate groups to serine and threonine/tyrosine residues in their target proteins in both prokaryotic and eukaryotic cells [1,2]. These modifications lead to changes in the catalytic activity, affinity and interaction activity of the target protein. However, the phosphorylation events in proteins are reversible due to protein phosphatases, enabling maintenance of the balance between kinase-driven phosphorylation and phosphatase-driven dephosphorylation events [3].

Plant genomes are rich in genes that encode protein kinases, which together constitute the kinase super-family [4]. This super-family is divided into different classes based on amino acid sequence similarity and functional characteristics. The mitogen activated protein kinase gene family, which is one such family, is known for its evolutionary conservation across eukaryotic taxonomic groups and for functioning within hierarchical cascades [1]. Phosphorylated proteins carry out a wide array of cellular responses, including changes in gene expression, innate immunity, developmental programmes and stress and hormonal responses [5-8].

The mitogen activated protein kinase cascade consists of three kinase modules composed of mitogen activated protein kinase kinase kinases (MAPKKKs), mitogen activated protein kinase kinases (MAPKKs) and mitogen activated protein kinases (MAPKs). In the general model, extracellular signals activate MAPKKKs, which phosphorylate downstream MAPKKs. The phosphorylated MAPKKs in turn phosphorylate MAPKs [9,10]. Protein phosphorylation events may occur throughout the protein kinase sequences, but usually occur on the activation loop [11]. The activation loop, which is present toward the C-terminal end, resides within sub-domains VII and VIII of the eleven kinase sub-domains [4]. The activation loop contains conserved serine, threonine and/or tyrosine amino acid residues that may be reversibly phosphorylated [6] via cis auto-phosphorylation or trans phosphorylation by upstream kinases [12].

The initial descriptions of components of the MAPK cascade were provided for the popular model plant, Arabidopsis. Advancements in sequencing technologies and bioinformatics tools have greatly increased the pace of genome sequencing projects, resulting in successful sequencing of several plant genomes. Post-genome sequencing projects have enabled relatively easy identification of particular gene families based on conserved signature motifs and sequence similarity. Available genome sequences from several plant genomes have provided us with an opportunity to identify MAPK family members across photosynthetic eukaryotes (plants and algae), which will shed more light on MAPK evolution and signaling in plants and lower photosynthetic eukaryotes. In recent years, identification of MAPK gene family members in plants has been limited to a few species including Arabidopsis thaliana [13], Oryza sativa [14], maize [15], Brassica napus [16], apple [17], and Brachypodium [18]. Further, a study by Janitza et al. [19] and a review article by Doczi et al. [20] have provided a comprehensive overview of the evolutionary history of MAPKs in green plants by using a limited number of plant species.

However, there is currently limited information regarding the nomenclature, conserved structures, genomics and biochemistry of MAPKs in plants. In this communication, we identify the MAPK gene families of 40 different plant species and assign a unique name to each MAPK. This nomenclature system can be further applied to newly identified MAPKs of other species. Furthermore, analysis of the genomics, biochemistry and conserved consensus sequences of plant MAPKs reveals several novel aspects of these proteins.

Results

Identification and nomenclature of MAPKs

We identified the MAPK gene family from 40 different plant species, from the unicellular lower eukaryote Chlamydomonas reinhardtii to the multi-cellular angiosperm Arabidopsis thaliana, and attempted to cover the maximum number of species across the plant lineage. We found that the number of MAPK members per genome varied from species to species across the whole plant lineage. The 40 species collectively gave rise to 589 MAPK sequences. The tetraploid Glycine max contained the most MAPK genes in its genome (31), whereas the lower eukaryotic plant Ostreococcus lucimarinus contained three (Table 1). In addition, Brassica campestris (30), Gossypium raimondii (28), Malus domestica (28), Panicum virgatum (27), Linum usitatissimum (24) and Populus trichocarpa (21) contained higher numbers of MAPKs (Table 1, Additional file 1). All 589 identified MAPKs were given specific names according to their orthologous sequence similarity with Arabidopsis thaliana or Oryza sativa MAPKs.

Table 1 Table representing genome size of different plant species and number of MAPK genes present per genome (species)

Genomics of MAPKs

Among the 589 MAPKs identified from 40 different plant species, Fragaria vesca FvMPK20 is the largest MAPK gene, with a 2574-nucleotide open reading frame (ORF), while Panicum virgatum PvMPK1-4 is the smallest, with a 544-nucleotide ORF (Additional file 1). Transcript organization showed that MAPK genes differ in their intron organization. The numbers of MAPKs containing different numbers of introns were as follows: intronless (7), single intron (41), two introns (39), three introns (18), four introns (15), five introns (161), six introns (32), seven introns (20), eight introns (39), nine introns (126), ten introns (72) and eleven introns (18) (Additional file 1). The terrestrial plant Selaginella moellendorffii SmMPK10 contained the maximum of 14 introns in its gene. The intronless MAPKs present in higher eukaryotic plants include PvMPK7-2, PaMPK2, PaMPK3, PaMPK7-1, and PaMPK20, while the lower eukaryotic algae contain the intronless OlMPK7 and OlMPK9 (Additional file 1).
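
The intron distribution above can be cross-checked against the total of 589 genes with a few lines of Python. This is only a sketch of the arithmetic: the counts are copied from the text, and treating SmMPK10 as the single gene with 14 introns is an assumption drawn from the wording rather than a figure stated in the text.

```python
# Intron-count distribution as reported in the text:
# number of introns -> number of MAPK genes.
intron_tally = {
    0: 7, 1: 41, 2: 39, 3: 18, 4: 15, 5: 161,
    6: 32, 7: 20, 8: 39, 9: 126, 10: 72, 11: 18,
    14: 1,  # assumption: SmMPK10 is the only gene with 14 introns
}

# Total number of genes covered by the tally.
total_genes = sum(intron_tally.values())

# Most frequent intron count (the mode of the distribution).
modal_introns = max(intron_tally, key=intron_tally.get)

# Gene-weighted mean number of introns.
mean_introns = sum(k * v for k, v in intron_tally.items()) / total_genes

print(total_genes, modal_introns, round(mean_introns, 2))
```

Under that assumption the tally accounts for all 589 genes, with the five-intron and nine-intron structures being the two most common.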

The molecular weights of MAPK proteins vary from 22.381 (VvMPK1) to 98.915 (MdMPK20-2) kDa and the isoelectric points vary from 5.00 (MdMPK20-1) to 9.52 (CsubMPK15) (Additional file 2). The isoelectric points (pI) of group A and group B MAPKs range from acidic to slightly acidic, while those of group C and group D reside within the basic pI range. The average amino acid composition of MAPK proteins showed that leucine (9.63) was the most abundant amino acid and tryptophan (0.70) the least abundant (Additional file 3). The average abundances of the most important amino acids threonine, glutamic acid, and tyrosine (T-E-Y) were 4.65, 6.73 and 3.91, respectively, whereas the average abundance of aspartic acid was 6.01. The abundance of the hydrophobic amino acids alanine (6.91), isoleucine (6.24), leucine (9.63), phenylalanine (4.45), proline (6.25) and valine (5.70) in MAPKs was relatively higher than that of other amino acids (Additional file 3).
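
As an illustration of how per-residue abundances of this kind can be computed, the following sketch calculates the composition of a single protein sequence. It assumes the reported values are percentages of total residues, which the text does not state explicitly; the function name `aa_composition` and the toy peptide are hypothetical and are not taken from the study.

```python
from collections import Counter


def aa_composition(seq: str) -> dict[str, float]:
    """Return the percentage of each residue in a protein sequence."""
    seq = seq.upper()
    if not seq:
        return {}
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


# Hypothetical toy peptide, not a real MAPK sequence.
toy = "MLLTEYVVTRWYRAPELLLN"
comp = aa_composition(toy)

# Leucine occurs 5 times in 20 residues.
print(comp["L"])
```

Averaging such per-protein dictionaries over all sequences would give family-wide values comparable in form to those in Additional file 3.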

Conserved motifs and domains

N-terminal conserved sequences

The MAPKs are characterized by the presence of a conserved T-E-Y/T-D-Y motif in the activation loop region. In addition to the activation loop T-E-Y/T-D-Y motif, we found in this study that several MAPKs share conserved N-terminal T-E-Y, T-D-Y, S-D-Y and S-E-Y motifs (Figure 1A, 1B, Table 2, Additional file 4). These N-terminal conserved motifs are only shared by group D MAPKs. In total, 182 genes shared the N-terminal conserved motifs. Among them, 11 genes shared the S-D-Y motif, 27 shared the S-E-Y motif, six shared the T-D-Y motif and the remaining 138 genes shared the T-E-Y motif (Additional file 4). Chlamydomonas and Volvox share a common A-V-H motif instead of the S-E-Y/S-D-Y/T-E-Y and T-D-Y motifs (Additional file 4). Several other group-specific conserved motifs are also present in the N-terminal region of MAPKs. They include A-K-Y, N-K-Y (group A), S-K-Y, R-K-Y (group B), T-K-Y (group C) and S-Q-Y, N-R-Y, S-R-Y (group D) (Figure 2, Table 2). These motifs are present immediately after the N-terminal T-E-Y, T-D-Y, S-D-Y and S-E-Y motifs. The numbers of MAPK sequences sharing each motif are A-K-Y (70), N-K-Y (13), S-K-Y (74), R-K-Y (42), T-K-Y (81), S-Q-Y (36), N-R-Y (98), and S-R-Y (91) (Additional file 4). In addition to the conserved motifs, the N-terminal region of MAPKs also contains conserved amino acid consensus sequences including I-G-x-G-x-Y-G-x-V, I-K-K-I-x₃-F, D-A-x-R-x-L-R-E, F-x-D-I-Y-x₃-E-L-M, D-L-x₂-V-I, D-x-L-x₂-E-H, Q-x-L-R-x-L-K-Y-x-H, H-R-D-L-K-P-x-N, and L-x-L-x-N-C-x-L-K-I-x-D-F-G-L-A-R (Figure 1A, Table 3).
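
Consensus sequences written in this hyphenated notation can be turned into regular expressions for scanning protein sequences. The sketch below is one possible way to do so. It assumes that `x` stands for any single residue and that a subscript gives a repeat count; the helper name `consensus_to_regex` and the test peptide are hypothetical.

```python
import re

# Map Unicode subscript digits (as in "x₃") to ordinary digits.
SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Convert e.g. 'I-K-K-I-x₃-F' into a compiled pattern 'IKKI[A-Z]{3}F'."""
    parts = []
    for token in consensus.translate(SUBSCRIPTS).split("-"):
        wildcard = re.fullmatch(r"x(\d*)", token)
        if wildcard:
            n = wildcard.group(1)
            # 'x' is any one residue; 'x3' is any three residues.
            parts.append("[A-Z]" + (f"{{{n}}}" if n else ""))
        else:
            parts.append(re.escape(token))
    return re.compile("".join(parts))


pattern = consensus_to_regex("I-G-x-G-x-Y-G-x-V")
print(pattern.pattern)

# Hypothetical peptide used only to exercise the pattern.
print(bool(pattern.search("AAIGRGAYGIVCS")))
```

Restricting the wildcard to `[A-Z]` rather than `.` keeps gap characters and whitespace from matching, which is a design choice of this sketch rather than part of the notation.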

Table 2 Different conserved motifs present in N-terminal, activation loop region and C-terminal end of plant MAPKs

Table 3 Table presenting different conserved consensus sequences present in plant MAPKs

Conserved sequences in activation loop region

As discussed earlier, MAPKs contain the classic T-E-Y or T-D-Y motif in the activation loop region, and we found that the majority of MAPK members contain this classic T-E-Y/T-D-Y motif. Sequence alignment revealed that M. domestica MdMPK9 and P. trichocarpa PtMPK17-1 contain the additional sequences TAYKQYFLWTKLLTFMKDY and TVCVFLKPGFTFQCLIDY, respectively, within the conserved T-D-Y motif region (Additional file 4). In this study, we found eight novel activation loop motifs of MAPKs, which are reported here for the first time. The newly identified activation loop motifs are T-Q-Y (group A), M-E-Y, T-E-C, T-V-Y (group B), T-E-M (group D), T-S-Y, T-Q-M, and T-R-M (group E) (Figures 3 and 4, Table 2). In total, eight MAPK genes share the M-E-Y motif in the classic T-E-Y and T-D-Y region (Additional file 4). The MAPK genes sharing the M-E-Y motif are S. lycopersicum SlMPK4-1, S. tuberosum StMPK4-1, B. distachyon BdMPK4-2, P. vulgaris FvMPK4-2, S. italica SiMPK4-2, Z. mays ZmMPK4-2, S. bicolor SbMPK4-2 and O. sativa OsMPK16-2. These M-E-Y motifs fall under group B MAPKs. The new motif T-Q-Y (group A) is shared by BrMPK10-2, T-E-C (group B) by GrMPK4-6, T-V-Y (group B) by GmMPK4-1, T-E-M (group E) by PaMPK5, PaMPK14 and PaMPK7-2, while T-S-Y is shared by OlMPK7 (group E), T-Q-M (group E) by PaMPK10 and T-R-M (group E) by CsubMPK3 (Figure 3, Table 2).
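
The gene-to-motif assignments listed above can be collected in a small lookup table. The sketch below simply transcribes the gene names and motifs as they appear in the text; the names `NOVEL_MOTIF_GENES`, `motif_of` and `classify_motif` are hypothetical helpers introduced for illustration.

```python
# Classic activation loop motifs.
CLASSIC_MOTIFS = {"TEY", "TDY"}

# Novel activation loop motifs and the genes reported to carry them,
# transcribed from the text above.
NOVEL_MOTIF_GENES = {
    "MEY": ["SlMPK4-1", "StMPK4-1", "BdMPK4-2", "FvMPK4-2",
            "SiMPK4-2", "ZmMPK4-2", "SbMPK4-2", "OsMPK16-2"],
    "TQY": ["BrMPK10-2"],
    "TEC": ["GrMPK4-6"],
    "TVY": ["GmMPK4-1"],
    "TEM": ["PaMPK5", "PaMPK14", "PaMPK7-2"],
    "TSY": ["OlMPK7"],
    "TQM": ["PaMPK10"],
    "TRM": ["CsubMPK3"],
}


def classify_motif(motif: str) -> str:
    """Label a tripeptide such as 'T-E-Y' as classic, novel or unrecognized."""
    key = motif.replace("-", "").upper()
    if key in CLASSIC_MOTIFS:
        return "classic"
    if key in NOVEL_MOTIF_GENES:
        return "novel"
    return "unrecognized"


def motif_of(gene: str):
    """Return the novel motif listed for a gene, or None if it is not listed."""
    for motif, genes in NOVEL_MOTIF_GENES.items():
        if gene in genes:
            return motif
    return None


n_genes = sum(len(genes) for genes in NOVEL_MOTIF_GENES.values())
print(len(NOVEL_MOTIF_GENES), n_genes, motif_of("PaMPK10"))
```

By this count, the eight novel motifs are carried by 17 of the named genes.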

C-terminal conserved sequences

The classical T-E-Y/T-D-Y motif in the activation loop region is followed closely by different C-terminal S-D-Y/S-E-Y/T-D-Y/D-N-Y/S-Q-Y/S-R-Y/S-K-Y/S-N-Y motifs (Figure 3, Table 2). The S-D-Y/S-E-Y and T-D-Y motifs are shared by group A and group B MAPKs. Additionally, the D-N-Y, S-Q-Y, and T-K-Y motifs are shared by group C MAPKs and the S-K-Y, T-K-Y, S-R-Y, and S-N-Y motifs are shared by group D MAPKs (Table 2). The numbers of MAPK sequences sharing each C-terminal motif are S-D-Y (57), S-E-Y (123), T-D-Y (8), D-N-Y (78), S-Q-Y (7), S-K-Y (217), S-R-Y (7), and S-N-Y (11). There are also conserved T-R-W-Y-R-A-P-E-L, I-D-x-W-S-V-G-C and Q-x-L-L-x-F-D-P consensus sequences present in the region immediately following the activation loop of MAPKs (Figure 1A, Table 3).

Common docking domains

During mitogen activated protein kinase (MAPK) signaling, the ability of MAP2Ks to recognize their cognate MAPKs is facilitated by the presence of a short docking motif (D-site) that binds to its complementary target region on the MAPK. Similarly, MAPKs also contain a short docking site that recognizes many downstream target proteins by utilizing the same strategy. Among the studied MAPKs, we did not find any single conserved docking domain common to all groups of MAPKs. Instead, conservation of the docking domain consensus is somewhat group specific (Table 4). The conserved docking domains of different MAPKs are K-M-L-T-F-D-P-K/R-Q/K-R-I-T-V-E-D/E-A-L (group A), K-M-L-V/I-F-D-P-x-K-R-I-I-V-D-E-A-L (group B), K-M-L-I-F-D-P-S/T-K-R-I-S-V-T-E-A-L (group C) and L-L-E-R/K-L-L-A-F-D-P-K-D-R-P-T-A-E-E-A-L (group D) (Table 4).
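
Because the docking consensus differs between groups, it can serve as a rough group indicator for a candidate sequence. The following sketch assumes that a slash separates alternative residues at one position and that `x` is any residue; the function names and the two test peptides are hypothetical, and the approach is an illustration rather than the procedure used in the study.

```python
import re

# Group-specific docking consensus sequences as given in the text.
DOCKING_CONSENSUS = {
    "A": "K-M-L-T-F-D-P-K/R-Q/K-R-I-T-V-E-D/E-A-L",
    "B": "K-M-L-V/I-F-D-P-x-K-R-I-I-V-D-E-A-L",
    "C": "K-M-L-I-F-D-P-S/T-K-R-I-S-V-T-E-A-L",
    "D": "L-L-E-R/K-L-L-A-F-D-P-K-D-R-P-T-A-E-E-A-L",
}


def token_to_regex(token: str) -> str:
    """Translate one consensus position into a regex fragment."""
    if token == "x":
        return "[A-Z]"                           # any residue
    if "/" in token:
        return "[" + token.replace("/", "") + "]"  # alternatives, e.g. K/R
    return token                                 # fixed residue


PATTERNS = {
    group: re.compile("".join(token_to_regex(t) for t in consensus.split("-")))
    for group, consensus in DOCKING_CONSENSUS.items()
}


def matching_groups(sequence: str) -> list[str]:
    """Return the groups whose docking consensus occurs in the sequence."""
    return [g for g, p in PATTERNS.items() if p.search(sequence.upper())]


# Hypothetical peptides built from the consensus strings themselves.
print(matching_groups("KMLTFDPKQRITVEEAL"))
print(matching_groups("KMLVFDPAKRIIVDEAL"))
```

An exact-match search of this kind is strict; real sequences may deviate from the consensus at one or two positions, so a scoring approach would be more tolerant.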

Table 4 Table showing predicted group specific common docking (CD) sites of plant MAPKs

Phylogeny

An unrooted phylogenetic tree was constructed to infer group-specific relationships of MAPKs. Upon phylogenetic analysis, all studied MAPK genes fell into six different clusters, which were named according to the MAPK grouping of A. thaliana. In A. thaliana, MAPK genes are classified into four different groups (A, B, C, and D) based on their evolutionary relationship and the presence of the T-D-Y and T-E-Y phosphorylation motifs. In this study, MAPKs are categorized into six different groups, namely group A (red), B (blue), C (pink), D (purple), E (teal) and F (green) (Figure 5, Additional file 5). Two new groups of MAPKs (group E and F) emerged from this analysis. The new groups E and F are mainly shared by MAPKs of lower eukaryotic and gymnosperm plants, such as CsubMPK7, MpMPK13, SmMPK10, CreinMPK7, VcMPK5, CsubMPK3, PaMPK10, PaMPK7-2, PaMPK5, PaMPK14, CreinMPK4-1, VcMPK4-1, OlMPK6 and MpMPK4. The phylogenetic analysis revealed that 89, 128, 100, 258, 10 and 4 MAPKs fall into group A, B, C, D, E, and F, respectively (Additional file 5, Table 5). The average overall phylogenetic mean distance of plant MAPKs is 0.54 (standard error 0.029). During phylogenetic distance estimation, all positions with less than 95% site coverage were eliminated; that is, fewer than 5% alignment gaps, missing data and ambiguous bases were allowed at any position.
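
The 95% site coverage criterion can be illustrated with a small column filter over an alignment. This is a minimal sketch of the idea, not the implementation used by the phylogenetic software; the function name `filter_columns`, the set of characters treated as missing, and the toy alignment are all assumptions made for illustration.

```python
def filter_columns(alignment, min_coverage=0.95, missing="-?X"):
    """Keep only alignment columns whose fraction of informative
    characters is at least min_coverage."""
    if not alignment:
        return []
    n_seqs = len(alignment)
    keep = [
        j for j in range(len(alignment[0]))
        if sum(seq[j] not in missing for seq in alignment) / n_seqs >= min_coverage
    ]
    return ["".join(seq[j] for j in keep) for seq in alignment]


# Hypothetical toy alignment of four equally long sequences.
toy_alignment = ["MKT-Y", "MKTEY", "M-TEY", "MKTEY"]
filtered = filter_columns(toy_alignment)
print(filtered)
```

With only four sequences, a single gap lowers a column's coverage to 75%, so every gapped column is removed. With several hundred sequences, a column survives as long as fewer than 5% of them are gapped or ambiguous at that site.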

Table 5 Table showing group specific distribution of different MAPKs in plants

Statistical analysis

Several statistical analyses were carried out to assess the statistical significance of the study. In Tajima’s relative rate test, MAPK sequences were taken randomly from the active data as different groups. The analysis was repeated twice by assigning random MAPK sequences to the taxonomic groups A, B and C (these are statistical groups and should not be confused with the MAPK groups), with group C used as the outgroup. When we took MgMPK4-1 (group A), GmMPK16-3 (group B) and AtPIN1 (group C), the resulting p-value was 0.05935 and the χ² test statistic was 3.56 (Table 6). When MgMPK4-1, GmMPK16-3 and AtCBL1 were taken as group A, B and C, respectively, the p-value was 0.00468 and the χ² test statistic was 8.00. The second comparison was significant at the 0.05 level, whereas the first fell just above it (Table 6). A p-value less than 0.05 is often used to reject the null hypothesis of equal rates between lineages (p ≤ 0.01: very strong presumption against the null hypothesis; 0.01 < p ≤ 0.05: strong presumption against the null hypothesis; 0.05 < p ≤ 0.1: low presumption against the null hypothesis; p > 0.1: no presumption against the null hypothesis). Each analysis involved 3 amino acid sequences. All positions containing gaps and missing data were eliminated. In Tajima’s test for neutrality, the test statistic D was 4.904140 (Table 7). All positions with less than 95% site coverage were eliminated during Tajima’s test for neutrality. There were a total of 322 positions in the final dataset.
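
The relationship between the χ² statistic and the p-value in the relative rate test can be reproduced with the standard library alone. The sketch below assumes the usual form of Tajima's relative rate statistic, (m1 − m2)² / (m1 + m2) with one degree of freedom, where m1 and m2 are the numbers of sites unique to each of the two ingroup lineages. The counts in the example are hypothetical and are not taken from Table 6.

```python
import math


def relative_rate_chi2(m1: int, m2: int) -> float:
    """Chi-square statistic for Tajima's relative rate test, where m1 and m2
    are the counts of sites unique to lineage 1 and lineage 2."""
    return (m1 - m2) ** 2 / (m1 + m2)


def chi2_p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical unique-site counts, chosen only for illustration.
chi2 = relative_rate_chi2(13, 5)
print(round(chi2, 2), round(chi2_p_value_1df(chi2), 5))

# Applying the conversion to a statistic reported in the text.
print(round(chi2_p_value_1df(8.00), 5))
```

For the reported statistic of 8.00 this conversion gives a p-value of about 0.00468, in line with the value quoted above.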

Table 6 Tajima’s relative rate test

Table 7 Tajima’s test for neutrality

Gene duplication

Chromosomes evolve via fusion, fission, insertion, and duplication events, allowing evolution of chromosome size and number, and hence of the genes. Gene duplication is a major force acting on the evolution of different species, and gene families are groups of genes generated by duplication. The sizes of gene families reflect the number of duplicated genes, which are known as paralogs. Several plant MAPKs analyzed during this study were found to be duplicated genes, resulting in several paralogous genes (Additional file 1). Plants with duplicated genomes give rise to more duplicated MAPKs than species with non-duplicated genomes. Accordingly, species such as G. max, G. raimondii, and M. domestica contain more duplicated genes. Nevertheless, almost all species possess duplicated MAPKs in their genome. The gene duplication results for plant MAPKs that contain a novel activation loop motif are reported in Table 8. All of these MAPKs gave a z-score above four with a 100% level of confidence.

Table 8 Gene duplication analysis of some selective plant MAPKs that contain novel activation loop motif

MAPK groups

MAPKs in monocotyledonous plants

Among the 40 different plant species analyzed during this study, six were monocotyledonous plants (B. distachyon, O. sativa, P. virgatum, S. italica, S. bicolor and Z. mays) (Table 1). Our study revealed that O. sativa contains 17 MAPKs, not 15 as reported earlier. Among monocot species, P. virgatum contains the highest number of MAPKs in its genome (27).

MAPKs in dicotyledonous plants

The MAPK gene family of dicotyledonous plants shows large variation in the number of family members (Table 1). Among the 40 different species investigated herein, 26 were dicot plants. This group contained as few as 6 MAPK genes in M. guttatus to as many as 31 in soybean (G. max). Investigation of MAPKs in G. max in another study using the HMM (hidden Markov model) approach showed 35 MAPKs; however, four of them are in fact ‘MAPK-like’ genes, making the actual MAPK number 31.

MAPKs in lower photosynthetic eukaryotes (Algae, Moss and Pteridophyte)

The lower photosynthetic groups include four algae, one bryophyte, one pteridophyte and one gymnosperm species (Table 1). Multiple MAPKs were seen in several species of algae (Table 1). Our study also showed multiple MAPKs (TEY, 2/3/4 and TDY, 1/2) in unicellular and multi-cellular algae (Table 5). A genome survey of S. moellendorffii, a model lycophyte (non-seed vascular plant) and a primitive species, revealed the presence of six MAPKs, with a single MAPK in each of groups A and B and two MAPKs in each of groups C and D. As a result, there are four MAPKs with the TEY motif and only two with the TDY motif. Among the mosses and algae, P. patens and O. lucimarinus lack ‘group A’ type MAPKs, while V. carteri lacks ‘group B’ and ‘group C’ type MAPKs and C. reinhardtii and C. subellipsoidea do not possess ‘group C’ MAPKs. Interestingly, none of the studied lower photosynthetic eukaryotes or land plants (both mono and dicot species) lacked ‘group D’ type MAPKs.

Discussion

Nomenclature and identification of MAPKs

It is very important to assign an appropriate and specific name to each member of the family to enable a thorough understanding of it. Therefore, we provided unique names to all 589 identified MAPKs across the plant lineage using the orthology-based nomenclature system proposed by Hamel et al. [21]. In the traditional naming system, names are assigned to genes in the order in which they are identified and cloned, regardless of their similarities to other genes. For example, if someone cloned a ThMPK gene from Thellungiella halophila first, it would be named ThMPK1, regardless of its orthologous similarity with other MPKs. If this ThMPK1 has orthologous similarity with AtMPK6, it should be named ThMPK6, but under the traditional system this does not happen. However, orthology lends legitimacy to the transfer of functional information between related genes [22-24]. As a result, orthology-based nomenclature can provide succinct information regarding the orthologous counterpart of a gene. Practically, it is difficult to study every individual MAPK gene in all plant species to understand their specific roles in different aspects of plant biology. As orthology implies common ancestry and evolutionary function, orthology-based nomenclature will provide ideas regarding the possible roles of specific genes in the plant species being investigated. This system of nomenclature can be further extended to newly identified gene families of other plant species.
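
The structure of names such as ThMPK6 or MdMPK20-2 can be made explicit with a small parser. The sketch below assumes a convention inferred from the gene names used in this article: a species prefix, the literal `MPK`, the number of the Arabidopsis or Oryza ortholog, and an optional numeric suffix, which is interpreted here as distinguishing paralogs. The names `MapkName` and `parse_mapk_name` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class MapkName(NamedTuple):
    species: str             # species prefix, e.g. "Md" or "Csub"
    ortholog: int            # number of the orthologous reference MPK
    paralog: Optional[int]   # optional suffix, assumed to index paralogs


# Prefix (capital letter plus lowercase letters), "MPK", number, optional "-n".
NAME_RE = re.compile(r"([A-Z][a-z]*)MPK(\d+)(?:-(\d+))?")


def parse_mapk_name(name: str) -> MapkName:
    """Split an orthology-based MAPK name into its components."""
    match = NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognized MAPK name: {name!r}")
    species, ortholog, paralog = match.groups()
    return MapkName(species, int(ortholog), int(paralog) if paralog else None)


print(parse_mapk_name("MdMPK20-2"))
print(parse_mapk_name("CsubMPK15"))
```

Parsed this way, genes from different species that share an ortholog number can be grouped directly, which is the practical benefit of the orthology-based scheme described above.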

To date, MAPK gene family members of only a few plant species have been reported, including Oryza sativa [14], Arabidopsis thaliana [13], Zea mays [15], Brachypodium distachyon [18], Canola (Brassica napus) [16] and Malus domestica [17]. Although MAPKs from these plant species were previously identified, we have included them here to broaden the study. Inclusion of these species in our study led to the identification of some new members of the MAPK family. An earlier study by Hamel et al. revealed that the O. sativa genome contains 16 members of the MAPK gene family [21]. However, we identified 17 MAPK gene family members from O. sativa. Additionally, Zhang et al. reported 26 members of the MAPK gene family from Malus domestica [17], but we found that M. domestica contains 28 members.