- Research article
- Open Access
Whole genome comparisons reveal panmixia among fall armyworm (Spodoptera frugiperda) from diverse locations
BMC Genomics volume 22, Article number: 179 (2021)
The fall armyworm (Spodoptera frugiperda (J.E. Smith)) is a highly polyphagous agricultural pest with long-distance migratory behavior threatening food security worldwide. This pest has a host range of > 80 plant species, but two host strains are recognized based on their association with corn (C-strain) or rice and smaller grasses (R-strain). The population genomics of the United States (USA) fall armyworm remains poorly characterized to date despite its agricultural threat.
In this study, the population structure and genetic diversity in 55 S. frugiperda samples from Argentina, Brazil, Kenya, Puerto Rico and USA were surveyed to further our understanding of whole genome nuclear diversity. Comparisons at the genomic level suggest a panmictic S. frugiperda population, with only a minor reduction in gene flow between the two overwintering populations in the continental USA, also corresponding to distinct host strains at the mitochondrial level. Two maternal lines were detected from analysis of mitochondrial genomes. We found members from the Eastern Hemisphere interspersed within both continental USA overwintering subpopulations, suggesting multiple individuals were likely introduced to Africa.
Our research is the largest diverse collection of United States S. frugiperda whole genome sequences characterized to date, covering eight continental states and a USA territory (Puerto Rico). The genomic resources presented provide foundational information to understand gene flow at the whole genome level among S. frugiperda populations. Based on the genomic similarities found between host strains and laboratory vs. field samples, our findings validate the experimental use of laboratory strains and the host strain differentiation based on mitochondria and sex-linked genetic markers extends to minor genome wide differences with some exceptions showing mixture between host strains is likely occurring in field populations.
Larvae of the fall armyworm, Spodoptera frugiperda (J.E. Smith, 1797) (Lepidoptera: Noctuidae), are a highly polyphagous agricultural pest affecting key food and fiber staples such as corn (Zea mays L.), cotton (Gossypium spp. L.), sorghum (Sorghum bicolor L.), rice (Oryza sativa L.), and vegetable crops . Damage by S. frugiperda in its native subtropical range in the Americas results in 15–100% yield loss, depending on the level of infestation . In most of the continental USA this pest does not diapause or survive cold winters. Overwintering populations in southern Texas and Florida migrate northward yearly over several generations to populate northern regions in USA and Canada .
For more than a decade, effective control of S. frugiperda in the Western hemisphere has been provided mainly by genetically modified corn and cotton producing insecticidal Cry and Vip3A proteins from the bacterium Bacillus thuringiensis (Bt). However, resistance to Cry1F, Cry1Ab and Cry1A.105 insecticidal proteins quickly developed in Puerto Rico and the continental USA (Florida and North Carolina) [4, 5], as well as Brazil  and Argentina . More recently, the economic importance of this pest has further increased with its introduction in sub-Saharan Africa , subsequent spread to India and Southeastern Asia , and more recently Australia . An estimate using data from twelve African countries indicates that yield losses resulting from S. frugiperda injury could be 21–53% of their annual corn production, which equals to US$2.5–$6.2 billion in losses . While yet to be detected in molecular screens , resistance alleles to Bt toxins could have been carried by invasive S. frugiperda to the Eastern hemisphere.
Populations of S. frugiperda are composed of sympatric mixtures of two genetically differentiated strains based on host preference, a “rice” (R) strain feeding on rice, millet and smaller grasses, and a “corn” (C) strain feeding preferentially on corn and sorghum [13, 14]. There is evidence supporting that this differentiation involves reproductive incompatibility  and differential susceptibility to xenobiotics . Signatures of positive selection for genes involved in chemoreception, detoxification, and digestion were also detected in whole-genome comparisons of nine C- and R-strain individuals . Being morphologically indistinguishable, C and R individuals are discriminated using genetic markers located on the mitochondrial COI (cytochrome oxidase subunit I) and sex-linked (Z chromosome) Tpi (triosephosphate isomerase) genes [18,19,20,21]. However, host preference is not absolute and discrepancies among haplotype markers have been reported. For example, the Tpi marker agrees with assortative mating and host assessments  in describing S. frugiperda samples from corn in the Eastern Hemisphere as C-strain, yet a predominant COI-R strain marker is obtained in these collections . These conflicting results may reflect interstrain hybridization  or be driven by maternally inherited symbionts that skew the distribution and diversity of certain haplotypes . Additional factors influencing these conflicting haplotyping results may include incomplete lineage sorting and plant host behavioral plasticity.
Nucleotides found at COI host strain marker sites have been commonly grouped into “haplotypes” with relative proportions associated to geographic origin , enabling differentiation of “Texas” and “Florida” overwintering S. frugiperda populations in the USA. Migratory studies with these haplotypes determined that the “Texas” population is distributed throughout South, Central and North America [25, 26], while the “Florida” population locates to Florida and the Caribbean but migrates through the eastern USA seaboard to reach Canada [27, 28]. Haplotype profiling of S. frugiperda from Africa, India and Southeastern Asia supports introductions in Africa of individuals from the “Florida” population and subsequent spread to Asia [9, 12].
In this work, we collected and sequenced 55 genomic DNA (gDNA) samples of S. frugiperda from three continents, with an emphasis on C-strain individuals from North and South America. Genome-wide comparisons allowed testing for gene flow between geographically distant S. frugiperda populations and comparing genetic diversity between field and laboratory-reared S. frugiperda. Based on Fst values there is limited genetic differentiation between Texas vs. Florida populations and samples from these locations display similar genetic nucleotide diversity. Host strain differentiation was supported by nuclear and mitochondrial genome differences. Comparison of mitochondrial genomes detected two clusters where the clusters correlate by and large with host strain and resistance to the Cry1F toxin. The exceptions to this correlation suggest that C-strain individuals based on nuclear genome have likely mated with R-strain individuals. We also observed no detectable reduction in genetic diversity in well-established S. frugiperda laboratory-reared colonies, which supports using laboratory strains of this pest as a model for field populations.
Host strain typing
Of the 55 S. frugiperda samples used in this study, almost half (26, 47.3%) originated from field collections representing eight states in the USA. The average mapping rate, defined as the proportion of sample sequencing reads mapping to the reference genome assembly, was 88.2% with a range of between 60.62 to 96.82%, according to SAMtools flagstat. Identification of the C- and R- host strains for each individual was performed using nuclear Tpi and mitochondrial COI genetic markers described elsewhere . Given that sample collections were from or around cornfields, we expected to observe a majority of C-strain samples. The COI1164 marker identified all samples as C-strain. In contrast, typing based on the nucleotides at the TPi183 site resulted in 42 samples identified as C-strain individuals (C/C), 8 as R-strain individuals (T/T), 3 as interstrain hybrids (C/T) and 2 that were undetermined due to low sequencing coverage at this locus. All eight identified R-strain samples were from the USA (3 from Texas, 2 from Maryland and Tennessee each, and 1 from Puerto Rico).
Whole genome diversity, nucleotide diversity and differentiation
The FastStructure tool was used first to determine possible population structure among the samples. The number of groups (k) was analyzed for k = 1 to k = 10, but all samples clustered as one group even when the number of groups was increased. This lack of clear population structure was further confirmed using principal component analysis (PCA) on a filtered set of 2,762,958 nuclear SNPs (see Methods). PC1 through PC10 was plotted and no structure was found based on country of origin, host strain or susceptibility to the Cry1F insecticidal protein from Bt (data not shown for PC2 to PC10). We noted that PCA using the first two principal components uncovered weak substructure within the USA and Brazil samples where the majority of samples clustered in a larger cluster and two subclusters formed. The first subcluster was comprised of Brazil, Florida, Texas and Tennessee samples (BraMGr1, BraMGr3, BraBAr1, USAFLu1, USAFlu3, USAFLu6, USAFLu10, USAFLu12, USATxu6, USATNu1). The second subcluster included five samples, three from Florida (USAFLu9, USAFLu11, and USAFlr2) and two from Brazil (BraMGr2, BraBAr2) (Fig. 1).
Given the overall lack of clear population structure observed based on origin, we next applied genome sketching  and traditional distance-based hierarchical clustering as a third strategy to analyze these samples. The hierarchical cluster dendrogram defined by the genome-wide distance estimates based on the fraction of shared k-mers between each pair of genomes was cut at height 0.05 as it resulted in at least two primary clusters (Texas vs. non-Texas) among the samples analyzed (Fig. 2). This cut height generated 8 clusters, 6 containing less than 3 samples and were merged to the major cluster, while the remaining 10 samples were left in the remaining 2 minor cluster. The first cluster was comprised of samples from the USA (USATXu2–5 from Texas, USATNu1–2 from Tennessee, USAMDu1–2 from Maryland, USAFlu2 from Florida and PueSlu1 from Puerto Rico), while the second cluster included individuals from all countries sampled with the majority of samples from USA representing Florida/eastern seaboard. The three samples from Puerto Rico were interspersed across both clusters and always clustered with Florida samples, yet genomic distances between clusters were small (< 0.05 Mash distance). Similar to the PCA clusters, these two clusters did not correlate with host strain designation or geographic origin, with the first cluster including both R- and C-strain samples and the second cluster containing both R- and C-strain samples as well as three interstrain hybrids (Supplemental Fig. S1).
Mash distance is a proxy for genome pairwise distance, with 0 representing identical genomes (all k-mers that comprise the sketches are present in both sequences). The Mash distances across all countries surveyed were averaged to be < 0.05 (Table S2, range of average distance was 0.043 to 0.045), probably reflecting recent population expansion. Moreover, these Mash distances support that R and C-strains are members of the same species, as the estimated genetic variation is comparable to at least 95% average nucleotide identity (ANI), which is the ANI cutoff for eukaryotic species .
Gene flow between overwintering populations in the USA
Previous work based on the ratios of four pre-determined COI haplotypes suggested that the overwintering Florida S. frugiperda population is mostly reproductively isolated from the overwintering population in Texas, with genetic exchange mostly occurring at the north and south ends of the Appalachian mountain chain . As mentioned above, we observed that C-strain samples from Texas and Florida/eastern seaboard did not separate based on hierarchical clustering of genome-wide distances, and Florida samples were found across both groups (Fig. 2). Further, when comparing the two R-strain samples from Texas and the twelve C-strain samples from Florida/eastern seaboard, we detected a low pairwise weighted Fst value (0.0115) based on 2,726,958 SNPs. This very low Fst value suggests no population separation at the nuclear level, supporting recent population separation or ongoing gene flow between Texas and Florida overwintering populations, independently of host strain.
Based on the same Texas R-strain and Florida C-strain groupings, mean diversity was determined. Mean diversity for the field samples from Texas R-strain and Florida C-strain samples was estimated at 0.0016 and 0.0013, respectively (Fig. 3). Next, we determined the Fst value based on 406 mitochondrial SNPs found across all samples in these groups. The mitochondrial Fst for Texas R-strain and Florida C-strain samples was estimated at 0.7961. This Fst estimate being close to 1 suggests genetic differentiation between the R and C strains is the strongest at the mitochondrial level, possibly due to two main ancestral lineages. Although we see a stronger difference between geographic regions than between R and C strains in this study, the estimated mitochondrial to nuclear Fst ratio of 46.8 is consistent with previously reported sex-based differences in dispersal  and/or mating preference differences between strains . Even so, our data suggest hybridization occurs and some C-strain nuclear genomes have a clearly R-strain associated mitochondrial genome (see Fig. 5).
Laboratory vs. field diversity
Laboratory-reared strains of S. frugiperda are commonly used for diverse physiological, toxicological and genetic studies, yet their genetic resemblance to field populations has not been investigated. To investigate the differentiation level between two well established laboratory strains (> 20 years) originating from the same location in the USA (Mississippi, MS) and all field USA populations surveyed, we estimated a weighted Fst (fixation index). As shown in Table 1, there was moderate differentiation between the MS laboratory strains and field USA field populations, with an Fst range of 0.0310 to 0.0961. Moderate differentiation was also observed when comparing the MS laboratory strains with other more recently established (< 10 years) laboratory populations from North Carolina (NC) and Florida (FL), with an Fst range of 0.0131to 0.1302. The samples least differentiated (Fst = 0.0131) from the MS laboratory populations were FL laboratory, while the most distinct (Fst = 0.0961) were the samples collected from Maryland (MD). Overall, these moderate Fst values  suggest small differences at the allelic level between laboratory and field-collected groups. Genetic diversity in the two laboratory MS colonies was slightly different but comparable to diversity in field USA populations (Fst = 0.0761 on average) and to either of the two overwintering populations (Fst = 0.0398).
Additionally, we surveyed overall genomic diversity using 3 kbp windows and estimated mean diversity for the same two MS laboratory strains compared to all field and laboratory S. frugiperda samples surveyed. Mean nuclear diversity was 0.0013 for Benzon (USAMSs1) and SIMRU (USAMSs2) laboratory strains (n = 2), compared to 0.0016 for field Texas samples (n = 8), and 0.0014 for field Florida/eastern USA samples (n = 16).
Diversity across the mitochondrial genome
We investigated the maternal lineage among all but one of the S. frugiperda samples (due to lack of a complete mitochondrial genome in sample BraBAr4) by comparing genetic diversity of whole mitochondrial (mt) genomes. Using a total of 390 SNPs, we generated a haplotype network as described in Materials and Methods (Fig. 4). Consistent with a large amount of standing variation in the species, the mitochondrial sequences appeared unique except for 22 samples forming ten haplogroups (A to J) with more than one sample (Fig. 4). Two of these haplogroups contained only samples from USA, haplogroup A formed by samples of the same laboratory S. frugiperda strain (USAMSs1 and USAMSs2), and haplogroup B containing USAFLu5 and USAFLu9. Clusters with samples only from Brazil included haplogroups C, D and E containing samples from Cry1F-resistant strains from Bahia (BraBAr2, BraBAr3 and BRABAr4), Sao Paulo (BraSPr1 and BraSPr2), and Minas Gerais (BraMGr1 and BraMGr2), respectively. Samples from Kenya clustered in haplogroup F (KenXXu3 and USAMAu2) and G (KenXXu1 and KenXXu2). Non-unique haplotype samples from Puerto Rico clustered in haplogroups H (PueGUr1 and USAFLu3) and I (PueLAu1 and USAFLu7) with samples from Florida. Haplogroup J contained two samples (BraMGr4 and BraMGr5) from the Cry1F-resistant strain from Minas Gerais also represented in haplogroup E, and one sample from USA (USAFLu6). There were two main “haplotype” clusters that separated in the network, with average genetic difference between the two clusters of 29.53, supporting evidence for two main maternal lineages within the samples surveyed. Using these two haplotype clusters as labels, we mapped the mt haplotype clusters to the two whole genome sequence (WGS) “haplotype” clusters found in Mash distance-based tree against a mitochondrial variant UPGMA distance-based tree (Fig. 5). In accordance with the previously estimated Fst, the mt and WGS clusters did not correlate completely. However, all samples in one of the mt haplotypes clustered in one of the WGS haplotypes (“one” in purple in Fig. 5), while the other mt haplotype contained a mix of WGS haplotypes. While not all samples were phenotyped, all samples classified as resistant to the Cry1F toxin clustered in the mt haplotype group with majority of corn strain samples.
We report on the genome-level comparison of 55 S. frugiperda samples from diverse locations, mostly from the native range in the Americas but also including samples from an African location (Kenya) where the insect is a devastating exotic pest. While using the C-strain S. frugiperda genome for alignments could have underestimated diversity in R-strain individuals, this approach is supported by previous surveys using both corn and rice genomes with minimal differences found based on the number of variants called and Fst values .
The population structure of the 55 genomic S. frugiperda samples was surveyed using parametric and non-parametric methods. Based on FastStructure (parametric), no population structure was found either within samples from the Eastern Hemisphere or between the Eastern and Western Hemisphere samples. Since methods based on parametric estimation of allele frequencies methods are sensitive to sample size, one possibility is that the lack of structure detected is a reflection of the relatively small number of R-strain samples (only 8 individuals and potentially 2 hybrids) analyzed. A more even distribution would address potential limited sampling of some locations surveyed, especially samples representing the Eastern Hemisphere (e.g. Africa). However, results from a previous study of genetic S. frugiperda diversity in the Western Hemisphere using amplified fragment length polymorphism (AFLP) already supported a large interbreeding population .
Both PCA and FastStrucutre analyses found little to no population structure based on the nuclear genome. Furthermore, the Mash distance-based hierarchical clusters showed weak correlation between nuclear genome and host strain. In contrast, the Hamming Distance tree based on the mitochondrial genome detected two subpopulations within the samples surveyed, which were associated with host strain and the southern Texas and Florida subpopulations previously described based on four mitochondrial COI haplotypes . Further, we identified individuals that appeared to have a C-strain nuclear genome with an R-strain associated mitochondria (maternal lineage). This discrepancy is consistent with conflicting host strain assignments obtained using specific genetic markers between R-strain and interstrain hybrids, as previously reported . The lack of consistent clustering based on host strain at the nuclear genome level and the weak correlation between mitochondrial haplotype and nuclear genome groupings suggest that either there is some level of hybridization between host strains or that the current locus markers may not be sufficient to discriminate host strains. Although previous research found the groupings based on a neighbor joining tree of nuclear genome SNPs to correlate with host strain (9 corn and 9 rice S. frugiperda samples from Mississippi) , that finding could be affected by limited sample size. Low amount of genetic variation across the whole nuclear genome between the C- and R-strains is suggestive of genome differentiation after the ecological differentiation of host strains and that reproductive and behavioral barriers are isolating C- and R-strains . In testing this hypothesis, future research should involve analyses of a larger sample size of C- and R-strain individuals from Texas and Florida.
Our analysis included samples collected from locations (Maryland & Tennessee) close to predicted hybridization zones for TX and FL overwintering S. frugiperda populations , and as expected these associated with both geographic regions. There were also three exceptions to the geographic separation: two Texas samples that were found in the predominately Florida represented group and one Florida sample was found in the predominately Texas population. This observation suggests that the reproductive isolation between Texas and Florida/Eastern seaboard is not absolute and mating may be occurring in overlapping locations, as previously suggested . Future work should include increased sampling from potential hybrid zones (Georgia/New York) and Texas. The Mash distance-based clusters suggested that Puerto Rico samples are genetically closer to Florida, since they were found in sub-clusters with Florida-based samples. However, due to low representation and small genomic distances between each cluster, our data is not able to show conclusive support for migration models suggesting significant exchanges between S. frugiperda from Puerto Rico and Florida .
Mitochondrial Fst estimates obtained between C- and R- host strains (0.7961) were highly similar to the Fst value (0.938) from a prior analysis of 9 corn and 9 rice sympatric samples from Mississippi (USA) . Moreover, the nuclear DNA Fst found in our study comparing all corn versus rice strains (0.0173) is also similar to the previous study’s estimate (0.019) . The two mitochondrial haplotypes found in our analysis could be explained by selective mating or some form of reproductive incompatibility. Despite the fact that susceptibility to Cry1F was not known for all samples, we observed that all Cry1F-resistant strains clustered to one of the mitochondrial haplotypes, despite their geographic origin. Although speculative given the comparatively low number of confirmed Cry1F resistant samples, this grouping may reflect partitioning of resistance with the haplotype group only containing C-strain samples. Further work including Cry1F-resistant R-strain samples would be necessary to test for associations between mitochondrial haplotype and resistance to Cry1F.
One possibility to explain the mitochondrial haplotypes detected could be that they are based on reproductive incompatibility from infection by parasites such as Wolbachia . However, we only detected short fragments (< 100 bp) matching to the RefSeq Wolbachia genomes on NCBI as of May 9, 2020 using BLAST (data not shown). This observation is in agreement with leg and head tissues being mostly used for isolating genetic material. In testing for Wolbachia infection, we extracted genomic DNA from a limited number of available abdominal samples and performed sequencing of PCR amplicons using primers amplifying the outer surface protein precursor (wsp) Wolbachia gene. Amplicons of the same size (Fig. S2) and containing exactly the same sequence (Fig. S3), were detected in all samples tested independently of mitochondrial haplotype. These observations support no influence of Wolbachia infection on reproductive isolation of clustered populations. However, due to lack of detailed data on Wolbachia strains infecting S. frugiperda, a more detailed characterization would be needed to conclusively rule out the possibility of incompatibility. Further, lack of clear correspondence between genome-based and mitochondria-based haplotypes suggest on-going hybridization between C- and R-strains, consistent with detection of two potential hybrids among our 54 samples. Consistent lack of concordance between the nuclear and mitochondrial Fst values suggests that biological mitochondria-nuclear interactions are not maintained, further supporting that the samples analyzed are part of a panmictic population with little to no population structure.
Previous reports providing comparisons of genetic diversity in laboratory-reared and wild populations of the lepidopteran Plutella xylostella , Tephritid fruit flies , and locusts  support loss of allele number and heterozygosity during adaptation to mass rearing. In contrast, in Drosophilid fruit flies  and in long-established laboratory lines of Anastrepha fraterculus (Tephritidae)  high genetic variability remained. We compared diversity in two long-established (> 20 years) S. frugiperda laboratory lines. Both laboratory lines, Benzon (USAMSs1) and SIMRU (USAMSs2), originated from field collections in Mississippi and they were the least differentiated from the TX/MN field collections. Interestingly, more recently established laboratory populations from NC and FL  presented lower Fst values compared to field samples. This moderate differentiation between laboratory and field samples could be indicative of good culture management involving maintenance of large populations and/or regular introgression of field individuals. Another possibility could be that some of the field samples compared display low diversity, yet this is unlikely given samples from Central and South America (for Texas population) and the Caribbean (for Florida populations) based on current S. frugiperda migratory models . It is important to consider that despite the lack of overall differentiation between laboratory and field S. frugiperda, differentiation probably occurs during adaptation to mass rearing, resulting in selection of specific alleles that may not be representative of field populations. Thus, while our observations suggest laboratory strains contain genetic diversity similar to field populations, laboratory strains may have specific allele frequencies that may not be representative of field populations.
Overall, this study provides a diverse characterization of S. frugiperda whole and mitochondrial genome diversity. Findings from the study support lack of clear S. frugiperda population structure among the locations sampled. Only mitochondrial genomes indicated two different maternal lines mostly separating host strains. The genomic resources generated allow further exploration of gene flow and how it may affect management of S. frugiperda as an expanding global superpest using laboratory-reared and field-collected individuals.
Samples and strain typing
Details of the 55 laboratory and field-collected S. frugiperda samples sequenced are found in Supplementary Table 1 and Fig. S4. Adults (moths) were captured using sex pheromone baited traps , with most collection sites near corn plantings to optimize trap capture efficiency of C-strain males. Laboratory-reared samples were obtained from rearing facilities. Susceptibility to Cry1F toxin from B. thuringiensis for these laboratory samples was tested in bioassays presented elsewhere [5, 44,45,46]. Larval samples were collected at field and laboratory rearing locations and used when available. The collected specimens were identified as S. frugiperda by morphology features  and stored at − 20 °C until required for analysis.
We followed a sample naming protocol that included the first three letters representing the country of origin (Bra for Brazil, Arg for Argentina, USA for United States of America, Ken for Kenya, and Pue for Puerto Rico), the next two letters representing the first two letters of the state/province of origin (SP for Sao Paulo, TX for Texas, etc. XX when unknown), and the third letter representing the Cry1Fa susceptibility phenotype when known (“r” for resistant, “s” for susceptible and “u” for unknown). A number was used when necessary to differentiate samples with the same geographic origin and Cry1F susceptibility phenotype.
Host strain (C versus R) was determined based on sequence identity at specific marker positions in reference mtCoI1164 and Tpi183 sequences, as described elsewhere . The mitochondrial COI1287 marker identified all samples as C-strain, so the nuclear Tpi183 marker was used as a more reliable marker of host strain . The mitochondrial COI1164 marker was not used because it did not fall under any of the pre-defined COI haplotypes  for any of the samples.
DNA extraction, library preparation, and sequencing
Genomic DNA was isolated from individual carcasses of fifth instar S. frugiperda larvae after dissecting the gut tissue or from legs or heads of adults using the Pure Link Genomic DNA mini kit (Invitrogen), following manufacturer’s protocols, and then quantified using a Nanodrop spectrophotometer (Thermo Scientific). Extracted DNA was then sheared randomly to between 250 and 500 bp using a Covaris M220 focused ultrasonicator (Woburn, MA), according to manufacturer’s instructions. The fragmented DNA was then ligated with dual-indices using a KAPA Hyper prep PCR-free library kit (Roche) according to the manufacturer’s directions. The ligated fragments were quantified using quantitative PCR and a KAPA Library quantification kit (Roche), and then submitted for sequencing on an Illumina HiSeqX device running a 2 × 150 bp paired-end configuration (Admera Health, NJ). Raw paired-end sequence reads for each sample (55) are available for download in the NCBI Sequence Read Archive (SRA) under SRR12044614-SRR12044668 with associated metadata available under NCBI BioProject id PRJNA640063. The raw data was processed to remove low quality reads using the CLC Genomics Workbench v9.5.2 (Qiagen) trim function with default parameters.
Filtering and mapping
Raw reads were quality-trimmed at both ends, filtered for adapter sequences and error corrected for known Illumina artifacts and PhiX sequencing control using BBDuk (https://jgi.doe.gov/data-and-tools/bbtools/) with options trimq = 15 and filter = 23. Further quality control and confirmation that adapter and non-relevant (contaminants, primer artifacts) sequences were removed from filtered reads were performed using FastQC . Given the prevalence of C-strain among our samples and similar number of variants called when using both S. frugiperda corn and rice genomes , we mapped the remaining reads to the S. frugiperda corn reference genome (v3.1) downloaded from https://bipaa.genouest.org/sp/spodoptera_frugiperda_pub/ (assembled length of 312 Mb across 29,949 scaffolds ) using a Burrows-Wheeler aligner algorithm (bwa-mem) . The average mapping rate for all samples was determined by averaging the percentage of mapped alignments in the BAM file per sample to the reference genome, as determined by SAMtools flagstat .
Variants were then called using the SAMtools mpileup utility , resulting in detection of 126,977,977 single nucleotide polymorphisms (SNPs) and indels. The unfiltered dataset contained 120,398,863 SNPs. Three samples (USATXu1, BraSPr2,BraSPr1 and BraMGr5) were removed from SNP analysis because they had missing variant call rate greater than 50%, resulting in 51 samples. Further filtering was applied using vcftools  to filter out SNPs only, minor allele frequency being less than or equal to 0.05, genotype present in at least 50% of the 51 samples surveyed and PLINK2  filtering only 2 distinct alleles, removing mismatched alleles, zero observations at allele, and missing variant ID. This resulted in final filtered SNP dataset of 2,762,958. This filtered dataset was used on all nuclear SNP analyes: PCA, nucleotide diversity (pi) and Fst estimates.
The Mash distance method was used to measure pairwise dissimilarity of genomic sequences . One key advantage of sketching-based approaches is that they neither require de novo genome assembly nor a pre-existing reference genome to identify related individuals. The Mash method estimates sequence similarity via the Jaccard similarity coefficient (the ratio of shared k-mers to total k-mers) for a “random” subset of k-mer pairs for each genome surveyed. The Jaccard similarity (J) is then used to calculate dissimilarity between two genomes (D) as D = 1-J. All S. frugiperda individuals were sketched with Mash  using a 10,000 sketch size, k-mer size of 21 and minimum copy of each k-mer equal to 2, per previous recommendations of representing genomes with minimal computational costs . All sample sketches were screened against the Mash Refseq database and other Spodoptera species (Spodoptera exigua, GCA_011316535.1; and Spodoptera litura, GCA_002706865.2) as outgroups.
Weighted Fst values as indicators of total genetic variance in a subpopulation were calculated using VCFtools  by the Weir and Cockerham estimator on 1 kbp, 3 kbp and 5 kbp windows. The variation in window size was used to confirm that there was little to no change in Fst when the window was varied. For Fst calculations using the mitochondrial genome, each base pair position was compared. Nuclear genome (pi) values were calculated using 1 kbp windows when comparing corn vs. rice host strain, and 3 kbp when comparing Fst values for overwintering FL/TX vs. USA laboratory samples. We also used 3 kbp windows for Fst comparisons across the nuclear genome with populations defined by mitochondria haplotype network groupings, Mash genome defined “haplotypes”, and host strain haplotypes. For all nucleotide diversity analyses, a total of 2,762,958 filtered SNPs as described in the Filtering and Mapping section was used to assess diversity using VCFtools .
Mitochondrial genome assembly
Mitochondrial sequences were extracted from the whole genome sequence reads using NOVOplasty  with the mitochondrial partial COI sequence MH932092.1 as the seed. A continuous complete mitochondrial chromosome of around 15 kbp was generated for all 55 samples, except for BraBAr5, which was excluded from further analysis. The whole mitochondrial FASTA sequences were assembled for each sample and then aligned via bwa-mem (https://github.com/lh3/bwa) with default options. Mitochondrial variants were called using SAMtools mpileup based on the S. frugiperda corn reference genome at LepidoDB (https://bipaa.genouest.org/sp/spodoptera_frugiperda_pub/). The vcf files were compressed using SAMtools bgzip, indexed using SAMtools tabix and merged using SAMtools BCFtools merge . Missing genotypes were replaced with sequence at the mitochondrial reference genome from the S. frugiperda corn genome at LepidoDB. The vcf files were then filtered by removing 16 multi-allelic sites and 92 uninformative sites due to being found in 2 samples or less via the informloci function in the adegenet R package , resulting in 298 sites remaining. These remaining mitochondrial sites were used to build a haplotype network using the R package pegas haploNet function  with Euclidean distance and an infinite site model used for building the network.
PLINK –make-bed  was used to generate the input file to FastStructure , a variation of Structure made for larger SNP datasets. FastStructure is a generative model-based approach based on Hardy-Weinberg equilibrium assumptions between alleles and linkage disequilibrium between genotyped loci. The FastStructure script structure.py was used to determine k, the number of assumed populations or genetic groups that share a subset of allele frequencies from k = 2 to 10. Then k = 2 was chosen by the FastStructure’s choosek.py script for model complexity that maximizes marginal likelihood, and k = 1 for model components to explain structure. Additionally, PCA eigenvalues and eigenvectors were generated using PLINK 2.0 [52, 56] and visualized in R using ggplot2 .
Availability of data and materials
Sequence data presented in this study are available for download at the NCBI Sequence Read Archive (SRA) under SRR12044614-SRR12044668 with associated metadata available under NCBI BioProject id PRJNA640063.
Single nucleotide polymorphism
Montezano DG, Specht A, Sosa-Gómez D, Roque-Specht V, Sousa-Silva J, Paula-Moraes SV, Peterson J, Hunt T. Host plants of Spodoptera frugiperda (Lepidoptera: Noctuidae) in the Americas. Afr Entomol. 2018;26(2):286–300.
Hruska AJ, Gould F. Fall armyworm (Lepidoptera: Noctuidae) and Diatraea lineolata (Lepidoptera: Pyralidae): impact of larval population level and temporal occurrence on maize yield in Nicaragua. J Econ Entomol. 1997;90(2):611–22.
Westbrook J, Fleischer S, Jairam S, Meagher R, Nagoshi R. Multigenerational migration of fall armyworm, a pest insect. Ecosphere. 2019;10(11):e02919.
Storer NP, Babcock JM, Schlenz M, Meade T, Thompson GD, Bing JW, Huckaba RM. Discovery and characterization of field resistance to Bt maize: Spodoptera frugiperda (Lepidoptera: Noctuidae) in Puerto Rico. J Econ Entomol. 2010;103(4):1031–8.
Huang F, Qureshi JA, Meagher RL, Reisig DD, Head GP, Andow DA, Ni X, Kerns D, Buntin GD, Niu Y, et al. Cry1F resistance in fall armyworm Spodoptera frugiperda: single gene versus pyramided Bt maize. PLoS One. 2014;9(11):e112958.
Farias JR, Andow DA, Horikoshi RJ, Sorgatto RJ, Fresia P, dos Santos AC, Omoto C. Field-evolved resistance to Cry1F maize by Spodoptera frugiperda (Lepidoptera: Noctuidae) in Brazil. Crop Protect. 2014;64:150–8.
Chandrasena DI, Signorini AM, Abratti G, Storer NP, Olaciregui ML, Alves AP, Pilcher CD. Characterization of field-evolved resistance to Bacillus thuringiensis-derived Cry1F delta-endotoxin in Spodoptera frugiperda populations from Argentina. Pest Manag Sci. 2018;74(3):746–54.
Goergen G, Kumar PL, Sankung SB, Togola A, Tamò M. First report of outbreaks of the fall armyworm Spodoptera frugiperda (J E smith) (Lepidoptera, Noctuidae), a new alien invasive pest in west and Central Africa. PLoS One. 2016;11(10):e0165632.
Nagoshi RN, Htain NN, Boughton D, Zhang L, Xiao Y, Nagoshi BY, Mota-Sanchez D. Southeastern Asia fall armyworms are closely related to populations in Africa and India, consistent with common origin and recent migration. Sci Rep. 2020;10(1):1421.
Department of Agriculture and Fisheries, Queensland Government, Australia. First mainland detection of fall armyworm [https://www.daf.qld.gov.au/news-media/media-centre/biosecurity/news/first-mainland-detection-of-fall-armyworm].
Day R, Abrahams P, Bateman M, Beale T, Clottey V, Cock M, Colmenarez Y, Corniani N, Early R, Godwin J, et al. Fall armyworm: impacts and implications for Africa. Outlooks Pest Manag. 2017;28(5):196–201.
Nagoshi RN, Koffi D, Agboka K, Tounou KA, Banerjee R, Jurat-Fuentes JL, Meagher RL. Comparative molecular analyses of invasive fall armyworm in Togo reveal strong similarities to populations from the eastern United States and the greater Antilles. PLoS One. 2017;12(7):e0181982.
Pashley DP. Host-associated genetic differentiation in fall armyworm (Lepidoptera: Noctuidae): a sibling species complex? Ann Entomol Soc Am. 1986;79:898–904.
Pashley DP. Quantitative genetics, development, and physiological adaptation in host strains of fall armyworm. Evolution. 1988;42(1):93–102.
Pashley DP, Martin JA. Reproductive incompatibility between host strains of the fall armyworm (Lepidoptera: Noctuidae). Ann Entomol Soc Am. 1987;80:731–3.
Hay-Roe MM, Meagher RL, Nagoshi RN. Effects of cyanogenic plants on fitness in two host strains of the fall armyworm (Spodoptera frugiperda). J Chem Ecol. 2011;37(12):1314–22.
Gouin A, Bretaudeau A, Nam K, Gimenez S, Aury J-M, Duvic B, Hilliou F, Durand N, Montagné N, Darboux I, et al. Two genomes of highly polyphagous lepidopteran pests (Spodoptera frugiperda, Noctuidae) with different host-plant ranges. Sci Rep. 2017;7(1):11816.
Levy H, Garcia-Maruniak A, Maruniak J. Strain identification of Spodoptera frugiperda (Lepidoptera: Noctuidae) insects and cell line; PCR-RFLP of cytochrome oxidase C subunit 1 gene. Fla Entomol. 2002;85(1):186.
Nagoshi RN, Meagher RL, Adamczyk JJ Jr, Braman SK, Brandenburg RL, Nuessly G. New restriction fragment length polymorphisms in the cytochrome oxidase I gene facilitate host strain identification of fall armyworm (Lepidoptera: Noctuidae) populations in the southeastern United States. J Econ Entomol. 2006;99(3):671–7.
Nagoshi RN, Meagher RL. Using intron sequence comparisons in the triose-phosphate isomerase gene to study the divergence of the fall armyworm host strains. Insect Mol Biol. 2016;25(3):324–37.
Nagoshi RN. The fall armyworm triose phosphate isomerase (Tpi) gene as a marker of strain identity and interstrain mating. Ann Entomol Soc Am. 2010;103(2):283–92.
Nagoshi RN. Evidence that a major subpopulation of fall armyworm found in the Western hemisphere is rare or absent in Africa, which may limit the range of crops at risk of infestation. PLoS One. 2019;14(4):e0208966.
Cook James M, Delgado AM. Effects of a sex-ratio distorting endosymbiont on mtDNA variation in a global insect pest. BMC Evol Biol. 2009;9(1):49.
Nagoshi RN, Silvie P, Meagher RL. Comparison of haplotype frequencies differentiate fall armyworm (Lepidoptera: Noctuidae) corn-strain populations from Florida and Brazil. J Econ Entomol. 2007;100(3):954–61.
Nagoshi RN, Fleischer S, Meagher RL, Hay-Roe M, Khan A, Murua MG, Silvie P, Vergara C, Westbrook J. Fall armyworm migration across the Lesser Antilles and the potential for genetic exchanges between north and south American populations. PLoS One. 2017;12(2):e0171743.
Nagoshi RN, Nagoshi BY, Canarte E, Navarrete B, Solorzano R, Garces-Carrera S. Genetic characterization of fall armyworm (Spodoptera frugiperda) in Ecuador and comparisons with regional populations identify likely migratory relationships. PLoS One. 2019;14(9):e0222332.
Nagoshi RN, Meagher RL, Hay-Roe M. Inferring the annual migration patterns of fall armyworm (Lepidoptera: Noctuidae) in the United States from mitochondrial haplotypes. Ecol Evol. 2012;2(7):1458–67.
Nagoshi RN, Rosas-Garcia NM, Meagher RL, Fleischer SJ, Westbrook JK, Sappington TW, Hay-Roe M, Thomas JM, Murua GM. Haplotype profile comparisons between Spodoptera frugiperda (Lepidoptera: Noctuidae) populations from Mexico with those from Puerto Rico, South America, and the United States and their implications to migratory behavior. J Econ Entomol. 2015;108(1):135–44.
Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016;17(1):132.
Rašić G, Schama R, Powell R, Maciel-de Freitas R, Endersby-Harshman NM, Filipović I, Sylvestre G, Máspero RC, Hoffmann AA. Contrasting genetic structure between mitochondrial and nuclear markers in the dengue fever mosquito from Rio de Janeiro: implications for vector control. Evol Appl. 2015;8(9):901–15.
Finley LWS, Haigis MC. The coordination of nuclear and mitochondrial communication during aging and calorie restriction. Ageing Res Rev. 2009;8(3):173–88.
Hartl DL, Clark AG. Principles of population genetics. 4th ed. Sunderland: Sinauer Associates; 2007.
Nam K, Nhim S, Robin S, Bretaudeau A, Nègre N, d’Alençon E. Postitive selection alone is sufficient for whole genome differentiation at the early stage of speciation process in the fall armyworm. BMC Evol Biol. 2020;20:152–68.
Clark PL, Molina-Ochoa J, Martinelli S, Skoda SR, Isenhour DJ, Lee DJ, Krumm JT, Foster JE. Population variation of the fall armyworm, Spodoptera frugiperda, in the Western hemisphere. J Insect Sci. 2007;7:5.
Nagoshi RN, Meagher RL, Flanders K, Gore J, Jackson R, Lopez J, Armstrong JS, Buntin GD, Sansone C, Leonard BR. Using haplotypes to monitor the migration of fall armyworm (Lepidoptera: Noctuidae) corn-strain populations from Texas and Florida. J Econ Entomol. 2008;101(3):742–9.
Groot AT, Marr M, Heckel DG, SchÖfl G. The roles and interactions of reproductive isolation mechanisms in fall armyworm (Lepidoptera: Noctuidae) host strains. Ecol Entomol. 2010;35:105–18.
Correa CC, Ballard JWO. Wolbachia associations with insects: winning or losing against a master manipulator. Front Ecol Evol. 2016;3.
Perry KD, Keller MA, Baxter SW. Genome-wide analysis of diamondback moth, Plutella xylostella L., from Brassica crops and wild host plants reveals no genetic structure in Australia. Sci Rep. 2020;10:12047.
Gilchrist AS, Cameron EC, Sved JA, Meats AW. Genetic consequences of domestication and mass rearing of pest fruit fly Bactrocera tryoni (Diptera: Tephritidae). J Econ Entomol. 2012;105(3):1051–6.
Berthier K, Chapuis MP, Simpson SJ, Ferenz HJ, Habib Kane CM, Kang L, Lange A, Ott SR, Babah Ebbe MA, Rodenburg KW, et al. Laboratory populations as a resource for understanding the relationship between genotypes and phenotypes: a global case study in locusts. Adv Insect Physiol. 2010;39:1–37.
Simões P, Pascual M, Coelho MM, Matos M. Divergent evolution of molecular markers during laboratory adaptation in Drosophila subobscura. Genetica. 2010;138(9–10):999–1009.
Parreño MA, Scannapieco AC, Remis MI, Juri M, Vera MT, Segura DF, Cladera JL, Lanzavecchia SB. Dynamics of genetic variability in Anastrepha fraterculus (Diptera: Tephritidae) during adaptation to laboratory rearing conditions. BMC Genet. 2014;15(Suppl 2):S14.
Meagher RL. Trapping fall armyworm (Lepidoptera: Noctuidae) adults in traps baited with pheromone and a synthetic floral volatile compound. Fla Entomol. 2001;84(2):288–92.
Banerjee R, Hasler J, Meagher R, Nagoshi R, Hietala L, Huang F, Narva K, Jurat-Fuentes JL. Mechanism and DNA-based detection of field-evolved resistance to transgenic Bt corn in fall armyworm (Spodoptera frugiperda). Sci Rep. 2017;7(1):10877.
Santos-Amaya OF, Rodrigues JVC, Souza TC, Tavares CS, Campos SO, Guedes RNC, Pereira EJG. Resistance to dual-gene Bt maize in Spodoptera frugiperda: selection, inheritance, and cross-resistance to other transgenic events. Sci Rep. 2015;5(1):18243.
Santos-Amaya OF, Tavares CS, Rodrigues JVC, Campos SO, Guedes RNC, Alves AP, Pereira EJG. Fitness costs and stability of Cry1Fa resistance in Brazilian populations of Spodoptera frugiperda. Pest Manag Sci. 2017;73(1):35–43.
Fall armyworm. University of Florida Publication Number EENY-98. http://entnemdept.ufl.edu/creatures/field/fall_armyworm.htm#host
Andrews S. FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics. 2009;25:1754–60.
Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27(21):2987–93.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC. PLINK: a toolset for whole-genome association and population-based linkage analysis. Am J Human Gen. 2007;81:559–75.
Dierckxsens N, Mardulyn P, Smits G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res. 2017;45(4):e18.
Jombart T, Ahmed I. Adegenet 1.3-1: new tools for the analysis of genome-wide SNP data. Bioinformatics. 2011;27(21):3070–1.
Paradis E. Pegas: an R package for population genetics with an integrated–modular approach. Bioinformatics. 2010;26(3):419–20.
Chang CC, Chow CC, Tellier LCAM, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:7.
Pritchard JK, Stephens M, Raj A. fastSTRUCTURE: Variational inference of population structure in large SNP data sets. Genetics. 2014;197(2):573–89.
Wickham H. ggplot2 elegant graphics for data analysis. New York: Springer-Verlag; 2016.
We would like to thank Rodney Nagoshi for review and helpful comments.
This project was funded by Agriculture and Food Research Initiative Foundational Program competitive grant 2018–67013-27820 and Hatch Multistate NC-246 from the US Department of Agriculture National Institute of Food and Agriculture.
Ethics approval and consent to participate
Consent for publication
The authors have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Describing details of samples analyzed, and Table 2 showing Mash distances based on country. Supplementary Figure S1. Showing clustering based on Mash distances, Figure S2 showing amplification of Wolbachia in selected samples, Figure S3 showing the amplified sequences in Fig. S2, and Figure S4 showing the geographic locations sampled.
About this article
Cite this article
Schlum, K.A., Lamour, K., de Bortoli, C.P. et al. Whole genome comparisons reveal panmixia among fall armyworm (Spodoptera frugiperda) from diverse locations. BMC Genomics 22, 179 (2021). https://0-doi-org.brum.beds.ac.uk/10.1186/s12864-021-07492-7
- Fall armyworm
- Spodoptera fruigperda
- Genome wide diversity
- Corn/rice host strains