Fig. 2 | BMC Genomics

Fig. 2

From: An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data

Effect of k-mer length on k-mer homoplasy. (a) Mathematical predictions of the proportion of shared k-mers, p_h, as a function of k for genomes of sizes g = 10^5 (blue), 10^7 (purple), and 10^9 (green) when the true genetic distance between two species is d = 0.02 or d = 0.1, and the GC content is 0.5 (solid lines) or 0.4 (dashed lines). The dashed black line gives the hypothetical case of no k-mer homoplasy. Calculations were performed under the assumption that genomes are random sequences (Eq. 5). (b) Simulations of the effect of k-mer homoplasy on n_s/n_t and comparison with its theoretical prediction, p_h. Three simulations were performed starting with a random sequence of 10^5 bp, assuming that the true genetic distance between taxa is d = 0.1. The black lines give n_s/n_t from sequence simulations and the blue lines give the theoretical predictions, p_h, under the assumption that the ancestral genome is random with GC content 0.5 (Eq. 5). (c) Like (b), but with the ancestral sequences taken at three random starting positions from a published 1.9 Mbp sequence of the rabbit genome [30]. The red line gives the theoretical predictions (Eq. 4) calculated using the observed frequency distribution of k-mers, Q_k, in one of the simulated species. (d) Theoretical predictions (Eq. 4) of the proportion of shared k-mers, p_h, calculated from the observed frequency distribution of k-mers, Q_k, for the 11 primate genomes ranging in size from 2.7 to 3.5 Gbp, assuming the true distance between taxa is d = 0.02 or 0.1.
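
The kind of simulation described for panel b can be illustrated with a minimal Python sketch. It is not the authors' code and rests on several assumptions that the caption does not specify: substitutions are applied independently per site with probability d to a single descendant copy, reverse complements are ignored, k-mers are counted as distinct strings, and n_t is taken as the smaller of the two distinct k-mer counts. The function names (`random_genome`, `mutate`, `kmer_set`, `shared_kmer_proportion`) are hypothetical.

```python
import random


def random_genome(length, gc=0.5, rng=None):
    """Draw a random sequence with the given GC content (assumed i.i.d. sites)."""
    rng = rng or random.Random()
    # Weights for A, C, G, T: G and C share the GC fraction equally.
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]
    return "".join(rng.choices("ACGT", weights=weights, k=length))


def mutate(seq, d, rng):
    """Substitute each site independently with probability d (assumed model)."""
    out = []
    for base in seq:
        if rng.random() < d:
            # Always replace with a different base, so d is the realised change rate.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def kmer_set(seq, k):
    """Return the set of distinct k-mers in seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_proportion(seq_a, seq_b, k):
    """Estimate n_s / n_t, with n_t assumed to be the smaller k-mer count."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    n_s = len(a & b)
    n_t = min(len(a), len(b))
    return n_s / n_t


if __name__ == "__main__":
    rng = random.Random(0)
    ancestor = random_genome(10**5, gc=0.5, rng=rng)
    descendant = mutate(ancestor, d=0.1, rng=rng)
    for k in (5, 9, 13, 17, 21, 25):
        print(k, shared_kmer_proportion(ancestor, descendant, k))
```

For short k, nearly every possible k-mer occurs in both sequences by chance, so the proportion stays close to 1 regardless of divergence. As k grows, chance matches become rare and the proportion falls toward the value set by d alone.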
