
Table 1 Summary statistics for the resultant SNP loci datasets of three pipelines, filtered at a 70% call rate (see Additional file 1: Table S1 for data filtered on 30% call rate), for Tasmanian devil (N = 131) and pink-footed goose (N = 40), including total number of loci (total loci), average number of loci sequenced across individuals (mean loci), amount of missing data (%), calculated error rates (%), mean observed heterozygosity across loci (HO), mean expected heterozygosity across loci (HE), and average multilocus heterozygosity of individuals (MLH)

From: From reference genomes to population genomics: comparing three reference-aligned reduced-representation sequencing pipelines in two wildlife species

| Dataset | Pipeline | CPU hours^a | Total loci | Mean loci (min; max) | % missing | Error rate (%)^b | HO (± SD) | HE (± SD) | MLH (± SD) |
|---|---|---|---|---|---|---|---|---|---|
| Devil | Stacks | 16 | 1359 | 1177.3 (500; 1326) | 13.4 | 2.9 | 0.207 (0.149) | 0.248 (0.163) | 0.205 (0.043) |
| | SAMtools | 55 | 251 | 205.8 (96; 236) | 18.0 | 6.6 | 0.308 (0.160) | 0.327 (0.115) | 0.298 (0.092) |
| | GATK | 325 | 1464 | 1297.2 (604; 1442) | 11.4 | 5.3 | 0.185 (0.139) | 0.256 (0.161) | 0.184 (0.040) |
| Goose | Stacks | 11 | 52,053 | 44,914.4 (954; 50,517) | 13.7 | NA | 0.132 (0.127) | 0.156 (0.136) | 0.127 (0.026) |
| | SAMtools | 14 | 26,437 | 22,035.0 (732; 23,732) | 16.7 | NA | 0.256 (0.160) | 0.307 (0.142) | 0.563 (0.158) |
| | GATK | 65 | 277,362 | 245,412.2 (6787; 270,0084) | 11.5 | NA | 0.137 (0.121) | 0.187 (0.149) | 0.132 (0.034) |

  1. ^a CPU hours represent the total computational time for each pipeline, excluding alignment and the further filtering in R. Some steps can be parallelised for quicker computation, but not all steps allow this.
  2. ^b Error rates could not be calculated for the pink-footed goose dataset because no replicates were included in the current analysis. The error rate is calculated after filtering for SNPs with > 85% reproducibility, so it is lower than the initial error rates.
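
The "% missing" column of Table 1 appears to follow from two other columns. The table does not say how it was calculated, so the relation below is an assumption: if missing data is the average share of loci not genotyped per individual, then % missing = 100 × (1 − mean loci / total loci). The short Python sketch below checks that assumed relation against the reported values. The names `TABLE1` and `percent_missing` are illustrative and do not come from the paper.

```python
# Values copied from Table 1 (70% call rate):
# (dataset, pipeline) -> (total loci, mean loci per individual, reported % missing)
TABLE1 = {
    ("Devil", "Stacks"): (1359, 1177.3, 13.4),
    ("Devil", "SAMtools"): (251, 205.8, 18.0),
    ("Devil", "GATK"): (1464, 1297.2, 11.4),
    ("Goose", "Stacks"): (52053, 44914.4, 13.7),
    ("Goose", "SAMtools"): (26437, 22035.0, 16.7),
    ("Goose", "GATK"): (277362, 245412.2, 11.5),
}


def percent_missing(total_loci, mean_loci):
    """Assumed relation: average share of loci not genotyped per individual."""
    return 100.0 * (1.0 - mean_loci / total_loci)


for (dataset, pipeline), (total, mean, reported) in TABLE1.items():
    derived = percent_missing(total, mean)
    print(f"{dataset:5s} {pipeline:8s} derived={derived:5.1f}%  reported={reported:5.1f}%")
```

Under this assumption, the derived percentages agree with the reported column to one decimal place for all six rows, which suggests (but does not prove) that the column was computed this way.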