
Table 1 Contig assembly of deeply sequenced bacterial genomes

From: BASE: a practical de novo assembler for large genomes using long NGS reads

 

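| Dataset | Tools | Parameters | Correct N50 | Mismatch/Indel | Aligned rate | Coverage | Time (sec) |
|---|---|---|---|---|---|---|---|
| S.aureus MW2 (240X, 100 bp HiSeq) | SPAdes | 51,63,85 | 299,305 | 134/6 | 99.79 % | 100.00 % | 1239 |
| S.aureus MW2 (240X, 100 bp HiSeq) | SOAPdenovo2 | 87-95 | 82,495 | 0/0 | 99.84 % | 99.27 % | 25; 16 |
| S.aureus MW2 (240X, 100 bp HiSeq) | SGA | 29;91 | 74,584 | 7/0 | 99.81 % | 99.98 % | 1228; 1149 |
| S.aureus MW2 (240X, 100 bp HiSeq) | BASE | 4 | 92,706 | 0/0 | 100.00 % | 99.97 % | 161; 93 |
| V.para (240X, 250 bp MiSeq) | SPAdes | 33,55,65,75,85,99 | 169,978 | 118/45 | 99.97 % | 99.97 % | 4616 |
| V.para (240X, 250 bp MiSeq) | SOAPdenovo2 | 125 | 88,858 | 23/30 | 99.98 % | 99.98 % | 110; 1 |
| V.para (240X, 250 bp MiSeq) | SGA | 29;149 | 95,711 | 58/26 | 99.80 % | 99.97 % | 2478; 2884 |
| V.para (240X, 250 bp MiSeq) | BASE | 4 | 159,715 | 29/29 | 100.00 % | 99.75 % | 676; 388 |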

  1. S.aureus MW2 was evaluated against its own reference genome (2.8 Mb); V.para was evaluated against its species' reference (5.1 Mb, two chromosomes). Both bacteria were sequenced to 240X. The GAGE validation pipeline was used to calculate the corrected contig N50, base errors, structural errors, contig aligned rate, and reference coverage. All assemblies were run with 24 threads, except that BASE used a single thread for the contig assembly step. In the Time column, the value before the semicolon is the index-building time and the value after it is the assembly time. For SGA, the indexing time includes the indexing performed after error correction and filtering, and the assembly time includes the overlap and assembly steps.
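
The Time column convention described in the footnote (index-building time before the semicolon, assembly time after it) can be handled with a small parser. The following is only an illustrative sketch: the function name `parse_time_cell` and the returned dictionary keys are hypothetical, and treating a cell without a semicolon (as in the SPAdes rows) as a single total is an assumption, since the footnote does not say how that value breaks down.

```python
from typing import Dict, Optional


def parse_time_cell(cell: str) -> Dict[str, Optional[int]]:
    """Split a Time (sec) cell into index-building and assembly seconds.

    Hypothetical helper: "index; assembly" cells are split on the semicolon;
    a cell with one number is kept as a total only (assumption).
    """
    parts = [p.strip() for p in cell.split(";")]
    if len(parts) == 2:
        index_s, assembly_s = int(parts[0]), int(parts[1])
        return {"index": index_s, "assembly": assembly_s,
                "total": index_s + assembly_s}
    if len(parts) == 1:
        # No semicolon: the table gives no index/assembly breakdown.
        return {"index": None, "assembly": None, "total": int(parts[0])}
    raise ValueError(f"unexpected time cell: {cell!r}")


# Values copied from the BASE and SPAdes rows for S.aureus MW2.
base_time = parse_time_cell("161; 93")
spades_time = parse_time_cell("1239")
print(base_time["total"], spades_time["total"])
```

Totals computed this way are not directly comparable across tools, because the footnote states that BASE ran its contig assembly step on a single thread while the other assemblies used 24 threads.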