
Table 3 Summary of human contig assembly

From: BASE: a practical de novo assembler for large genomes using long NGS reads

 

| Metric | YH, 100 bp: SOAPdenovo2 (k = 41) | YH, 100 bp: BASE | YH, 150 bp: SOAPdenovo2 (k = 61) | YH, 150 bp: BASE | NA12878, 150 bp: SOAPdenovo2 (k = 41) | NA12878, 150 bp: BASE | NA12878, 250 bp: SOAPdenovo2 (k = 61) | NA12878, 250 bp: BASE |
|---|---|---|---|---|---|---|---|---|
| Contig num | 3,420,897 | 3,319,617 | 2,279,026 | 2,145,792 | 8,068,278 | 1,934,261 | 1,416,658 | 1,511,270 |
| Contig size | 2.67E+09 | 2.88E+09 | 2.76E+09 | 2.95E+09 | 2.44E+09 | 2.90E+09 | 2.60E+09 | 2.94E+09 |
| Contig N50 | 2,244 | 2,279 | 3,008 | 3,126 | 1,140 | 3,823 | 3,368 | 4,199 |
| Contig aligned rate | 99.10% | 97.07% | 98.87% | 95.96% | 99.40% | 97.62% | 99.34% | 96.33% |
| Genome coverage | 90.36% | 93.76% | 93.12% | 93.90% | 84.11% | 95.58% | 89.55% | 94.09% |
| RepeatMasked coverage | 97.05% | 96.13% | 97.28% | 95.32% | 93.94% | 97.38% | 95.60% | 95.99% |
| Exon coverage | 93.76% | 91.51% | 95.73% | 94.13% | 91.48% | 96.84% | 93.90% | 91.49% |
| Mismatch base | 2,735,141 | 3,479,046 | 2,911,990 | 3,839,110 | 2,301,111 | 3,459,648 | 2,544,785 | 3,751,887 |
| Mismatch ratio | 0.103% | 0.121% | 0.105% | 0.130% | 0.094% | 0.119% | 0.098% | 0.128% |
| Indel num | 340,930 | 327,469 | 358,358 | 334,989 | 259,190 | 322,214 | 327,695 | 372,941 |
| Indel base | 1,412,005 | 1,587,265 | 1,692,213 | 1,741,947 | 1,086,014 | 1,602,240 | 1,400,230 | 1,953,311 |
| Indel ratio | 0.053% | 0.057% | 0.062% | 0.061% | 0.045% | 0.057% | 0.054% | 0.069% |

  1. We mapped the raw contigs to Hg19. The aligned rate is the aligned contig length divided by the total contig length. Genome coverage is calculated after removing the gap regions of Hg19 from the reference length. For unique coverage, the repetitive regions are removed as well. For all SOAPdenovo2 contig assemblies, we used the single-kmer method and M1 to treat heterozygous regions.
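
The ratio definitions in the note above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as described, not the authors' evaluation pipeline. The function names, parameter names, and toy numbers are all assumptions made for the example, and none of the values come from the table.

```python
def aligned_rate(aligned_bases: int, total_contig_bases: int) -> float:
    """Aligned contig length divided by total contig length."""
    if total_contig_bases <= 0:
        raise ValueError("total_contig_bases must be positive")
    return aligned_bases / total_contig_bases


def coverage(covered_bases: int, reference_bases: int, excluded_bases: int) -> float:
    """Fraction of the reference covered once excluded regions
    (gaps, and optionally repeats) are removed from the denominator.

    Assumption: covered_bases is counted only inside the retained regions.
    """
    effective_bases = reference_bases - excluded_bases
    if effective_bases <= 0:
        raise ValueError("excluded regions span the whole reference")
    return covered_bases / effective_bases


# Toy numbers, chosen only to make the arithmetic easy to follow.
rate = aligned_rate(aligned_bases=990, total_contig_bases=1000)

# Genome coverage: remove gap regions (50 bases here) from the reference length.
genome_cov = coverage(covered_bases=850, reference_bases=1000, excluded_bases=50)

# Unique coverage: additionally remove repetitive regions (350 bases here).
unique_cov = coverage(covered_bases=540, reference_bases=1000, excluded_bases=50 + 350)

print(f"aligned rate:    {rate:.2%}")
print(f"genome coverage: {genome_cov:.2%}")
print(f"unique coverage: {unique_cov:.2%}")
```

Keeping the excluded length as an explicit argument lets the same function serve for both the gap-only and the gap-plus-repeat denominators.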