
Table 3 Analysis of 8 clusters from hierarchical cluster analysis, including the numbers of sites from each call set and a description of the predominant types of sites in each cluster

From: svclassify: a method to establish benchmark structural variant calls

| Cluster | 4000 Random | Personalis Random | Random LINEs | Random LTRs | Random SINEs | Personalis deletions | 1000 Genomes deletions | Total | Proportion that are deletions | Description |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 | 371 | 284 | 655 | 1.000 | Mostly large, true homozygous deletions |
| 2 | 0 | 0 | 0 | 0 | 2 | 432 | 237 | 671 | 0.997 | Heterozygous Alu deletions |
| 3 | 1 | 1 | 1 | 0 | 0 | 705 | 402 | 1110 | 0.997 | Homozygous Alu deletions |
| 4 | 2397 | 455 | 38 | 28 | 16 | 9 | 28 | 2971 | 0.012 | Large, likely non-SVs. Generally in easy-to-sequence regions |
| 5 | 1073 | 1351 | 352 | 378 | 279 | 1 | 33 | 3467 | 0.010 | Smaller, likely non-SVs. Generally in easy-to-sequence regions |
| 6 | 17 | 2 | 1 | 0 | 0 | 3 | 138 | 161 | 0.876 | Likely true large homozygous deletions with inaccurate breakpoints so that the true deletion is larger than the called region |
| 7 | 14 | 16 | 2 | 2 | 4 | 624 | 811 | 1473 | 0.974 | Mostly true heterozygous deletions in easier-to-sequence regions |
| 8 | 498 | 481 | 103 | 90 | 195 | 161 | 752 | 2280 | 0.400 | Mix of non-SVs and SVs in more difficult regions with coverage between the normal coverage and half the normal coverage |
| Total | 4000 | 2306 | 497 | 498 | 496 | 2306 | 2685 | 12788 | 0.390 | |
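The "Proportion that are deletions" column appears to equal the number of sites drawn from the two deletion call sets (Personalis deletions and 1000 Genomes deletions) divided by the cluster total. The table does not state this formula; it is inferred because every row, including the Total row, matches it to three decimals. The Python sketch below recomputes row totals, column totals and the proportions from the transcribed counts as a consistency check. The names `CALL_SETS`, `DELETION_SETS`, `deletion_proportion` and `column_totals` are illustrative and are not part of svclassify.

```python
# Counts transcribed from Table 3: one list per cluster, columns in table order.
CALL_SETS = [
    "4000 Random", "Personalis Random", "Random LINEs", "Random LTRs",
    "Random SINEs", "Personalis deletions", "1000 Genomes deletions",
]
# Assumption: the "deletion" sites are exactly the two deletion call sets.
DELETION_SETS = {"Personalis deletions", "1000 Genomes deletions"}

CLUSTERS = {
    1: [0, 0, 0, 0, 0, 371, 284],
    2: [0, 0, 0, 0, 2, 432, 237],
    3: [1, 1, 1, 0, 0, 705, 402],
    4: [2397, 455, 38, 28, 16, 9, 28],
    5: [1073, 1351, 352, 378, 279, 1, 33],
    6: [17, 2, 1, 0, 0, 3, 138],
    7: [14, 16, 2, 2, 4, 624, 811],
    8: [498, 481, 103, 90, 195, 161, 752],
}


def deletion_proportion(counts):
    """Fraction of a row's sites that come from the deletion call sets."""
    total = sum(counts)
    deletions = sum(c for name, c in zip(CALL_SETS, counts) if name in DELETION_SETS)
    return deletions / total


def column_totals(rows):
    """Sum each call-set column across all cluster rows."""
    return [sum(column) for column in zip(*rows)]


if __name__ == "__main__":
    for cluster, counts in CLUSTERS.items():
        print(cluster, sum(counts), f"{deletion_proportion(counts):.3f}")
    totals = column_totals(CLUSTERS.values())
    print("Total", totals, sum(totals), f"{deletion_proportion(totals):.3f}")
```

Running the script reproduces the Total and Proportion columns of the table, which suggests the transcription above is internally consistent.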
