Table 3 Univariate logistic predictors of whether or not a TR is polymorphic in the Whole Genome Shotgun datasets.

From: Marked variation in predicted and observed variability of tandem repeat loci across the human genome

| Predictor¹ | Variable class | Mean among variants | Mean among invariants | % increase among variants | Logistic pseudo-R² |
|---|---|---|---|---|---|
| unit_length | Repeat | 3.115 | 5.828 | -46.6 | 0.19 |
| score | Repeat | 84.569 | 61.891 | 36.6 | 0.14 |
| copy_number | Repeat | 20.440 | 11.724 | 74.3 | 0.08 |
| %match | Repeat | 94.261 | 89.711 | 5.1 | 0.05 |
| AC | Repeat | 2.252 | 1.396 | 61.3 | 0.04 |
| %indels | Repeat | 2.083 | 4.619 | -54.9 | 0.04 |
| CA | Repeat | 2.280 | 1.757 | 29.8 | 0.03 |
| SNP_allele_freq | Repeat flank | 0.166 | 0.167 | -0.6 | 0.02 |
| AA | Repeat | 1.025 | 1.183 | -13.4 | 0.01 |
| blocklength | Repeat | 57.130 | 48.430 | 18.0 | 0.01 |
| G+C_repeat | Repeat | 32.098 | 28.592 | 12.3 | < 0.01 |
| entropy | Repeat | 1.050 | 1.100 | -4.5 | < 0.01 |
| CC | Repeat | 1.052 | 1.150 | -8.5 | < 0.01 |
| CpG_flank500 | Repeat flank | 4.695 | 5.496 | -14.6 | < 0.01 |
| G+C_flank500 | Repeat flank | 41.427 | 42.363 | -2.2 | < 0.01 |
| tm_flank500 | Repeat flank | 53.014 | 53.418 | -0.8 | < 0.01 |
| G+C_flank20 | Repeat flank | 38.653 | 39.873 | -3.1 | < 0.01 |
| tm_flank20 | Repeat flank | 51.606 | 52.134 | -1.0 | < 0.01 |
| C | Repeat | 15.695 | 13.831 | 13.5 | < 0.01 |
| G | Repeat | 15.559 | 13.717 | 13.4 | < 0.01 |
| GC | Repeat | 1.010 | 1.088 | -7.2 | < 0.01 |
| TA | Repeat | 0.921 | 0.877 | 5.0 | < 0.01 |
| A | Repeat | 33.180 | 35.523 | -6.6 | < 0.01 |
| CpG flank 20 | Repeat flank | 0.190 | 0.226 | -15.9 | < 0.01 |
| tm_repeat | Repeat | 48.744 | 47.967 | 1.6 | < 0.01 |
| num_SNPs | Repeat flank | 2.380 | 2.220 | 7.2 | < 0.01 |
| RNA_free_energy | Repeat | -3.978 | -3.157 | 26.0 | < 0.01 |
| nearest_promoter | Distant repeat flank | 475996.5 | 439298.5 | 8.3 | < 0.01 |
| CG | Repeat | 0.930 | 0.973 | -4.4 | < 0.01 |
| AT | Repeat | 0.991 | 0.963 | 2.9 | < 0.01 |
| AG | Repeat | 1.313 | 1.354 | -3.0 | < 0.01 |
| T | Repeat | 34.591 | 35.767 | -3.3 | < 0.01 |
| pop_size |  | 6.317 | 5.642 | 12.0 | < 0.01 |
| nearest_gene | Distant repeat flank | 143738.9 | 133874.5 | 7.4 | < 0.01 |
| nearest_CDS | Distant repeat flank | 151516.1 | 141369.5 | 7.2 | < 0.01 |
| GA | Repeat | 1.346 | 1.382 | -2.7 | < 0.01 |
| nearest_regulatory | Distant repeat flank | -3642189 | -3363138 | 8.3 | < 0.01 |
| nearest_MCS | Distant repeat flank | 66672.9 | 72695.2 | -8.3 | < 0.01 |

¹ Significant with p < 0.00005 in both the Mann-Whitney test and the t-test. Only variables with at least one significant p-value at the 5% level are shown; dinucleotide biases in the flanking sequences of repeats were also excluded.
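
The "% increase among variants" column is consistent with the relative difference between the two group means, 100 × (mean among variants − mean among invariants) / mean among invariants. The sketch below reproduces a few rows under that assumption. It also shows one common way a univariate logistic pseudo-R² can be obtained, McFadden's 1 − LL(model)/LL(null); the table does not state which pseudo-R² definition was used, so that choice is an assumption. The function names and the small data set are hypothetical and purely illustrative; they are not taken from the study.

```python
import math

def percent_increase(mean_variant, mean_invariant):
    """Relative difference of the variant mean over the invariant mean, in %."""
    return 100.0 * (mean_variant - mean_invariant) / mean_invariant

def log_likelihood(xs, ys, b0, b1):
    """Bernoulli log-likelihood of a one-predictor logistic model."""
    ll = 0.0
    for x, y in zip(xs, ys):
        z = b0 + b1 * x
        # Numerically stable log(sigmoid(z)).
        if z >= 0:
            log_p = -math.log1p(math.exp(-z))
        else:
            log_p = z - math.log1p(math.exp(z))
        log_1mp = log_p - z  # log(1 - sigmoid(z))
        ll += y * log_p + (1 - y) * log_1mp
    return ll

def fit_univariate_logistic(xs, ys, iterations=25):
    """Fit intercept b0 and slope b1 by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient w.r.t. intercept
            g1 += (y - p) * x    # gradient w.r.t. slope
            h00 += w             # entries of X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: solve (X'WX) * delta = gradient for the 2x2 system.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def mcfadden_pseudo_r2(xs, ys):
    """McFadden pseudo-R2 (assumed definition): 1 - LL(model) / LL(null)."""
    b0, b1 = fit_univariate_logistic(xs, ys)
    ll_model = log_likelihood(xs, ys, b0, b1)
    p_bar = sum(ys) / len(ys)  # intercept-only model predicts the base rate
    ll_null = sum(math.log(p_bar) if y else math.log(1.0 - p_bar) for y in ys)
    return 1.0 - ll_model / ll_null

# Means copied from the table rows unit_length, score and RNA_free_energy.
for name, mv, mi in [("unit_length", 3.115, 5.828),
                     ("score", 84.569, 61.891),
                     ("RNA_free_energy", -3.978, -3.157)]:
    print(name, round(percent_increase(mv, mi), 1))

# Synthetic toy data (NOT from the study): repeat unit lengths and a 0/1
# flag for whether the locus is polymorphic.
toy_unit_length = [1, 2, 2, 3, 4, 6, 2, 4, 5, 6, 7, 8]
toy_polymorphic = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print("toy pseudo-R2:", mcfadden_pseudo_r2(toy_unit_length, toy_polymorphic))
```

Note that the formula divides by the invariant mean without taking an absolute value, which is why predictors with negative means, such as RNA_free_energy, still show a positive percentage when the variant mean is larger in magnitude.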