From: Marked variation in predicted and observed variability of tandem repeat loci across the human genome
Predictor name | Description |
---|---|
Statistics About The Tandem Repeat | |
pop_size | Population size: number of unique sequences from which estimate of repeat variability was obtained |
entropy | Shannon entropy of the base composition: −∑ ω * log₂(ω), where ω is the fractional frequency of each of A, C, G and T. This is 0 when the repeat array consists of only one nucleotide and 2 when all four nucleotides are equally frequent [16] |
T | %T in the repeat, e.g. 50% in TGTGTGTG |
G | %G in the repeat, e.g. 50% in TGTGTGTG |
C | %C in the repeat, e.g. 50% in CACACACA |
A | %A in the repeat, e.g. 50% in CACACACA |
score | Tandem Repeats Finder (TRF) [16] program-derived overall score |
%indels | % indels between actual repeat units and the inferred consensus [16] |
%match | % matches between actual repeat units and the inferred consensus [16] |
unit_length | Length of tandem repeat unit, e.g. 2 for TGTGTGTG |
blocklength | Length of the tandem repeat array, e.g. 8 for TGTGTGTG |
copy_number | Number of copies of repeat unit, e.g. 4 for TGTGTGTG |
CG, CA, AC, AG, GA, CC, AT, TA, GC, AA | Observed/expected dinucleotide bias of 10 dimers in the tandem repeat array [24] |
tm_repeat | Melting temperature of the tandem repeat sequence [26] |
G+C_repeat | Fraction of the sequence represented by the bases G or C, e.g. 0.5 for TGTGTGTG |
RNA_free_energy | Free energy of the tandem repeat sequence RNA secondary structure [37] |
Statistics From 20 bp/500 bp Flanks of the Repeat | |
tm_flank20, tm_flank500 | Melting temperature of the two 20 bp/500 bp flanking sequences [26] |
G+C_flank20, G+C_flank500 | Fractional G+C content of the two 20 bp/500 bp flanking sequences |
CpG_flank20, CpG_flank500 | Number of CpG (CG) dinucleotides in the two 20 bp/500 bp flanking sequences |
num_SNPs | Total number of SNPs in the two 500 bp flanks |
SNP_allele_freq | Mean SNP minor allele frequency for SNPs in the two 500 bp flanks |
Statistics on Distance to Nearest Neighbouring Elements | |
nearest_promoter | Distance in nucleotides from the nearest promoter |
nearest_gene/CDS | Distance in nucleotides from the nearest gene/CDS |
nearest_MCS | Distance in nucleotides from the nearest Multi-species Conserved Sequence (MCS), as defined in [33] |
nearest_CpG | Distance in nucleotides from the nearest CpG island (as defined by the UCSC Genome Browser) |
nearest_regulatory | Distance in nucleotides from the nearest regulatory region (as defined by the UCSC Genome Browser) |
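
Several of the composition-based predictors in the table can be computed directly from a sequence string. The Python sketch below is illustrative only and is not the authors' pipeline: the function names are hypothetical, the entropy uses a base-2 logarithm (an assumption, chosen because it gives the stated maximum of 2), and the copy number is taken as the simple ratio blocklength / unit_length, whereas TRF derives its own value from an alignment. Sequences are assumed to be non-empty.

```python
import math
from collections import Counter

BASES = "ACGT"

def base_fractions(seq: str) -> dict:
    """Fraction of A, C, G and T in seq (multiply by 100 for the %A/%C/%G/%T rows)."""
    seq = seq.upper()
    counts = Counter(seq)  # missing bases count as 0
    return {b: counts[b] / len(seq) for b in BASES}

def entropy(seq: str) -> float:
    """Shannon entropy of the base composition in bits (base 2 is an assumption)."""
    # Terms with zero frequency are skipped, since w * log(w) -> 0 as w -> 0.
    return sum(-w * math.log2(w) for w in base_fractions(seq).values() if w > 0)

def gc_fraction(seq: str) -> float:
    """Fraction of the sequence that is G or C (G+C_repeat, G+C_flank*)."""
    fractions = base_fractions(seq)
    return fractions["G"] + fractions["C"]

def copy_number(blocklength: int, unit_length: int) -> float:
    """Simple ratio of array length to unit length (assumption; TRF may differ)."""
    return blocklength / unit_length

def cpg_count(seq: str) -> int:
    """Number of CG dinucleotides in the sequence (CpG_flank*)."""
    return seq.upper().count("CG")

repeat = "TGTGTGTG"
print(base_fractions(repeat)["T"] * 100)  # 50.0
print(entropy(repeat))                    # 1.0
print(gc_fraction(repeat))                # 0.5
print(copy_number(len(repeat), 2))        # 4.0
print(cpg_count("ACGCGT"))                # 2
```

The values for TGTGTGTG match the worked examples given in the table (50% T, G+C of 0.5, copy number 4), and the entropy of 1 lies between the two limiting cases of 0 and 2 described for that predictor.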