Table 3 Comparison of gene family content

From: Comparative genomic analysis of human infective Trypanosoma cruzi lineages with the bat-restricted subspecies T. cruzi marinkellei

 

| Gene family^a | T. c. marinkellei: size in assembly^b | T. c. marinkellei: % short reads^c | T. c. cruzi Sylvio X10: size in assembly^b | T. c. cruzi Sylvio X10: % short reads^c | SE^d |
|---|---|---|---|---|---|
| DGF | 2,129,983 (6.22 %) | 3.433 | 1,265,650 (3.28 %) | 1.324 | Tcm |
| TS | 2,109,163 (6.16 %) | 6.291 | 2,953,602 (7.65 %) | 6.298 | Tcc X10 |
| MASP | 540,360 (1.58 %) | 1.317 | 727,537 (1.88 %) | 1.434 | Tcc X10 |
| RHS | 521,665 (1.52 %) | 2.234 | 1,314,589 (3.41 %) | 2.915 | Tcc X10 |
| GP63 | 452,732 (1.32 %) | 1.229 | 514,422 (1.33 %) | 0.898 | Tcm |
| TcMUC mucin | 273,890 (0.80 %) | 0.557 | 334,544 (0.87 %) | 0.515 | Tcc X10 |
| ABC | 37,490 (0.11 %) | 0.124 | 42,072 (0.11 %) | 0.162 | Tcc X10 |
| RBP | 25,946 (0.08 %) | 0.080 | 26,732 (0.07 %) | 0.074 | Tcc X10 |

  1. a Gene family abbreviations: DGF=Dispersed Gene Family, TS=trans-sialidase, MASP=Mucin-associated surface protein, GP63=Surface protease, RHS=Retrotransposon Hot Spot protein, ABC=ABC Transporter, RBP=RNA Binding Protein.
  2. b The combined number of base pairs of this gene family identified in the assembly. Sequences were identified using RepeatMasker and a repeat library of coding sequences from the Tcc CLBR genome. These numbers include partial coding sequences. The number in parentheses is the percentage of the total assembly size.
  3. c The percentage of short reads that mapped to these features.
  4. d SE = Significantly Enriched: the genome that contained significantly more of this gene family. Significance was determined from an empirical distribution of read depth differences between homologous regions of Tcm and Tcc X10, corrected for genome size. This empirical distribution was used to calculate a p-value.
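
Footnote b states that each parenthesised value is the family's share of the total assembly size, so the total can be back-calculated from any row as a consistency check on the table. The following is a minimal sketch of that arithmetic, not part of the original analysis. The function names (`implied_assembly_size`, `median_implied_size`) are illustrative assumptions, and the only data used are the values printed in Table 3. Because the percentages are rounded to two decimals, small families such as RBP give noisy estimates, which is why the sketch takes the median across rows.

```python
from statistics import median

# (gene family, bp in assembly, % of assembly), copied from Table 3.
TCM = [
    ("DGF", 2_129_983, 6.22), ("TS", 2_109_163, 6.16),
    ("MASP", 540_360, 1.58), ("RHS", 521_665, 1.52),
    ("GP63", 452_732, 1.32), ("TcMUC mucin", 273_890, 0.80),
    ("ABC", 37_490, 0.11), ("RBP", 25_946, 0.08),
]
TCC_X10 = [
    ("DGF", 1_265_650, 3.28), ("TS", 2_953_602, 7.65),
    ("MASP", 727_537, 1.88), ("RHS", 1_314_589, 3.41),
    ("GP63", 514_422, 1.33), ("TcMUC mucin", 334_544, 0.87),
    ("ABC", 42_072, 0.11), ("RBP", 26_732, 0.07),
]


def implied_assembly_size(bp, percent):
    """Back-calculate the total assembly size (bp) from one table row."""
    return bp / (percent / 100.0)


def median_implied_size(rows):
    """Median of the per-row estimates; damps rounding noise in small families."""
    return median(implied_assembly_size(bp, pct) for _, bp, pct in rows)


if __name__ == "__main__":
    for label, rows in (("Tcm", TCM), ("Tcc X10", TCC_X10)):
        print(f"{label}: ~{median_implied_size(rows) / 1e6:.1f} Mb implied assembly size")
```

The rows agree with one another within rounding: the estimates cluster at about 34 Mb for Tcm and between 38 and 39 Mb for Tcc X10. These figures are derived from the table only and should not be read as the assembly sizes reported elsewhere in the paper.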