Table 2 GenBank data sets

From: An overabundance of phase 0 introns immediately after the start codon in eukaryotic genes

| Organism group | Vertebrata | Arthropoda | Fungi | Magnoliophyta |
|---|---:|---:|---:|---:|
| Total CDS with introns | 54729 | 34336 | 31441 | 95711 |
| pseudogene | 1899 | 90 | 101 | 789 |
| not experimental | 1204 | 515 | 504 | 9150 |
| incomplete 5' end (<) | 15622 | 10583 | 11143 | 11659 |
| incomplete 3' end (>) | 5417 | 1664 | 569 | 1561 |
| cross-reference | 10445 | 231 | 2 | 60 |
| join (complement) | 0 | 16 | 0 | 34 |
| contains 'X' | 106 | 120 | 71 | 100 |
| contains 'U' | 26 | 4 | 0 | 0 |
| no initial 'M' | 222 | 51 | 9 | 34 |
| zero or negative length | 36 | 7 | 17 | 35 |
| annotated gap | 480 | 6 | 0 | 25 |
| length mismatch | 466 | 19 | 11 | 18 |
| Used for length statistics | 18807 | 21030 | 19014 | 72247 |
| non-gt...ag | 1734 | 818 | 1159 | 3368 |
| intron too short | 550 | 1244 | 2354 | 12125 |
| CDS accepted | 16523 | 18968 | 15501 | 56754 |
| After homology reduction | 3542 | 4179 | 4525 | 12751 |
| With signal peptides | 755 | 769 | 431 | 1051 |
| Without signal peptides | 2552 | 3202 | 3814 | 10370 |

The table lists the number of genes (CDS features) found in GenBank within the four organism groups studied, the number of genes discarded for various reasons, the number kept after homology reduction, and the numbers predicted to contain or not to contain a signal peptide.
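
The counts in Table 2 can be cross-checked with a few lines of arithmetic. The sketch below assumes, as the numbers suggest but the caption does not state, that the "non-gt...ag" and "intron too short" rows are the only discards applied between "Used for length statistics" and "CDS accepted". All variable and function names are illustrative and do not come from the article.

```python
# Counts copied from Table 2, in column order:
# Vertebrata, Arthropoda, Fungi, Magnoliophyta.
GROUPS = ["Vertebrata", "Arthropoda", "Fungi", "Magnoliophyta"]

used_for_length_stats = [18807, 21030, 19014, 72247]
non_gt_ag = [1734, 818, 1159, 3368]
intron_too_short = [550, 1244, 2354, 12125]
cds_accepted = [16523, 18968, 15501, 56754]
after_homology_reduction = [3542, 4179, 4525, 12751]


def remaining_after_intron_filters():
    """Subtract the two intron-level discard rows from the length-statistics row.

    Assumption: these two rows are the only filters applied at this stage.
    """
    return [
        used - non_canonical - too_short
        for used, non_canonical, too_short in zip(
            used_for_length_stats, non_gt_ag, intron_too_short
        )
    ]


def retained_fraction():
    """Fraction of accepted CDS that survive homology reduction, per group."""
    return [
        kept / accepted
        for kept, accepted in zip(after_homology_reduction, cds_accepted)
    ]


if __name__ == "__main__":
    rows = zip(GROUPS, remaining_after_intron_filters(), cds_accepted, retained_fraction())
    for name, remaining, accepted, fraction in rows:
        print(f"{name}: {remaining} remaining vs {accepted} accepted, "
              f"{fraction:.1%} kept after homology reduction")
```

Under that assumption the subtraction reproduces the "CDS accepted" row for all four groups, and homology reduction keeps roughly 21% to 29% of the accepted CDS. The "With signal peptides" and "Without signal peptides" rows do not add up to the "After homology reduction" row, so they are left out of the check.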