Table 1 Selected nucleotide pattern frequencies for the human data

From: A Support Vector Machine based method to distinguish long non-coding RNAs from protein coding transcripts

Row  GRCh37                    GRCh38
1    aa, aaa, ac, aca, acg     aa, aaa, ac, aca, act
2    act, ag, aga, at, ata     ag, aga, at, ata, atc
3    atc, atg, att, ca, caa    atg, att, ca, caa, cac
4    cac, cag, cat, cc, cca    cag, cat, cc, cca, ccc
5    ccc, cg, cgc, ct, cta     cg, cgc, ct, cta, ctc
6    ctc, ctg, ga, gac, gag    ctg, ga, gac, gag, gc
7    gc, gcg, gg, ggg, gt      gcg, gg, ggg, gt, gta
8    gtc, gtg, ta, tac, tag    gtc, gtg, ta, tac, tag
9    tat, tc, tca, tct, tg     tat, tc, tca, tct, tg
10   tga, tgt, tt, ttg, ttt    tga, tgt, tt, ttg, ttt

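Note: Each of the GRCh37 and GRCh38 data sets was analyzed to identify the 50 pattern frequencies with the highest PCA loadings. Each column lists its 50 patterns in alphabetical order, five per row. The only differences between the two sets are "acg" (GRCh37 only, row 1) and "gta" (GRCh38 only, row 7), which were shown in bold in the original table. The additional files list these nucleotide pattern frequencies ordered by PCA loading.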
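The footnote describes ranking nucleotide pattern frequencies by their PCA loadings. The sketch below shows one way such a ranking could be computed; it is not the authors' pipeline. It assumes the features are overlapping 2-mer and 3-mer frequencies (every pattern in the table is two or three letters long, but the full feature set used in the paper is not stated here) and that "highest loading" means the largest absolute loading on the first principal component, since the footnote does not say how many components were considered. The function names (`feature_names`, `kmer_frequencies`, `first_principal_component`, `top_loading_features`) are hypothetical. PCA is computed here by plain power iteration on the covariance matrix only to keep the example free of dependencies; a library PCA implementation could be used instead.

```python
from itertools import product
import math
import random

ALPHABET = "acgt"


def feature_names(ks=(2, 3)):
    """All possible k-mers for each k in ks, in lexicographic order ('aa', 'ac', ..., 'ttt')."""
    return ["".join(p) for k in ks for p in product(ALPHABET, repeat=k)]


def kmer_frequencies(seq, ks=(2, 3)):
    """Relative frequency of every overlapping k-mer, normalised separately for each k."""
    seq = seq.lower()
    freqs = {}
    for k in ks:
        counts = {"".join(p): 0 for p in product(ALPHABET, repeat=k)}
        total = 0
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if word in counts:  # windows containing 'n' or other symbols are skipped
                counts[word] += 1
                total += 1
        for word, c in counts.items():
            freqs[word] = c / total if total else 0.0
    return freqs


def first_principal_component(rows, iters=200, seed=0):
    """Unit eigenvector of the sample covariance matrix with the largest eigenvalue.

    Uses power iteration; needs at least two rows.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix C = X^T X / (n - 1) of the centred data
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)] for a in range(p)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(p)]  # random start, almost surely not orthogonal to PC1
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero-variance data: no principal direction
            break
        v = [x / norm for x in w]
    return v


def top_loading_features(sequences, n_top=50, ks=(2, 3)):
    """Rank k-mer features by absolute loading on the first principal component."""
    names = feature_names(ks)
    rows = []
    for s in sequences:
        f = kmer_frequencies(s, ks)
        rows.append([f[name] for name in names])
    loadings = first_principal_component(rows)
    ranked = sorted(range(len(names)), key=lambda j: abs(loadings[j]), reverse=True)
    return [names[j] for j in ranked[:n_top]]


if __name__ == "__main__":
    # Toy input: random sequences with varying GC content stand in for transcripts.
    rng = random.Random(42)
    seqs = []
    for i in range(30):
        gc = 0.3 + 0.4 * i / 29
        seqs.append("".join(rng.choices("acgt", weights=[1 - gc, gc, gc, 1 - gc], k=300)))
    top = top_loading_features(seqs, n_top=50)
    print(sorted(top))  # alphabetical order, as in Table 1
```

Because the ranking uses absolute values, the arbitrary sign of an eigenvector does not matter, and scaling the loadings by the square root of the eigenvalue (another common definition of "loading") leaves the order within one component unchanged.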