
Table 1 Effect of tag length on MPSS library complexity

From: Deep analysis of cellular transcriptomes – LongSAGE versus classic MPSS

| Tag length sequenced (bp) | Length of tags analysed (bp) | Number of unique tags | Tags matching genome sequence (% of unique tags) |
|---|---|---|---|
| 20 | 20 | 14,894 | 11,489 (77%) |
| 20 | 17 | 13,576 | 11,934 (88%) |
| 20 | 14 | 12,509 | 12,372 (99%) |
| 17 | 17 | 18,084 | 14,307 (79%) |
| 17 | 14 | 15,190 | 14,944 (98%) |
| 14 | 14 | 19,931 | 19,402 (97%) |

  1. MPSS tags can be extracted from the same initial dataset at different lengths; here, 14, 17 and 20 bp tags were extracted. After extraction, tags can be computationally shortened to test whether complexity differs between the extractions. Unexpectedly, decreasing the tag length sequenced increased the complexity of the library. For example, the 14,894 distinct 20 bp tags contained only 13,576 distinct 17 bp sequences when the last 3 bases were ignored. However, when tags were extracted directly at 17 bp (i.e. omitting the last annealing step in sequencing), the library contained 18,084 distinct tag sequences; 4,508 distinct species (18,084 minus 13,576) are therefore lost in this last sequencing step. The last column shows how many of the distinct tag species have perfect matches in the human genome, with that count also expressed in brackets as a proportion of the unique tags.
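The shortening and bookkeeping described in the note above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: "shortening" is modelled as keeping the first k bases of each tag (ignoring the trailing ones), and the helper names `distinct_truncated`, `species_lost` and `percent_matching`, as well as the four 20 bp toy tags, are hypothetical. Only the counts passed to the last two helpers come from Table 1.

```python
from typing import Iterable

# Hypothetical 20 bp toy tags (not real MPSS data). TOY_TAGS[0] and [1] differ
# only at base 20, [2] differs from [0] at base 16, and [3] at base 13.
TOY_TAGS = [
    "GATCAAACCCGGGTTTAAAC",
    "GATCAAACCCGGGTTTAAAG",
    "GATCAAACCCGGGTTCAAAC",
    "GATCAAACCCGGATTTAAAC",
]


def distinct_truncated(tags: Iterable[str], k: int) -> int:
    """Count distinct sequences after keeping only the first k bases of each tag.

    Assumption: computational shortening ignores the last bases of a tag,
    so two tags that differ only beyond base k collapse into one species.
    """
    return len({tag[:k] for tag in tags if len(tag) >= k})


def species_lost(extracted_at_k: int, shortened_to_k: int) -> int:
    """Difference between the distinct-species count obtained by extracting
    tags directly at k bases and the count obtained by extracting longer tags
    and shortening them to k bases."""
    return extracted_at_k - shortened_to_k


def percent_matching(matching: int, unique: int) -> int:
    """Proportion of distinct tag species with a perfect genome match,
    rounded to a whole percent as in the bracketed values of the table."""
    return round(100 * matching / unique)


if __name__ == "__main__":
    for k in (20, 17, 14):
        # Toy data: 4 species at 20 bp, 3 at 17 bp, 2 at 14 bp.
        print(k, distinct_truncated(TOY_TAGS, k))
    # Counts from Table 1: 17 bp extraction vs. 20 bp tags shortened to 17 bp.
    print(species_lost(18_084, 13_576))    # 4508
    print(percent_matching(11_489, 14_894))  # 77
```

The toy tags show the collapse effect only: shortening can merge species but never split them, so the count of distinct sequences can only stay the same or fall as k decreases within one extraction. The table's unexpected result comes from comparing separate extractions, which this sketch does not model.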