
Table 1 Illustration of the mining process of the modified PrefixSpan algorithm.

From: Similarity evaluation of DNA sequences based on frequent patterns and entropy

| Current pattern | Extended patterns |
| --- | --- |
| 〈C〉: 7, 10, 14, 16, 17 | 〈CA〉: 8; 〈CC〉: 17; 〈CG〉: Empty; 〈CT〉: 11, 15, 18 |
| 〈CT〉: 11, 15, 18 | 〈CTA〉: Empty; 〈CTC〉: 16; 〈CTG〉: 12, 19; 〈CTT〉: Empty |
| 〈CTG〉: 12, 19 | 〈CTGA〉: 13, 20; 〈CTGC〉: Empty; 〈CTGG〉: Empty; 〈CTGT〉: Empty |
| 〈CTGA〉: 13, 20 | 〈CTGAA〉: Empty; 〈CTGAC〉: 14; 〈CTGAG〉: Empty; 〈CTGAT〉: Empty |
| 〈A〉: 1, 8, 13, 20 | 〈AA〉: Empty; 〈AC〉: 14; 〈AG〉: Empty; 〈AT〉: 2, 9 |
| 〈AT〉: 2, 9 | 〈ATA〉: Empty; 〈ATC〉: 10; 〈ATG〉: 3; 〈ATT〉: Empty |
| 〈G〉: 3, 4, 6, 12, 19 | 〈GA〉: 13, 20; 〈GC〉: 7; 〈GG〉: 4; 〈GT〉: 5 |
| 〈T〉: 2, 5, 9, 11, 15, 18 | 〈TA〉: Empty; 〈TC〉: 10, 16; 〈TG〉: 3, 6, 12, 19; 〈TT〉: Empty |
| 〈TC〉: 10, 16 | 〈TCA〉: Empty; 〈TCC〉: 17; 〈TCG〉: Empty; 〈TCT〉: 11 |
| 〈TG〉: 3, 6, 12, 19 | 〈TGA〉: 13, 20; 〈TGC〉: 7; 〈TGG〉: 4; 〈TGT〉: Empty |

  1. Each row represents one recursive step. The numbers after each pattern represent the starting locations of the suffixes, which are the so-called pseudo-projections. Patterns in bold are maximal.
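
The one-base extension step shown in each row can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the 20-base sequence `SEQ` is reconstructed from the positions listed for 〈A〉, 〈C〉, 〈G〉 and 〈T〉 in the table rather than quoted from the paper, and the helper names `project` and `extend` are hypothetical. The sketch also leaves out the rule the modified algorithm uses to decide which extended patterns are recursed on, since the table does not state it.

```python
# Assumption: sequence reconstructed from the single-base rows of Table 1.
SEQ = "ATGGTGCATCTGACTCCTGA"
ALPHABET = "ACGT"


def project(seq, pattern):
    """Return the 1-based positions at which an occurrence of `pattern` ends.

    These positions play the role of the pseudo-projection: the sequence is
    never copied, only indices into it are kept.
    """
    k = len(pattern)
    return [i + k for i in range(len(seq) - k + 1) if seq[i:i + k] == pattern]


def extend(seq, pattern, positions):
    """Extend `pattern` by one base, using only its pseudo-projection.

    For each 1-based end position p, the next base is seq[p] (0-based index),
    and the extended occurrence ends at position p + 1.
    """
    extended = {pattern + base: [] for base in ALPHABET}
    for p in positions:
        if p < len(seq):  # an occurrence ending at the last base cannot grow
            extended[pattern + seq[p]].append(p + 1)
    return extended


if __name__ == "__main__":
    current = "C"
    positions = project(SEQ, current)
    print(current, positions)
    for pat, pos in extend(SEQ, current, positions).items():
        print(pat, pos if pos else "Empty")
```

Calling `extend(SEQ, "CT", [11, 15, 18])` gives 〈CTC〉: 16 and 〈CTG〉: 12, 19, with the other two extensions empty, which agrees with the second row of the table. An empty list corresponds to an "Empty" entry.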