Perfect Hamming code with a hash table for faster genome mapping

BMC Genomics

Table 2 Summary of our methods for lengths 5, 21, and 10 when referring to 1- and 2-mismatch and 1- and 2-gap sequences

length	condition	#keys	#words	ratio	f(s, K) when s = c(s)	f(s, K) when s ≠ c(s)
5	1-mismatch	6.625	16	41.4%	1 + 15x	1 + 15x + 42x² + 54x³
	2-mismatches	27.25	106	25.7%	1 + 15x + 90x² + 210x³ + 180x⁴	1 + 15x + 90x² + 170x³ + 156x⁴
	1-gap	3.25	4	81.3%	4 + 12x	4 + 60x
	2-gaps	10	16	62.5%	16 + 36x + 108x²	– ∗¹
21	1-mismatch	30.53	64	47.7%	1 + 63x	1 + 63x + 210x² + 1710x³
	2-mismatches	611.31	1954	31.3%	1 + 63x + 1890x² + 4410x³ + 34020x⁴	1 + 63x + 1890x² + 5650x³ + 31500x⁴
	1-gap	3.81	4	95.3%	4 + 60x	4 + 252x
	2-gaps	13.87	16	86.7%	16 + 84x + 540x²	16 + 48x + 960x²
10: Serialize	1-mismatch	12.25	31	39.5%	1 + 30x + 225x²	1 + 30x + 170x² + 538x³ + 1089x⁴ + 1620x⁵ ∗²
10: Parallelize	1-mismatch	13.25	31	44.1%	1 + 30x	1 + 30x + 84x² + 108x³ ∗³

∗¹: s always includes one code word.
∗²: Neither the first half nor the second half is a code word. The reference formula when one of the two halves is a code word is 1 + 30x + 267x² + 684x³ + 810x⁴.
∗³: Neither the first half nor the second half is a code word. The reference formula when one of the two halves is a code word is 1 + 30x + 42x² + 54x³.
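The #words and ratio columns for the mismatch rows can be checked with a short calculation. The sketch below assumes, based only on the numbers in the table, that #words is the count of DNA sequences (4-letter alphabet) within the given number of mismatches of a fixed sequence, and that ratio is #keys divided by #words. The function names hamming_ball_size and key_ratio are illustrative and do not come from the paper.

```python
from math import comb


def hamming_ball_size(n: int, k: int, alphabet: int = 4) -> int:
    """Count words of length n within Hamming distance k of a fixed word.

    Assumption: this is what the #words column reports for mismatch rows.
    Choosing i positions out of n and one of (alphabet - 1) substitutions
    at each position gives comb(n, i) * (alphabet - 1) ** i words at
    distance exactly i.
    """
    return sum(comb(n, i) * (alphabet - 1) ** i for i in range(k + 1))


def key_ratio(num_keys: float, num_words: int) -> float:
    """Return #keys / #words as a percentage (assumed meaning of 'ratio')."""
    return 100.0 * num_keys / num_words


if __name__ == "__main__":
    # (length, mismatches, #keys) taken from the mismatch rows of Table 2.
    rows = [(5, 1, 6.625), (5, 2, 27.25), (21, 1, 30.53), (21, 2, 611.31)]
    for n, k, keys in rows:
        words = hamming_ball_size(n, k)
        print(f"length={n} mismatches={k} words={words} "
              f"ratio={key_ratio(keys, words):.1f}%")
```

Under this reading, the lower the ratio, the fewer hash-table keys are needed relative to a naive enumeration of all neighboring words.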

ISSN: 1471-2164