
# Sets of medians in the non-geodesic pseudometric space of unsigned genomes with breakpoints

*BMC Genomics*
**volume 15**, Article number: S3 (2014)

## Abstract

### Background

The breakpoint median in the set *S*_{n} of permutations on *n* terms is known to have some unusual behavior, especially if the input genomes are maximally different to each other. The mathematical study of the set of medians is complicated by the facts that breakpoint distance is not a metric but a pseudo-metric, and that it does not define a geodesic space.

### Results

We introduce the notion of partial geodesic, or geodesic patch between two permutations, and show that if two permutations are medians, then every permutation on a geodesic patch between them is also a median. We also prove the conjecture that the input permutations themselves are medians.

## Background

Among the common measures of gene order difference between two genomes, the edit distances, such as reversal distance or double-cut-and-join distance, contrast with the breakpoint distance in that the former are defined in a geodesic space while the latter is not. Another characteristic of breakpoint distance that it does not share with most other genomic distances is that it is a pseudometric rather than a metric.

A problem in computational comparative genomics that has been extensively studied under many definitions of genomic distance is the gene order median problem [1], the archetypical instance of the gene order small phylogeny problem. The median genome is meant, in the first instance, to embody the information in common among *k* ≥ 3 given genomes, and second, to estimate the ancestral genome of these *k* genomes. We have shown that the second goal becomes unattainable as *n → ∞*, where *n* is the length of the genomes, if there are more than 0.5*n* mutational steps changing the gene order [2]. Moreover, we have conjectured, and demonstrated in simulation studies, that where there is little or nothing in common among the *k* input genomes, the median tends to reflect only one (actually, any one) of them, with no incorporation of information from the other *k −* 1 [3].

In the present paper, we investigate this conjecture mathematically in the context of a wider study of medians for the breakpoint distance between unsigned linear unichromosomal genomes, although the methods and results are equally valid for genomes with signed and/or circular chromosomes, as well as those with *χ >*1 chromosomes, where *χ* is a fixed parameter. Our approach involves first a rigorous treatment of the pseudometric character of the breakpoint distance. Then, given the non-geodesic nature of the space we are able to define a weaker concept of geodesic patch, that we use later, given two or more medians, to locate further medians. We also prove the conjecture that for *k* genomes containing no gene order information among them, the normalized (divided by *n*) median score tends to *k −* 1, with high probability.

## Results

### From pseudometric to metric

We denote by *S*_{n} the set of all permutations of length *n*. Each permutation represents a unichromosomal linear genome where the numbers all represent different genes. For a permutation *π* := *π*_{1} ... *π*_{n} we define the set of adjacencies of *π* to be all the unordered pairs {*π*_{i}, *π*_{i+1}} = {*π*_{i+1}, *π*_{i}} for *i* = 1, ..., *n* − 1. For *I* ⊆ *S*_{n} we denote by $\mathcal{A}_{I}:=\mathcal{A}_{I}^{(n)}$ the set of all common adjacencies of the elements of *I*. Then $\mathcal{A}_{S_{n}}=\varnothing$, and we also write $\mathcal{A}_{\varnothing}$ for the set of all pairs {*i, j*}, *i* ≠ *j*. For any *I, J* ⊆ *S*_{n}, $\mathcal{A}_{I\cup J}=\mathcal{A}_{I}\cap \mathcal{A}_{J}$. It will sometimes be convenient to write $\mathcal{A}_{I}$, the set of common adjacencies in *I* = {*x*_{1}, ..., *x*_{k}}, as $\mathcal{A}_{x_{1},...,x_{k}}$. For example $\mathcal{A}_{x,y,z}$ represents the set of adjacencies common to permutations *x, y* and *z*.

For *x, y* ∈ *S*_{n} we define the breakpoint distance (bp distance) between *x* and *y* by

$$d^{(n)}(x,y) := n-1-\left|\mathcal{A}_{x,y}\right|.$$

This distance is not a metric on *S*_{n} but rather a pseudometric because of nonreflexiveness: there are cases where *d*^{(n)}(*x, y*) = 0 but *x* ≠ *y*, namely *x* = *π*_{1} ... *π*_{n} and *y* = *π*_{n} ... *π*_{1}, for any *x* ∈ *S*_{n}. In these cases, the permutations *x* and *y* are said to be equivalent, denoted by *x* ~ *y*. The equivalence class containing *π* is represented by [*π*] and contains exactly two permutations, *π*_{1} ... *π*_{n} and *π*_{n} ... *π*_{1}. The number of classes is thus *n*!/2. For any *π*, we denote the other element of [*π*] by $\bar{\pi}$. The bp distance, a metric on the set of all equivalence classes of *S*_{n}, denoted by $\hat{S}_{n}:=S_{n}/\sim$, is defined by

$$d^{(n)}([x],[y]) := d^{(n)}(x,y).$$

Where there is no risk of ambiguity, we can simplify the notation by using *x* and *y* instead of [*x*] and [*y*], and/or drop the superscript *n*.
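
The definitions above can be illustrated with a minimal Python sketch. It assumes that the bp distance is *n* − 1 minus the number of common adjacencies, which is consistent with the maximum value *n* − 1 quoted in the text; the function names and the `canonical` class representative are our own illustrative choices, not part of the original paper.

```python
from itertools import permutations


def adjacencies(perm):
    """Set of unordered adjacencies {pi_i, pi_(i+1)} of a linear genome."""
    return {frozenset(pair) for pair in zip(perm, perm[1:])}


def bp_distance(x, y):
    """Breakpoint distance, assumed here to be n - 1 - |common adjacencies|."""
    if sorted(x) != sorted(y):
        raise ValueError("x and y must be permutations of the same genes")
    return len(x) - 1 - len(adjacencies(x) & adjacencies(y))


def canonical(perm):
    """Hypothetical representative of the class [perm]:
    the lexicographically smaller of perm and its reversal."""
    perm = tuple(perm)
    return min(perm, perm[::-1])


identity = (1, 2, 3, 4, 5)
# A genome and its reversal share all adjacencies.
print(bp_distance(identity, identity[::-1]))
# These two genomes share no adjacency.
print(bp_distance(identity, (1, 3, 5, 2, 4)))
# Number of equivalence classes for n = 4.
print(len({canonical(p) for p in permutations(range(1, 5))}))
```

The three printed values are 0, 4 (= *n* − 1) and 12 (= 4!/2), matching the pseudometric behaviour and the class count described above.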

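It is clear that the maximum possible bp distance between two permutation classes is *n* − 1, attained when they have no common adjacencies. Bp distance is symmetric on *S*_{n} and hence on $\hat{S}_{n}$. By construction, it is reflexive on $\hat{S}_{n}$. To verify the triangle inequality, consider three permutations *x, y, z*. We have

$$\left|\mathcal{A}_{x,y}\right|+\left|\mathcal{A}_{y,z}\right| = \left|\mathcal{A}_{x,y}\cup \mathcal{A}_{y,z}\right|+\left|\mathcal{A}_{x,y,z}\right| \le \left|\mathcal{A}_{x,y}\cup \mathcal{A}_{y,z}\right|+\left|\mathcal{A}_{x,z}\right|.$$

Therefore

$$d(x,y)+d(y,z) = 2(n-1)-\left|\mathcal{A}_{x,y}\right|-\left|\mathcal{A}_{y,z}\right| \ge 2(n-1)-\left|\mathcal{A}_{x,y}\cup \mathcal{A}_{y,z}\right|-\left|\mathcal{A}_{x,z}\right|.$$

But $|\mathcal{A}_{x,y}\cup \mathcal{A}_{y,z}| = |\mathcal{A}_{y}\cap (\mathcal{A}_{x}\cup \mathcal{A}_{z})| \le n-1$, so the right-hand side is at least $n-1-|\mathcal{A}_{x,z}| = d(x,z)$, and hence the triangle inequality holds.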

We say a pseudometric (or a metric) $\tilde{\rho}$ is right invariant on a group *G* if for any $x,y,z\in G$, $\tilde{\rho}(x,y)=\tilde{\rho}(xz,yz)$. The definition of left invariance is similar. A pseudometric (metric) which is both right and left invariant is called invariant. Bp distance is an invariant pseudometric on *S*_{n}.

**Definition 1** *Given a set* {*x*_{1}, ..., *x*_{k}} ⊆ *S and a pseudometric ρ on S, a median for the set is µ* ∈ *S such that* $\sum_{i=1}^{k}\rho(\mu ,x_{i})$ *is minimal*.

### Defining the geodesic patch

A discrete metric space (*S, ρ*) is a geodesic space if for any two points *x, y* ∈ *S* there exists a finite subset of *S* containing *x, y* that is isometric with the discrete line segment [0, 1*, ..., ρ*(*x, y*)]. Any subset of *S* with this property, and there may be several, is called a geodesic between *x* and *y*. For example, all connected graphs are geodesic spaces. In a geodesic space the medians of two points *x* and *y* consist of all the points located on geodesics between *x* and *y*.

What can we say when the space is not a geodesic space? To answer this, we extend the concept of geodesic by introducing the concept of a geodesic patch. A geodesic patch between *x* and *y* is a maximal subset of *S* containing *x, y* which is isometric to a subsegment (not necessarily contiguous) of the line segment [0, 1, ..., *ρ*(*x, y*)]. For any two points *x, y* in an arbitrary metric space (*S, ρ*) there exists at least one geodesic patch between them because {*x, y*} is isometric to {0, *ρ*(*x, y*)}. In addition, any geodesic is a geodesic patch. Any point *z* on a geodesic patch between *x, y* satisfies:

$$\rho(x,z)+\rho(z,y)=\rho(x,y).$$

Therefore all the medians of two points *x* and *y* must lie on a geodesic patch between them. We denote the set of all permutations lying on geodesic patches connecting *x, y* ∈ *S*_{n} by $\overline{[x,y]}$, as in Figure 1.

$(\hat{S}_{n},d)$ is not a geodesic space. For example there is no geodesic connecting the identity permutation *id* and *π* := 1 2 *x*_{1} *x*_{2} ... *x*_{n−4} *n* − 1 *n* when *x*_{1} *x*_{2} ... *x*_{n−4} is a non-identical permutation on {3, ..., *n* − 2}. The smallest change to *id* is to cut one of its adjacencies, say {*i, i* + 1}, and rejoin the two segments in one of the three possible ways: 1 to *n*, 1 to *i* + 1 or *n* to *i*. Now if we cut the adjacencies {1, 2} or {*n* − 1, *n*} in *id* the distance of the new permutation to both *id* and *π* increases. If on the other hand we cut one of the other adjacencies in *id*, all the ways of rejoining, which increase the distance to *id*, either increase or leave unchanged the distance to *π*, since {1, *n*}, {1, *i* + 1} and {*n, i*} are not adjacencies in $\mathcal{A}_{\pi}$. Therefore there is no geodesic connecting *id* to *π*.
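
The smallest instance of this example can be checked by exhaustive search. The sketch below assumes *n* = 6, *π* = 1 2 4 3 5 6 and the bp distance *n* − 1 − |common adjacencies|; it works on permutations rather than classes, and the helper names `patch_points` and `missing_levels` are hypothetical. It collects every *z* with *d*(*x, z*) + *d*(*z, y*) = *d*(*x, y*) and reports the intermediate distances that no such point realises. A geodesic needs one point at every distance 0, 1, ..., *d*(*x, y*), so a missing level rules it out.

```python
from itertools import permutations


def adjacencies(perm):
    """Unordered adjacencies of a linear genome."""
    return frozenset(frozenset(pair) for pair in zip(perm, perm[1:]))


def bp_distance(x, y):
    """Assumed definition: n - 1 minus the number of common adjacencies."""
    return len(x) - 1 - len(adjacencies(x) & adjacencies(y))


def patch_points(x, y):
    """All z in S_n with d(x, z) + d(z, y) == d(x, y), by exhaustive search."""
    d_xy = bp_distance(x, y)
    return [z for z in permutations(sorted(x))
            if bp_distance(x, z) + bp_distance(z, y) == d_xy]


def missing_levels(x, y):
    """Distances t in 0..d(x, y) that no patch point realises as d(x, z) == t.

    A non-empty result means there is no geodesic between x and y.
    """
    levels = {bp_distance(x, z) for z in patch_points(x, y)}
    return [t for t in range(bp_distance(x, y) + 1) if t not in levels]


identity = (1, 2, 3, 4, 5, 6)
pi = (1, 2, 4, 3, 5, 6)  # 1 2 x1 x2 5 6 with x1 x2 = 4 3

print(bp_distance(identity, pi))
print(patch_points(identity, pi))
print(missing_levels(identity, pi))
```

Here *d*(*id, π*) = 2, the only patch points are *id*, *π* and their reversals, and level 1 is missing, in agreement with the argument above.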

Although ${\u015c}_{n}$ is not a geodesic space there may still exist permutations with a geodesic between them. For example

is a geodesic between *id* and *π*. Note *d*(*id, π*) = 5, the maximum possible distance in ${\u015c}_{6}$.

### The median value and medians of permutations with maximum pairwise distances

In this section we investigate the bp median problem in the case of *k* permutations with maximum pairwise distances. As we shall see later, this situation is very similar to the case of *k* uniformly random permutations. Let (*S, ρ*) be a pseudometric space.

The total distance of a point *x* ∈ *S* to a finite subset ∅ ≠ *B* ⊆ *S* is defined to be

$$\rho(x,B) := \sum_{y\in B}\rho(x,y).$$

The median value of *B*, $m^{S,\rho}(B)$, is the infimum of the total distance over all the points *x* ∈ *S*, that is

$$m^{S,\rho}(B) := \inf_{x\in S}\rho(x,B). \tag{8}$$

We can extend this definition to sets with multiplicities. Let ∅ ≠ *B* ⊆ *S*. We define a multiplicity function *n*_{B} from *B* to $\mathbb{N}$ and write *n*_{B}(*x*) = *n*_{x}. We call *A* = (*B, n*_{B}) a set with multiplicities. We define the total distance of a point *x* ∈ *S* to *A* to be

$$\rho(x,A) := \sum_{y\in B} n_{y}\,\rho(x,y).$$

The definition of median value in Equation (8) can be extended in an analogous way to the median value of a set with multiplicity *A*. When *S* is finite then the total distance function takes its minimum on *S* and "inf" turns into "min" in the above formulation. The points of the space *S* that minimize the total distance to *A* are called the median points or medians of *A* and the set of all these medians is called the median set of *A*, denoted by *M*^{S,ρ}(*A*).
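
For small *n* the median value and median set can be computed by brute force. The following is a minimal sketch, assuming the total distance to a set with multiplicities is the multiplicity-weighted sum of bp distances; `total_distance`, `median_value_and_set` and the dictionary-based `multiplicity` argument are illustrative names of our own.

```python
from itertools import permutations


def adjacencies(perm):
    """Unordered adjacencies of a linear genome."""
    return frozenset(frozenset(pair) for pair in zip(perm, perm[1:]))


def bp_distance(x, y):
    """Assumed definition: n - 1 minus the number of common adjacencies."""
    return len(x) - 1 - len(adjacencies(x) & adjacencies(y))


def total_distance(z, genomes, multiplicity=None):
    """Sum of n_g * d(z, g) over the genomes g; multiplicities default to 1."""
    multiplicity = multiplicity or {}
    return sum(multiplicity.get(g, 1) * bp_distance(z, g) for g in genomes)


def median_value_and_set(genomes, multiplicity=None):
    """Minimise the total distance over all of S_n (feasible only for small n)."""
    candidates = permutations(sorted(genomes[0]))
    totals = {z: total_distance(z, genomes, multiplicity) for z in candidates}
    best = min(totals.values())
    return best, sorted(z for z, t in totals.items() if t == best)


a, b, c = (1, 2, 3, 4), (1, 2, 4, 3), (2, 1, 3, 4)
print(median_value_and_set([a, b, c]))
print(median_value_and_set([a, b, c], multiplicity={a: 3}))
```

With unit multiplicities the median value is 2 and the median set consists of *a*, *b*, *c* and their reversals; giving *a* multiplicity 3 keeps the value at 2 but shrinks the median set to *a* and its reversal. The search enumerates all *n*! permutations, so it is only meant for very small examples.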

Let *B* and *A* = (*B, n*_{B}) be a subset and a subset with multiplicities of *S*_{n}. We define [*B*] to be the set of all permutation classes of *S*_{n} that have at least one of their permutations in *B*. That is

$$[B] := \{[x] : x\in B\}.$$

Two nonempty subsets *B, B′* ⊆ *S*_{n} are said to be equivalent, denoted by *B ~ B′*, if [*B*] = [*B′*]. Also we define [*n*_{B}] to be a function from [*B*] to $\mathbb{N}$ with

$$[n_{B}]([x]) := \sum_{y\in B\cap [x]} n_{y}.$$

Then the definition of [*A*] is straightforward:

$$[A] := \left([B],[n_{B}]\right),$$

and we say two nonempty subsets of *S*_{n} with multiplicities, namely *A* and *A′*, are equivalent, denoted by *A ~ A′*, if [*A*] = [*A′*]. In fact [*A*] is the equivalence class containing *A*. We call [*A*] a subset of $\hat{S}_{n}$ with multiplicities. We use the notations "[ ]" and "~" for all the above concepts without restriction.

With these definitions we can readily verify that in the context of bp distance, for *A ~ A′* and *x ~ x′*, we have

$$d(x,A)=d(x',A').$$

Recall that we use *d* as both a metric on $\hat{S}_{n}$ and a pseudometric on *S*_{n}. Therefore we can conclude that

$$m^{S_{n},d}(A)=m^{\hat{S}_{n},d}([A]),$$

and similarly

$$\left[M^{S_{n},d}(A)\right]=M^{\hat{S}_{n},d}([A]).$$

Henceforward, we will simplify by replacing the notation $m^{S_{n},d}(A)$ and $M^{S_{n},d}(A)$ by *m*_{n}(*A*) and *M*_{n}(*A*), respectively. Also for a subset [*A*] of $\hat{S}_{n}$ with multiplicities, we will use the notation *m*_{n}([*A*]) and *M*_{n}([*A*]) instead of $m^{\hat{S}_{n},d}([A])$ and $M^{\hat{S}_{n},d}([A])$ respectively. Where there is no ambiguity we will suppress the subscript *n*.

**Proposition 1** *Suppose* $X:=\{x_{1},\dots ,x_{k}\}\subset \hat{S}_{n}$ *such that d*(*x*_{i}, *x*_{j}) = *n* − 1 *for any i ≠ j,* 1 ≤ *i, j* ≤ *k. Then the bp median value of X is* (*k* − 1)(*n* − 1)*. Moreover, m*^{∗} *is a median of X, m*^{∗} ∈ *M*(*X*)*, if and only if* $A_{m^{*}}\subset \cup_{i=1}^{k}A_{x_{i}}$.

*Proof* Let $\pi \in \hat{S}_{n}$ be an arbitrary permutation class. Since *d*(*x*_{i}, *x*_{j}) = *n* − 1 means $A_{x_{i}}\cap A_{x_{j}}=\varnothing$ for *i* ≠ *j*, and since $A_{\pi ,x_{i}}\subset A_{x_{i}}$ and $A_{\pi ,x_{j}}\subset A_{x_{j}}$, we have $A_{\pi ,x_{i}}\cap A_{\pi ,x_{j}}=\varnothing$ for any 1 ≤ *i* ≠ *j* ≤ *k*. Also

$$\bigcup_{i=1}^{k}A_{\pi ,x_{i}}\subset A_{\pi}.$$

Therefore

$$\sum_{i=1}^{k}\left|A_{\pi ,x_{i}}\right| = \left|\bigcup_{i=1}^{k}A_{\pi ,x_{i}}\right| \le \left|A_{\pi}\right| = n-1.$$

Hence

$$d(\pi ,X) = \sum_{i=1}^{k}\left(n-1-\left|A_{\pi ,x_{i}}\right|\right) \ge k(n-1)-(n-1) = (k-1)(n-1).$$

Equality holds for *π* = *x*_{i}, for any 1 ≤ *i* ≤ *k*. This proves the first part of the proposition. For the second part, *m*^{∗} ∈ *M*(*X*) is equivalent to the total distance of *m*^{∗} to *X* being (*k* − 1)(*n* − 1), and this is equivalent to $\sum_{i=1}^{k}|A_{m^{*},x_{i}}|=n-1$, that is, to $\cup_{i=1}^{k}A_{m^{*},x_{i}}=A_{m^{*}}$. Since the left-hand side can be written as $A_{m^{*}}\cap (\cup_{i=1}^{k}A_{x_{i}})$, this holds if and only if $A_{m^{*}}\subset \cup_{i=1}^{k}A_{x_{i}}$. This finishes the proof of the equivalence in the proposition.
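
Proposition 1 can be checked exhaustively on a small instance. The sketch below assumes *n* = 6 and three genomes of our own choosing whose adjacency sets are pairwise disjoint (they come from a standard zigzag decomposition of the complete graph on six vertices into Hamiltonian paths, and are not taken from the paper). The helper `check_proposition_1` is hypothetical; it compares the brute-force median set with the set predicted by the adjacency condition.

```python
from itertools import combinations, permutations


def adjacencies(perm):
    """Unordered adjacencies of a linear genome."""
    return frozenset(frozenset(pair) for pair in zip(perm, perm[1:]))


def bp_distance(x, y):
    """Assumed definition: n - 1 minus the number of common adjacencies."""
    return len(x) - 1 - len(adjacencies(x) & adjacencies(y))


def check_proposition_1(genomes):
    """Return (value_ok, set_ok) for genomes at pairwise maximum distance."""
    n, k = len(genomes[0]), len(genomes)
    if any(bp_distance(x, y) != n - 1 for x, y in combinations(genomes, 2)):
        raise ValueError("genomes must be pairwise at distance n - 1")
    union = frozenset().union(*(adjacencies(g) for g in genomes))
    totals = {z: sum(bp_distance(z, g) for g in genomes)
              for z in permutations(sorted(genomes[0]))}
    best = min(totals.values())
    medians = {z for z, t in totals.items() if t == best}
    # Medians predicted by the proposition: adjacencies inside the union.
    predicted = {z for z in totals if adjacencies(z) <= union}
    return best == (k - 1) * (n - 1), medians == predicted


x1 = (1, 2, 6, 3, 5, 4)
x2 = (2, 3, 1, 4, 6, 5)
x3 = (3, 4, 2, 5, 1, 6)

print(check_proposition_1([x1, x2]))
print(check_proposition_1([x1, x2, x3]))
```

Both calls return `(True, True)`. In the three-genome case the adjacency sets of *x*_{1}, *x*_{2}, *x*_{3} cover all 15 pairs, so every permutation of six genes is a median, which gives a concrete picture of how large the median set can be.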

**Lemma 1** *Let x, y, z be three permutation classes in* $\hat{S}_{n}$ *that are pairwise at a maximum distance n − 1 from each other. Then for any* $w\in \overline{[x,y]}$ *we have d*(*w, z*) = *n* − 1.

*Proof* Since $w\in \overline{[x,y]}$, we have *A*_{w} ⊂ *A*_{x} ∪ *A*_{y}. Also we know that $A_{z}\cap (A_{x}\cup A_{y})=\varnothing$. Hence $A_{w,z}=\varnothing$ and *d*(*w, z*) = *n* − 1.

The above lemma simply indicates that for any two points *x*_{i}, *x*_{j} in the set *X* in the proposition above, $\overline{[x_{i},x_{j}]}\subset M(X)$, since the total distance of each point in $\overline{[x_{i},x_{j}]}$ to *X* is (*k* − 1)(*n* − 1).

**Corollary 1** *Suppose* $X:=\{x_{1},\dots ,x_{k}\}\subset \hat{S}_{n}$ *such that d*(*x*_{i}, *x*_{j}) = *n* − 1 *for any i ≠ j. Then* $\cup_{i,j}\overline{[x_{i},x_{j}]}\subset M(X)$.

What more can we say about the median positions? The notion of "accessibility" will help us to keep track of some other medians of the set *X* that are not in ${\cup}_{i,j}\overline{\left[{x}_{i},{x}_{j}\right]}$. Before defining this concept, we first need more information about the properties of $\overline{\left[x,y\right]}$ for $x,y\in {\u015c}_{n}$.

**Lemma 2** *Let* $x,y\in {\u015c}_{n}$. *Then* $z\in \overline{\left[x,y\right]}$*if and only if* ${A}_{x,y}\subset {A}_{z}\subset {A}_{x}\cup {A}_{y}$.

*Proof* We know $z\in \overline{[x,y]}$ if and only if *d*(*x, z*) + *d*(*z, y*) = *d*(*x, y*). On the other hand we can write *A*_{z} as follows

where the pairwise intersection of the sets in the right hand side is empty. We can also write

and

Furthermore

and

Now for "sufficiency", we have

Therefore by Equation (23) we have

This results in |*A*_{x,y}| = |*A*_{x,y,z}| and hence in *A*_{x,y} ⊂ *A*_{z}. Otherwise the inequality in (26) will be strict, which is impossible. On the other hand the inequality in (26) shows $A_{z}\backslash (A_{x}\cup A_{y})=\varnothing$, which yields $A_{z}\subset A_{x}\cup A_{y}$.

For "necessity", we have

This is true because of *A*_{z} ⊂ *A*_{x} ∪ *A*_{y} and Equation (23). But since *A*_{x,y} ⊂ *A*_{z} ⊂ *A*_{x} ∪ *A*_{y} we have |*A*_{x,y}| = |*A*_{x,y,z}| and we can replace |*A*_{x,y}| by |*A*_{x,y,z}| in the left hand side of the last equality. This finishes the "necessity" proof.
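
A compact way to see both directions of Lemma 2 at once is the following counting argument, given here as a sketch under the assumption that *d*(*x, y*) = *n* − 1 − |*A*_{x,y}| and using only inclusion-exclusion on adjacency sets.

```latex
\begin{align*}
|A_{x,z}| + |A_{z,y}|
  &= |A_{z} \cap (A_{x} \cup A_{y})| + |A_{x,y,z}| \\
  &\le |A_{z}| + |A_{x,y}| = (n-1) + |A_{x,y}|, \\[4pt]
d(x,z) + d(z,y) - d(x,y)
  &= (n-1) + |A_{x,y}| - |A_{x,z}| - |A_{z,y}| \;\ge\; 0 .
\end{align*}
```

Equality in the second line of the first display requires both $|A_{z}\cap (A_{x}\cup A_{y})| = |A_{z}|$, that is $A_{z}\subset A_{x}\cup A_{y}$, and $|A_{x,y,z}| = |A_{x,y}|$, that is $A_{x,y}\subset A_{z}$. So *d*(*x, z*) + *d*(*z, y*) = *d*(*x, y*) holds exactly when $A_{x,y}\subset A_{z}\subset A_{x}\cup A_{y}$.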

**Definition 2** *Let X* := {*x*_{1}, ..., *x*_{k}} *be a subset of* $\hat{S}_{n}$. *We say a permutation class* $z\in \hat{S}_{n}$ *is* 1*-accessible from X if there exist an m* ∈ $\mathbb{N}$, *a finite sequence y*_{1}, ..., *y*_{m} *where y*_{i} ∈ *X, and z*_{1}, ..., *z*_{m}*, where* $z_{i}\in \hat{S}_{n}$, *such that z*_{1} = *y*_{1}, *z*_{m} = *z and* $z_{i+1}\in \overline{[z_{i},y_{i+1}]}$ *for* $i=1,\dots ,m-1$. *See Figure 2.*

*We denote the set of all* 1*-accessible points of X by Z*(*X*)*. We define Z*_{0}(*X*) := *X. Also for r* ∈ $\mathbb{N}$ ∪ {0}, *by induction, we define Z*_{r+1}(*X*) *to be Z*(*Z*_{r}(*X*)) *and we call it the set of all* (*r* + 1)*-accessible permutation classes. That is Z*_{1}(*X*) = *Z*(*X*)*, Z*_{2}(*X*) = *Z*(*Z*(*X*)) *and so on. It is clear that Z*_{r+1}(*X*) *includes Z*_{r}(*X*) *and also* $\cup_{x,y\in Z_{r}(X)}\overline{[x,y]}$. *A permutation class z is said to be accessible from X if there exists r* ∈ $\mathbb{N}$ *such that z* ∈ *Z*_{r}(*X*). *We denote the set of all accessible points by* $\overline{Z}(X)=\cup_{r\in \mathbb{N}\cup \{0\}}Z_{r}(X)$.

Note that $Z(\overline{Z}(X))=\overline{Z}(X)$. This holds because for any permutation class *z* that is 1-accessible from $\overline{Z}(X)$, there must exist $m\in \mathbb{N}$, $r_{0}\in \mathbb{N}\cup \{0\}$, $y_{1},...,y_{m}\in Z_{r_{0}}(X)$ (the *y*_{i}'s must be in $\overline{Z}(X)$, and since there are finitely many of them and the sets *Z*_{r}(*X*) are increasing, there must be such an *r*_{0}) and *z*_{1}, ..., *z*_{m}, where $z_{i}\in \hat{S}_{n}$, such that *z*_{1} = *y*_{1}, *z*_{m} = *z* and $z_{i+1}\in \overline{[z_{i},y_{i+1}]}$. Therefore $z\in Z_{r_{0}+1}(X)\subset \overline{Z}(X)$. We can then conclude that $\overline{Z}(\overline{Z}(X))=\overline{Z}(X)$.
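
The accessibility construction can be sketched as a fixed-point computation for small *n*. The code below is a brute-force illustration under our own assumptions: it works on permutations of *S*_{5} rather than on classes, takes $\overline{[x,y]}$ to be the set of *z* with *d*(*x, z*) + *d*(*z, y*) = *d*(*x, y*), and uses hypothetical helper names (`one_accessible`, `accessible`, `median_set`). The example input is a pair of genomes at maximum distance.

```python
from functools import lru_cache
from itertools import permutations


@lru_cache(maxsize=None)
def adjacencies(perm):
    """Unordered adjacencies of a linear genome (cached per permutation)."""
    return frozenset(frozenset(pair) for pair in zip(perm, perm[1:]))


def bp_distance(x, y):
    """Assumed definition: n - 1 minus the number of common adjacencies."""
    return len(x) - 1 - len(adjacencies(x) & adjacencies(y))


def patch_points(x, y, universe):
    """All z in the universe with d(x, z) + d(z, y) == d(x, y)."""
    d_xy = bp_distance(x, y)
    return {z for z in universe
            if bp_distance(x, z) + bp_distance(z, y) == d_xy}


def one_accessible(points, universe):
    """Z(points): chains z_1 = y_1, z_(i+1) in patch(z_i, y_(i+1)), y_i in points."""
    points = frozenset(points)
    reached, frontier = set(points), set(points)
    while frontier:
        new = set()
        for z in frontier:
            for y in points:
                new |= patch_points(z, y, universe)
        frontier = new - reached
        reached |= frontier
    return reached


def accessible(points, universe):
    """Iterate Z until nothing new appears (Z_0, Z_1, Z_2, ...)."""
    current = set(points)
    while True:
        bigger = one_accessible(current, universe)
        if bigger == current:
            return current
        current = bigger


def median_set(genomes, universe):
    """Brute-force minimisers of the total bp distance."""
    totals = {z: sum(bp_distance(z, g) for g in genomes) for z in universe}
    best = min(totals.values())
    return {z for z, t in totals.items() if t == best}


universe = list(permutations(range(1, 6)))
X = [(1, 2, 3, 4, 5), (1, 3, 5, 2, 4)]  # pairwise distance n - 1 = 4
closure = accessible(X, universe)
medians = median_set(X, universe)
print(len(closure), len(medians), closure <= medians)
```

The last printed value reports whether every accessible permutation found is a median of *X*. Because the universe is all of *S*_{n}, this approach is practical only for very small *n*.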

**Proposition 2** *Suppose* $X:=\{x_{1},...,x_{k}\}\subset \hat{S}_{n}$ *such that d*(*x*_{i}, *x*_{j}) = *n* − 1 *for any i ≠ j. Then for any permutation class* $z\in \overline{Z}(X)$ *the total distance d*(*z, X*) *between z and X is* (*k* − 1)(*n* − 1)*, and hence* $\overline{Z}(X)\subset M(X)$. *Furthermore, if m*_{1}, *m*_{2} ∈ *M*(*X*) *then* $\overline{[m_{1},m_{2}]}\subset M(X)$.

*Proof* Suppose *m*_{1}, *m*_{2} ∈ *M*(*X*) and $m^{*}\in \overline{[m_{1},m_{2}]}$. By Lemma 2 and Proposition 1 we have $A_{m^{*}}\subset A_{m_{1}}\cup A_{m_{2}}\subset \cup_{i=1}^{k}A_{x_{i}}$. Applying Proposition 1 again, we have *m*^{∗} ∈ *M*(*X*). Now it suffices to show that for any *r* ∈ $\mathbb{N}$ ∪ {0}, *Z*_{r}(*X*) ⊂ *M*(*X*). We prove this by induction. For *r* = 0 this follows from Corollary 1. Suppose *Z*_{r}(*X*) ⊂ *M*(*X*). By definition we have *Z*_{r+1}(*X*) = *Z*(*Z*_{r}(*X*)). That is, for *z* ∈ *Z*_{r+1}(*X*) there exist an *m* ∈ $\mathbb{N}$, *y*_{1}, ..., *y*_{m} ∈ *Z*_{r}(*X*) and *z*_{1}, ..., *z*_{m}, where $z_{i}\in \hat{S}_{n}$, such that *z*_{1} = *y*_{1}, *z*_{m} = *z* and $z_{i+1}\in \overline{[z_{i},y_{i+1}]}$. In particular $z_{2}\in \overline{[y_{1},y_{2}]}$, and by the fact we proved above *z*_{2} ∈ *M*(*X*), since *y*_{1}, ..., *y*_{m} ∈ *Z*_{r}(*X*) ⊂ *M*(*X*). Continuing this we conclude that *z*_{1}, *z*_{2}, ..., *z*_{m} = *z* ∈ *M*(*X*). Hence *Z*_{r+1}(*X*) ⊂ *M*(*X*). This finishes the proof.

**Conjecture 1** *Every median point of X is accessible from X, that is* $M\left(X\right)=\overline{Z}\left(X\right)$.

### The median value and medians of *k* random permutations

In this section we study the median value and median points of *k* independent random permutation classes uniformly chosen from $\hat{S}_{n}$. This is equivalent to studying the same problem for *k* random permutations sampled from *S*_{n}. All the results of this section carry over to permutations without any problem.

We make use of the fact that the bp distance of two independent random permutations tends to be close to its maximum value, *n* − 1. Xu et al. [4] showed that if we fix a reference linear permutation *id* and pick a random permutation *x* uniformly, the expectation and variance of $|\mathcal{A}_{id,x}^{(n)}|$ are both very close to 2 for large enough *n*. Because of the symmetry of the group *S*_{n} and the fact that bp distance is an invariant pseudometric the same results hold for two random permutations *x* and *y*. We first summarize the results we need from [4].

Let $\tilde{\nu}_{n}$ be the uniform measure on *S*_{n}. Let $\Pi :S_{n}\to \hat{S}_{n}$ be the natural surjective map sending each permutation onto its corresponding permutation class.

Define

$$\nu_{n} := \tilde{\nu}_{n}\circ \Pi^{-1}$$

to be the push-forward measure of $\tilde{\nu}_{n}$ induced by the map Π. It is clear that $\nu_{n}$ is the uniform measure on $\hat{S}_{n}$. The following proposition is a reformulation of Theorems 6 and 7 in [4].

**Proposition 3** *[Xu-Alain-Sankoff] Let x and y be two independent random permutation classes (irpc) chosen uniformly from* $\hat{S}_{n}$. *Then*

Define the error function for the distance of *x, y* by

$$\epsilon_{n}(x,y) := n-1-d^{(n)}(x,y).$$

**Corollary 2** *Suppose x and y are two irpc's sampled from the uniform measure* $\nu_{n}$ *and* $a_{n}$ *is an arbitrary sequence of real numbers diverging to* +∞. *Then* $\frac{\epsilon_{n}(x,y)}{a_{n}}$ *converges to zero asymptotically* $\nu_{n}^{*2}$*-almost surely (a.a.s.), that is*

*Proof* The proof is straightforward from [4] and Chebyshev's inequality.
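
The claim that a uniformly random permutation shares about two adjacencies with a fixed reference can be checked exactly for small *n*. The sketch below enumerates *S*_{n} and compares the mean number of common adjacencies with 2(*n* − 1)/*n*. This closed form is our own elementary computation by linearity of expectation (each of the *n* − 1 adjacent position pairs of *x* is an adjacency of *id* with probability 2/*n*); it is not quoted from [4], and the function name is illustrative.

```python
from fractions import Fraction
from itertools import permutations


def adjacencies(perm):
    """Unordered adjacencies of a linear genome."""
    return frozenset(frozenset(pair) for pair in zip(perm, perm[1:]))


def mean_common_adjacencies(n):
    """Exact mean of |A_{id,x}| over all x in S_n, by full enumeration."""
    identity = tuple(range(1, n + 1))
    reference = adjacencies(identity)
    total = 0
    count = 0
    for x in permutations(identity):
        total += len(reference & adjacencies(x))
        count += 1
    return Fraction(total, count)


for n in range(3, 8):
    # Enumerated mean next to the assumed closed form 2(n - 1)/n.
    print(n, mean_common_adjacencies(n), Fraction(2 * (n - 1), n))
```

The two columns agree for every *n* tried, and 2(*n* − 1)/*n* approaches 2 as *n* grows, in line with the statement above that the expectation is very close to 2 for large *n*.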

Now we are ready to study the median value of *k* irpc's. Let [*A*] be a subset of $\hat{S}_{n}$ with multiplicities and with *k* elements. Define

$$e_{n}([A]) := (k-1)(n-1)-m_{n}([A]).$$

**Theorem 1** *Let* $X^{(n)}:=\{x_{1}^{(n)},x_{2}^{(n)},\dots ,x_{k}^{(n)}\}$ *be a set of k irpc's in* $\hat{S}_{n}$ *sampled from the measure* $\nu_{n}^{*k}$. *Then their breakpoint median value* $m_{n}^{*}:=m_{n}(X^{(n)})$ *tends to be close to its maximum after a convenient rescaling with high probability, that is, for any arbitrary sequence* $a_{n}\to \infty$ *as* $n\to \infty$, $\frac{e_{n}^{*}}{a_{n}}\to 0$ *in* $\nu_{n}^{*k}$*-probability, where* $e_{n}^{*}:=e_{n}(X^{(n)})$.

*Proof* Let *π* be an arbitrary point of *S*_{n}. Let $\mathcal{A}_{\pi \backslash X}=\mathcal{A}_{\pi}\backslash \mathcal{A}_{X}$. We have

where $\alpha_{n}$ is $\max_{i,j}\epsilon_{n}(x_{i},x_{j})$. On the other hand *m*_{n}(*X*^{(n)}) ≤ (*k* − 1)(*n* − 1). The reason is the same as has already been discussed in the proof of Proposition 1. Therefore subtracting (*k* − 1)(*n* − 1) we have

Dividing by ${a}_{n}$ and letting *n* go to *∞* the result follows from the last corollary.

**Theorem 2** *Let* $X^{(n)}:=\{x_{1}^{(n)},x_{2}^{(n)},\dots ,x_{k}^{(n)}\}$ *be a set of k irpc's in* $\hat{S}_{n}$ *sampled from the measure* $\nu_{n}^{*k}$. *Then for any permutation class* $z^{(n)}\in \overline{Z}(X^{(n)})$ *the total distance of z*^{(n)} *to X is close to* (*k* − 1)(*n* − 1) *with high probability after a convenient rescaling. More explicitly, for any arbitrary sequence of real numbers* $a_{n}$ *diverging to* ∞

*Therefore*

*Furthermore, if* $m_{1}^{(n)},m_{2}^{(n)}\in M_{n}(X^{(n)})$, *then for any* $\tilde{m}^{(n)}\in \overline{[m_{1}^{(n)},m_{2}^{(n)}]}$

*Proof* The structure of the proof is similar to the proof of Proposition 1. Suppose $o\in \hat{S}_{n}$ with $\mathcal{A}_{o}\subset \cup_{i=1}^{k}\mathcal{A}_{x_{i}}$. Let $\alpha_{n}$ be as defined in the proof of Theorem 1. Then by the same discussion we have

Therefore

and

From Theorem 1 we have

Hence

It suffices to show that $z:=z^{(n)}\in \overline{Z}(X)$ has the same property, that is $\mathcal{A}_{z}\subset \cup_{i=1}^{k}\mathcal{A}_{x_{i}}$. But this is clear by induction. For the second part of the theorem let $m_{1,n}^{*},m_{2,n}^{*}\in M(X)$. Suppose $m^{*}\in \overline{[m_{1,n}^{*},m_{2,n}^{*}]}$. By Theorem 1, $\frac{|\mathcal{A}_{m_{i,n}^{*}\backslash X}|}{a_{n}}\to 0$ in probability for *i* = 1, 2. On the other hand we have $\mathcal{A}_{m^{*}\backslash X}\subset \mathcal{A}_{m_{1,n}^{*}\backslash X}\cup \mathcal{A}_{m_{2,n}^{*}\backslash X}$.

Therefore

Therefore

since

The statement follows from the last inequality.

## Conclusions

We have shown that the median value for a set of random permutations tends to be close to its extreme value with high probability. Also it has been shown that every permutation accessible from a set of random permutations can be considered as a median of that set asymptotically almost surely, and conjectured that the converse is true, that every median is accessible from the original set in this way.

Further work is needed to characterize the existence and size of non-trivial geodesic patches, in order to assess how extensive the set of medians is.

## References

1. Tannier E, Zheng C, Sankoff D: Multichromosomal median and halving problems under different genomic distances. BMC Bioinformatics. 2009, 10: 120-10.1186/1471-2105-10-120.

2. Jamshidpey A, Sankoff D: Phase change for the accuracy of the median value in estimating divergence time. BMC Bioinformatics. 2013, 14: S15:S7-10.1186/1471-2105-14-157.

3. Haghighi M, Sankoff D: Medians seek the corners, and other conjectures. BMC Bioinformatics. 2012, 13: S19:S5-10.1186/1471-2105-13-195.

4. Xu AW, Alain B, Sankoff D: Poisson adjacency distributions in genome comparison: multichromosomal, circular, signed and unsigned cases. Bioinformatics. 2008, 24: i146-i152. 10.1093/bioinformatics/btn295.

## Acknowledgements

Research supported in part by grants from the Natural Sciences and Engineering Research Council of Canada (NSERC). DS holds the Canada Research Chair in Mathematical Genomics.

**Declarations**

The publication charges for this article were funded by the Canada Research Chair in Mathematical Genomics, and by the University of Ottawa.

This article has been published as part of *BMC Genomics* Volume 15 Supplement 6, 2014: Proceedings of the Twelfth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics. The full contents of the supplement are available online at http://0-www.biomedcentral.com.brum.beds.ac.uk/bmcgenomics/supplements/15/S6.

## Additional information

### Competing interests

The authors declare that they have no competing interests.

### Authors' contributions

All authors participated in the research, wrote the paper, read and approved the manuscript.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

## About this article

### Cite this article

Jamshidpey, A., Jamshidpey, A. & Sankoff, D. Sets of medians in the non-geodesic pseudometric space of unsigned genomes with breakpoints.
*BMC Genomics* **15, **S3 (2014). https://0-doi-org.brum.beds.ac.uk/10.1186/1471-2164-15-S6-S3

DOI: https://0-doi-org.brum.beds.ac.uk/10.1186/1471-2164-15-S6-S3

### Keywords

- breakpoint distance
- pseudometric
- non-geodesic space
- random genomes