Dictionaries
Contents
12.6. DictionariesΒΆ
In this section we review various properties associated with a dictionary
We recall that a dictionary
with
The vectors
Note that we are using the same symbol
This shouldnβt be causing any confusion.
When we write the subscript as
Often, we will be working with a subset of atoms in a dictionary.
Usually such a subset
of atoms will be indexed by an index set
Often we will need the notion of a subdictionary [37] described below.
12.6.1. SubdictionariesΒΆ
Definition 12.15 (Subdictionary)
A subdictionary is a linearly independent collection of atoms.
Let
Remark 12.5 (Rank of subdictionary)
A subdictionary is full rank.
This is obvious since it is a collection of linearly independent atoms.
For subdictionaries, often we will say
Let
and
Each dual vector
Therefore, the inverse Gram matrix lists the inner products between the dual vectors.
Sometimes we will be discussing tools
which apply for general matrices. We will use
the symbol
12.6.2. SparkΒΆ
Definition 12.16 (Spark)
The spark of a given matrix
Note that the definition of spark applies to all matrices (wide, tall or square). It is not restricted to the synthesis matrices for a dictionary.
Correspondingly, the spark of a dictionary is defined as the minimum number of atoms which are linearly dependent.
We recall that rank of a matrix is defined as the maximum number of columns which
are linearly independent.
Definition of spark bears remarkable resemblance yet its very hard
to obtain as it requires a combinatorial search over all possible
subsets of columns of
Example 12.21 (Spark)
Spark of the
identity matrixis 4 since all columns are linearly independent.
Spark of the
matrixis 2 since column 1 and 3 are linearly dependent.
If a matrix has a column with all zero entries, then the spark of such a matrix is 1. This is a trivial case and we will not consider such matrices in the sequel.
In general for an
synthesis matrix, .
A naive combinatorial algorithm to calculate the spark of a matrix is given below.
Algorithm 12.1 (A naive algorithm for computing the spark of a matrix)
Inputs:
: a matrix
Outputs:
: Spark of
Algorithm:
Let
.For every
:Identify
ways of choosing columns from columns of .For every choice of
columns:If columns are linearly dependent:
.Return.
All columns are linearly independent.
.
12.6.2.1. Spark and NullspaceΒΆ
Remark 12.6 (Spark and sparsity of null space vectors)
The
Proof. We proceed as follows:
If
then .Thus non-zero entries in
pick a set of columns in which are linearly dependent.Clearly
indicates the number of columns in the set which are linearly dependent.By definition, spark of
indicates the minimum number of columns which are linearly dependent.Hence the result:
12.6.2.2. Uniqueness-SparkΒΆ
Spark is useful in characterizing the uniqueness of the solution
of a
Theorem 12.22 (Uniqueness of a sparse solution for an underdetermined system via spark)
Consider a solution
then it is necessarily the sparsest solution.
Proof. Let
Due to Remark 12.6 we have
Now
Hence, if
for any other solution
This result is quite useful as it establishes a global optimality criterion for the
As long as
Note that we are only saying that if a sufficiently sparse solution is found then it is unique. We are not claiming that it is possible to find such a solution.
Obviously, the larger the spark, we can guarantee uniqueness for signals with higher sparsity levels. So a natural question is: How large can spark of a dictionary be? We consider few examples.
Example 12.22 (Spark of Gaussian dictionaries)
Consider a dictionary
Since a dictionary requires all its atoms to be unit-norms, hence we divide the each of the random vectors with their norms.
We know that with probability
any set of independent Gaussian random vectors is linearly independent.Also, since
hence a set of atoms is always linearly dependent.Thus
.
Thus, if a solution to EXACT-SPARSE problem contains
Example 12.23 (Spark of Dirac Fourier basis)
For
it can be shown that
In this case, the sparsity level of a unique solution must be less than
12.6.3. CoherenceΒΆ
Finding out the spark of a dictionary
Definition 12.17 (Coherence of a dictionary)
The coherence of a dictionary
If the dictionary consists of two orthonormal bases, then coherence is also known as mutual coherence or proximity; see Definition 12.1.
We note that
We note that by definition
For an orthonormal basis
Thus
In the following, we will use the notation
The off-diagonal entries of the Gram matrix are captured by the
matrix
The inner product between any two atoms
If a dictionary is uniform in the sense that there is not much variation in
Definition 12.18 (Incoherent dictionary)
We say that a dictionary is incoherent if the coherence of the dictionary is small.
We are looking for dictionaries which are incoherent. In the sequel we will see how incoherence plays a role in sparse approximation.
Example 12.24 (Coherence of two ortho bases)
We established in Theorem 12.3 that coherence of two ortho-bases is bounded by
In particular we showed in Theorem 12.4 that
coherence of Dirac Fourier basis is
Example 12.25 (Coherence: Multi-ONB dictionary)
A dictionary of concatenated orthonormal bases is called a multi-ONB.
For some
Theorem 12.23 (Coherence lower bound)
A lower bound on the coherence of a general dictionary is given by
Definition 12.19 (Grassmannian frame)
If each atomic inner product meets this bound, the dictionary is called an optimal Grassmannian frame.
The definition of coherence can be extended to arbitrary matrices
Definition 12.20 (Coherence for arbitrary matrices)
The coherence of a matrix
Then coherence of
It is assumed that none of the columns in
12.6.3.1. Lower Bounds for SparkΒΆ
Coherence of a matrix is easy to compute. More interestingly it also provides a lower bound on the spark of a matrix.
Theorem 12.24 (Lower bound on spark in terms of coherence)
For any matrix
Proof. We note that scaling of a column of
We now construct the Gram matrix of
given by .We note that
since each column of
is unit norm.Also
Consider any
columns from and construct its Gram matrix.This is nothing but a leading minor of size
from the matrix .From the Gershgorin disk theorem, if this minor is diagonally dominant, i.e. if
then this sub-matrix of
is positive definite and so corresponding columns from are linearly independent.But
and
for the minor under consideration.
Hence for
columns to be linearly independent the following condition is sufficientThus if
then every set of
columns from is linearly independent.Hence, the smallest possible set of linearly dependent columns must satisfy
This establishes the lower bound that
This bound on spark doesnβt make any assumptions on the structure of the dictionary. In fact, imposing additional structure on the dictionary can give better bounds. Let us look at an example for a two ortho-basis [18].
Theorem 12.25 (Lower bound on spark for two ortho bases)
Let
For maximally incoherent two orthonormal bases, we know that
12.6.3.2. Uniqueness-CoherenceΒΆ
We can now establish a uniqueness condition for sparse solution of
Theorem 12.26 (Uniqueness of a sparse solution of an underdetermined system via coherence)
Consider a solution
then it is necessarily the sparsest solution.
Proof. This is a straightforward application of Theorem 12.22 and Theorem 12.24.
It is interesting to compare the two uniqueness theorems: Theorem 12.22 and Theorem 12.26.
Theorem 12.22 uses spark, is sharp and is far more powerful than Theorem 12.26.
Coherence can never be smaller than
However, spark can be easily as large as
We recall from Theorem 12.8 that the bound for
sparsity level of sparest solution in two-ortho basis
which is a larger bound than Theorem 12.26 for general dictionaries by a factor of 2.
Thus, we note that coherence gives a weaker bound than spark for supportable sparsity levels of unique solutions. The advantage that coherence has is that it is easily computable and doesnβt require any special structure on the dictionary (two ortho basis has a special structure).
12.6.3.3. Singular Values of SubdictionariesΒΆ
Theorem 12.27 (Singular values of subdictionaries and coherence)
Let
Moreover, the singular values of the sub-dictionary
Proof. We recall from Gershgorinβs circle theorem that for
any square matrix
Now consider the matrix
with diagonal elements equal to 1 and off diagonal elements bounded by the coherence .Then
Thus,
This gives us a lower bound on the smallest eigen value.
Since
is positive definite ( is full-rank), hence its eigen values are positive. Thus, the above lower bound is useful only ifWe also get an upper bound on the eigen values of
given byThe bounds on singular values of
are obtained as a straight-forward extension by taking square roots on the expressions.
12.6.3.4. Embeddings using SubdictionariesΒΆ
Theorem 12.28 (Norm bounds for embeddings with real dictionaries)
Let
where
Proof. We can see that
Expanding we have
The terms in the R.H.S. for
are given bySumming over
, we getWe are now left with
off diagonal terms.Each of these terms is bounded by
Summing over the
off-diagonal terms we get:Thus,
Thus,
We get the result by slight reordering of terms:
We note that due to Theorem 12.14
Thus, the inequalities can be written as
Alternatively,
Finally, due to Theorem 12.10
This gives us
We now present the above theorem for the complex case. The proof is based on singular values. This proof is simpler and more general than the one presented above.
Theorem 12.29 (Norm bounds for embeddings with complex dictionaries)
Let
12.6.4. Babel FunctionΒΆ
Recalling the definition of coherence, we note that it reflects only the extreme correlations between atoms of dictionary. If most of the inner products are small compared to one dominating inner product, then the value of coherence is highly misleading.
In [35], Tropp introduced Babel function, which measures the maximum total coherence between a fixed atom and a collection of other atoms. The Babel function quantifies an idea as to how much the atoms of a dictionary are βspeaking the same languageβ.
Definition 12.21 (Babel function)
The Babel function for a dictionary
where the vector
for sparsity level
Let us dig deeper into what is going on here.
For each value of
Let the atoms spanning one such subspace be identified by an index set
All other atoms are indexed by the index set
denote the atoms indexed by
We run it for every
We finally compute the maximum over all possible
We first make a few observations over the properties of Babel function. Babel function is a generalization of coherence.
Remark 12.7 (Babel function for
For
the coherence of
Theorem 12.30 (Monotonicity of babel function)
Proof. This is easy to see since the sum
cannot decrease as
Theorem 12.31 (An upper bound for Babel function)
Babel function is upper bounded by coherence as per
Proof. Note that
This leads to
12.6.4.1. Computation of Babel FunctionΒΆ
It might seem at first that computation of Babel function is combinatorial and hence prohibitively expensive. But it is not true.
Example 12.26 (Procedure for computing the Babel function)
We will demonstrate this through an example in this section. Our example synthesis matrix will be
From the synthesis matrix
We then take absolute value of each entry in
We now sort every row in descending order to obtain a
new matrix
First entry in each row is now
Now look at column 2 in
Thus,
i.e. the coherence is given by the maximum in the 2nd column of
Looking carefully we can note that for
while
For the running example the Babel function values are given by
We see that Babel function stops increasing after
12.6.4.2. Babel Function and SparkΒΆ
We first note that Babel function tells something about linear independence
of the columns of
Theorem 12.32 (Linear independence of atoms and Babel function)
Let
then all selections of
Proof. We recall from the proof of Theorem 12.24 that if
then every set of
Thus if
then all selections of
This leads us to a lower bound on spark from Babel function.
Lemma 12.1 (Lower bound on spark based on Babel function)
A lower bound of spark of a dictionary
Proof. For all
Finally
An earlier version of this result also appeared in [18] theorem 6.
12.6.4.3. Babel Function and Singular ValuesΒΆ
Theorem 12.33 (Subdictionary singular value bounds from Babel function)
Let
Proof. Consider the Gram matrix
Also let
so that
The Gershgorin Disc Theorem states that every
eigenvalue of
Since
Also we note that
since there are
Thus if
This is OK since
But the eigen values of
Corollary 12.2
Let
Proof. From previous theorem we have
Since the singular values are always non-negative,
the lower bound is useful only when
Theorem 12.34 (Uncertainty principle : Babel function)
Let
Proof. If
12.6.4.4. Babel Function and Gram Matrix of SubdictionariesΒΆ
Let
Theorem 12.35 (A bound on the norms of Gram matrix)
Proof. Since
Now each row consists of a diagonal entry
and off diagonal entries.The absolute sum of all the off-diagonal entries in a row is upper bounded by
.Thus, the absolute sum of all the entries in a row is upper bounded by
. is nothing but the maximum norm of rows of .Hence
Theorem 12.36 (A bound on the norms of inverse Gram matrix)
Suppose that
Proof. Since
We can write
as where consists of off-diagonal entries in .Recall that since atoms are unit norm, hence diagonal entries in
are 1.Each row of
lists inner products between a fixed atom and other atoms (leaving 0 at the diagonal entry).Therefore
since
norm of any row is upper bounded by the babel number .Now
can be written as a Neumann seriesThus
since
.Finally
Thus
12.6.4.5. Quasi Incoherent DictionariesΒΆ
Definition 12.22 (Quasi incoherent dictionary)
When the Babel function of a dictionary grows slowly, we say that the dictionary is quasi-incoherent.