12.6. DictionariesΒΆ

In this section we review various properties associated with a dictionary D which are useful in understanding the behavior and capabilities of a dictionary.

We recall that a dictionary D consists of a finite number of unit norm vectors in CN called atoms which span the signal space CN. Atoms of the dictionary are indexed by an index set Ξ©; i.e.,

D={dΟ‰:Ο‰βˆˆΞ©}

with |Ξ©|=D and N≀D with β€–dΟ‰β€–2=1 for every atom.

The vectors x∈CN can be represented by a synthesis matrix consisting of the atoms of D by a vector a∈CD as

x=Da.

Note that we are using the same symbol D to represent the dictionary as a set of atoms as well as the corresponding synthesis matrix. We can write the matrix D consisting of its columns as

D=[d1…dD]

This shouldn’t be causing any confusion. When we write the subscript as dΟ‰ where Ο‰βˆˆΞ© we are referring to the atoms of the dictionary D indexed by the set Ξ©, while when we write the subscript as di we are referring to a column of corresponding synthesis matrix. In this case, Ξ© will simply mean the index set {1,…,D}. Obviously |Ξ©|=D holds still.

Often, we will be working with a subset of atoms in a dictionary. Usually such a subset of atoms will be indexed by an index set Ξ›βŠ†Ξ©. Ξ› will take the form of Ξ›βŠ†{Ο‰1,…,Ο‰D} or Ξ›βŠ†{1,…,D} depending upon whether we are talking about the subset of atoms in the dictionary or a subset of columns from the corresponding synthesis matrix.

Often we will need the notion of a subdictionary [37] described below.

12.6.1. SubdictionariesΒΆ

Definition 12.15 (Subdictionary)

A subdictionary is a linearly independent collection of atoms. Let Ξ›βŠ‚{Ο‰1,…,Ο‰D} be the index set for the atoms in the subdictionary. We denote the subdictionary as DΞ›. We also use DΞ› to denote the corresponding matrix with Ξ›βŠ‚{1,…,D}.

Remark 12.5 (Rank of subdictionary)

A subdictionary is full rank.

This is obvious since it is a collection of linearly independent atoms.

For subdictionaries, often we will say K=|Ξ›| and G=DΞ›HDΞ› as its Gram matrix. Sometimes, we will also be considering Gβˆ’1. Gβˆ’1 has a useful interpretation in terms of the dual vectors for the atoms in DΞ› [36].

Let {dΞ»}Ξ»βˆˆΞ› denote the atoms in DΞ›. Let {cΞ»}Ξ»βˆˆΞ› be chosen such that

⟨dλ,cλ⟩=1

and

⟨dΞ»,cΟ‰βŸ©=0 for Ξ»,Ο‰βˆˆΞ›,Ξ»β‰ Ο‰.

Each dual vector cλ is orthogonal to atoms in the subdictionary at different indices and is long enough so that its inner product with dλ is one. The dual system somehow inverts the sub-dictionary. In fact the dual vectors are nothing but the columns of the matrix B=(Dˆ)H. Now, a simple calculation shows that:

BHB=(DΛ†)(DΛ†)H=(DΞ›HDΞ›)βˆ’1DΞ›HDΞ›(DΞ›HDΞ›)βˆ’1=(DΞ›HDΞ›)βˆ’1=Gβˆ’1.

Therefore, the inverse Gram matrix lists the inner products between the dual vectors.

Sometimes we will be discussing tools which apply for general matrices. We will use the symbol Ξ¦ for representing general matrices. Whenever the dictionary is an orthonormal basis, we will use the symbol Ξ¨.

12.6.2. SparkΒΆ

Definition 12.16 (Spark)

The spark of a given matrix Ξ¦ is the smallest number of columns of Ξ¦ that are linearly dependent. If all columns are linearly independent, then the spark is defined to be number of columns plus one.

Note that the definition of spark applies to all matrices (wide, tall or square). It is not restricted to the synthesis matrices for a dictionary.

Correspondingly, the spark of a dictionary is defined as the minimum number of atoms which are linearly dependent.

We recall that rank of a matrix is defined as the maximum number of columns which are linearly independent. Definition of spark bears remarkable resemblance yet its very hard to obtain as it requires a combinatorial search over all possible subsets of columns of Ξ¦.

Example 12.21 (Spark)

  1. Spark of the 3Γ—3 identity matrix

    (100010001)

    is 4 since all columns are linearly independent.

  2. Spark of the 2Γ—4 matrix

    (10βˆ’10010βˆ’1)

    is 2 since column 1 and 3 are linearly dependent.

  3. If a matrix has a column with all zero entries, then the spark of such a matrix is 1. This is a trivial case and we will not consider such matrices in the sequel.

  4. In general for an NΓ—D synthesis matrix, spark⁑(D)∈[2,N+1].

A naive combinatorial algorithm to calculate the spark of a matrix is given below.

12.6.2.1. Spark and NullspaceΒΆ

Remark 12.6 (Spark and sparsity of null space vectors)

The β„“0-β€œnorm” of vectors belonging to null space of a matrix Ξ¦ is greater than or equal to spark⁑(Ξ¦):

β€–xβ€–0β‰₯spark⁑(Ξ¦)βˆ€x∈N(Ξ¦).

Proof. We proceed as follows:

  1. If x∈N(Φ) then Φx=0.

  2. Thus non-zero entries in x pick a set of columns in Ξ¦ which are linearly dependent.

  3. Clearly β€–xβ€–0 indicates the number of columns in the set which are linearly dependent.

  4. By definition, spark of Ξ¦ indicates the minimum number of columns which are linearly dependent.

  5. Hence the result:

    β€–xβ€–0β‰₯spark⁑(Ξ¦)βˆ€x∈N(Ξ¦).

12.6.2.2. Uniqueness-SparkΒΆ

Spark is useful in characterizing the uniqueness of the solution of a (D,K)-EXACT-SPARSE problem (see Definition 12.7). We now present a criteria based on spark which characterizes the uniqueness of a sparse solution to the problem y=Ξ¦x.

Theorem 12.22 (Uniqueness of a sparse solution for an underdetermined system via spark)

Consider a solution xβˆ— to the underdetermined system y=Ξ¦x. If xβˆ— obeys

β€–xβˆ—β€–0<spark⁑(Ξ¦)2

then it is necessarily the sparsest solution.

Proof. Let xβ€² be some other solution to the problem. Then

Ξ¦xβ€²=Ξ¦xβˆ—βŸΉΞ¦(xβ€²βˆ’xβˆ—)=0⟹(xβ€²βˆ’xβˆ—)∈N(Ξ¦).

Due to Remark 12.6 we have

β€–xβ€²βˆ’xβˆ—β€–0β‰₯spark⁑(Ξ¦).

Now

β€–xβ€²β€–0+β€–xβˆ—β€–0β‰₯β€–xβ€²βˆ’xβˆ—β€–0β‰₯spark⁑(Ξ¦).

Hence, if β€–xβˆ—β€–0<spark⁑(Ξ¦)2, then we have

β€–xβ€²β€–0>spark⁑(Ξ¦)2

for any other solution xβ€² to the equation y=Ξ¦x. Thus xβˆ— is necessarily the sparsest possible solution.

This result is quite useful as it establishes a global optimality criterion for the (D,K)-EXACT-SPARSE problem.

As long as K<12spark⁑(Φ) this theorem guarantees that the solution to (D,K)-EXACT-SPARSE problem is unique. This is quite surprising result for a non-convex combinatorial optimization problem. We are able to guarantee a global uniqueness for the solution based on a simple check on the sparsity of the solution.

Note that we are only saying that if a sufficiently sparse solution is found then it is unique. We are not claiming that it is possible to find such a solution.

Obviously, the larger the spark, we can guarantee uniqueness for signals with higher sparsity levels. So a natural question is: How large can spark of a dictionary be? We consider few examples.

Example 12.22 (Spark of Gaussian dictionaries)

Consider a dictionary D whose atoms di are random vectors independently drawn from normal distribution.

  1. Since a dictionary requires all its atoms to be unit-norms, hence we divide the each of the random vectors with their norms.

  2. We know that with probability 1 any set of N independent Gaussian random vectors is linearly independent.

  3. Also, since di∈RN hence a set of N+1 atoms is always linearly dependent.

  4. Thus spark⁑(D)=N+1.

Thus, if a solution to EXACT-SPARSE problem contains N2 or fewer non-zero entries then it is necessarily unique with probability 1.

Example 12.23 (Spark of Dirac Fourier basis)

For

D=[IF]∈CNΓ—2N

it can be shown that

spark⁑(D)=2N.

In this case, the sparsity level of a unique solution must be less than N.

12.6.3. CoherenceΒΆ

Finding out the spark of a dictionary D is NP-hard since it involves considering combinatorially large number of selections of columns from D. In this section we consider the coherence of a dictionary which is computationally tractable and quite useful in characterizing the solutions of sparse approximation problems.

Definition 12.17 (Coherence of a dictionary)

The coherence of a dictionary D is defined as the maximum absolute inner product between two distinct atoms in the dictionary:

ΞΌ=maxjβ‰ k|⟨dΟ‰j,dΟ‰k⟩|=maxjβ‰ k|(DHD)jk|.

If the dictionary consists of two orthonormal bases, then coherence is also known as mutual coherence or proximity; see Definition 12.1.

We note that dωi is the i-th column of synthesis matrix D. Also DHD is the Gram matrix for D whose elements are nothing but the inner-products of columns of D.

We note that by definition β€–dΟ‰β€–2=1 hence μ≀1 and since absolute values are considered hence ΞΌβ‰₯0. Thus, 0≀μ≀1.

For an orthonormal basis Ξ¨ all atoms are orthogonal to each other, hence

|βŸ¨ΟˆΟ‰j,ΟˆΟ‰k⟩|=0 whenever jβ‰ k.

Thus ΞΌ=0 for an orthonormal basis.

In the following, we will use the notation |A| to denote a matrix consisting of absolute values of entries in a matrix A; i.e.,

|A|ij=|Aij|.

The off-diagonal entries of the Gram matrix are captured by the matrix DHDβˆ’I. Note that all diagonal entries in DHDβˆ’I are zero since atoms of D are unit norm. Moreover, each of the entries in |DHDβˆ’I| is dominated by ΞΌ(D).

The inner product between any two atoms |⟨dΟ‰j,dΟ‰k⟩| is a measure of how much they look alike or how much they are correlated. Coherence just picks up the two vectors which are most alike and returns their correlation. In a way ΞΌ is quite a blunt measure of the quality of a dictionary, yet it is quite useful.

If a dictionary is uniform in the sense that there is not much variation in |⟨dΟ‰j,dΟ‰k⟩|, then ΞΌ captures the behavior of the dictionary quite well.

Definition 12.18 (Incoherent dictionary)

We say that a dictionary is incoherent if the coherence of the dictionary is small.

We are looking for dictionaries which are incoherent. In the sequel we will see how incoherence plays a role in sparse approximation.

Example 12.24 (Coherence of two ortho bases)

We established in Theorem 12.3 that coherence of two ortho-bases is bounded by

1N≀μ≀1.

In particular we showed in Theorem 12.4 that coherence of Dirac Fourier basis is 1N.

Example 12.25 (Coherence: Multi-ONB dictionary)

A dictionary of concatenated orthonormal bases is called a multi-ONB. For some N, it is possible to build a multi-ONB which contains N or even N+1 bases yet retains the minimal coherence ΞΌ=1N possible.

Theorem 12.23 (Coherence lower bound)

A lower bound on the coherence of a general dictionary is given by

ΞΌβ‰₯Dβˆ’NN(Dβˆ’1).

Definition 12.19 (Grassmannian frame)

If each atomic inner product meets this bound, the dictionary is called an optimal Grassmannian frame.

The definition of coherence can be extended to arbitrary matrices Φ∈CNΓ—D.

Definition 12.20 (Coherence for arbitrary matrices)

The coherence of a matrix Φ∈CNΓ—D is defined as the maximum absolute normalized inner product between two distinct columns in the matrix. Let

Ξ¦=[Ο•1Ο•2…ϕD].

Then coherence of Ξ¦ is given by

(12.31)ΒΆΞΌ(Ξ¦)=maxjβ‰ k|βŸ¨Ο•j,Ο•k⟩|β€–Ο•jβ€–2β€–Ο•kβ€–2

It is assumed that none of the columns in Ξ¦ is a zero vector.

12.6.3.1. Lower Bounds for SparkΒΆ

Coherence of a matrix is easy to compute. More interestingly it also provides a lower bound on the spark of a matrix.

Theorem 12.24 (Lower bound on spark in terms of coherence)

For any matrix Φ∈CNΓ—D (with non-zero columns) the following relationship holds

spark⁑(Ξ¦)β‰₯1+1ΞΌ(Ξ¦).

Proof. We note that scaling of a column of Ξ¦ doesn’t change either the spark or coherence of Ξ¦. Therefore, we assume that the columns of Ξ¦ are normalized.

  1. We now construct the Gram matrix of Φ given by G=ΦHΦ.

  2. We note that

    Gkk=1βˆ€1≀k≀D

    since each column of Ξ¦ is unit norm.

  3. Also

    |Gkj|≀μ(Ξ¦)βˆ€1≀k,j≀D,kβ‰ j.
  4. Consider any p columns from Ξ¦ and construct its Gram matrix.

  5. This is nothing but a leading minor of size pΓ—p from the matrix G.

  6. From the Gershgorin disk theorem, if this minor is diagonally dominant, i.e. if

    βˆ‘jβ‰ i|Gij|<|Gii|βˆ€i

    then this sub-matrix of G is positive definite and so corresponding p columns from Ξ¦ are linearly independent.

  7. But

    |Gii|=1

    and

    βˆ‘jβ‰ i|Gij|≀(pβˆ’1)ΞΌ(Ξ¦)

    for the minor under consideration.

  8. Hence for p columns to be linearly independent the following condition is sufficient

    (pβˆ’1)ΞΌ(Ξ¦)<1.
  9. Thus if

    p<1+1ΞΌ(Ξ¦),

    then every set of p columns from Ξ¦ is linearly independent.

  10. Hence, the smallest possible set of linearly dependent columns must satisfy

    pβ‰₯1+1ΞΌ(Ξ¦).
  11. This establishes the lower bound that

    spark⁑(Ξ¦)β‰₯1+1ΞΌ(Ξ¦).

This bound on spark doesn’t make any assumptions on the structure of the dictionary. In fact, imposing additional structure on the dictionary can give better bounds. Let us look at an example for a two ortho-basis [18].

Theorem 12.25 (Lower bound on spark for two ortho bases)

Let D be a two ortho-basis. Then

spark⁑(D)β‰₯2ΞΌ(D).

Proof. From Theorem 12.6 we know that for any vector v∈N(D)

β€–vβ€–0β‰₯2ΞΌ(D).

But

spark⁑(D)=minv∈N(D)(β€–vβ€–0).

Thus

spark⁑(D)β‰₯2ΞΌ(D).

For maximally incoherent two orthonormal bases, we know that ΞΌ=1N. A perfect example is the pair of Dirac and Fourier bases. In this case spark⁑(D)β‰₯2N.

12.6.3.2. Uniqueness-CoherenceΒΆ

We can now establish a uniqueness condition for sparse solution of y=Ξ¦x.

Theorem 12.26 (Uniqueness of a sparse solution of an underdetermined system via coherence)

Consider a solution xβˆ— to the under-determined system y=Ξ¦x. If xβˆ— obeys

β€–xβˆ—β€–0<12(1+1ΞΌ(Ξ¦))

then it is necessarily the sparsest solution.

Proof. This is a straightforward application of Theorem 12.22 and Theorem 12.24.

It is interesting to compare the two uniqueness theorems: Theorem 12.22 and Theorem 12.26.

Theorem 12.22 uses spark, is sharp and is far more powerful than Theorem 12.26.

Coherence can never be smaller than 1N, therefore the bound on β€–xβˆ—β€–0 in Theorem 12.26 can never be larger than N+12.

However, spark can be easily as large as N and then bound on β€–xβˆ—β€–0 can be as large as N2.

We recall from Theorem 12.8 that the bound for sparsity level of sparest solution in two-ortho basis H=[Ξ¨X] is given by

β€–xβˆ—β€–0<1ΞΌ(H)

which is a larger bound than Theorem 12.26 for general dictionaries by a factor of 2.

Thus, we note that coherence gives a weaker bound than spark for supportable sparsity levels of unique solutions. The advantage that coherence has is that it is easily computable and doesn’t require any special structure on the dictionary (two ortho basis has a special structure).

12.6.3.3. Singular Values of SubdictionariesΒΆ

Theorem 12.27 (Singular values of subdictionaries and coherence)

Let D be a dictionary and DΞ› be a subdictionary. Let ΞΌ be the coherence of D. Let K=|Ξ›|. Then the eigen values of G=DΞ›HDΞ› satisfy:

1βˆ’(Kβˆ’1)μ≀λ≀1+(Kβˆ’1)ΞΌ.

Moreover, the singular values of the sub-dictionary DΞ› satisfy

1βˆ’(Kβˆ’1)μ≀σ(DΞ›)≀1+(Kβˆ’1)ΞΌ.

Proof. We recall from Gershgorin’s circle theorem that for any square matrix A∈CKΓ—K, every eigen value Ξ» of A satisfies

|Ξ»βˆ’aii|β‰€βˆ‘jβ‰ i|aij| for some i∈{1,…,K}.
  1. Now consider the matrix G=DΞ›HDΞ› with diagonal elements equal to 1 and off diagonal elements bounded by the coherence ΞΌ.

  2. Then

    |Ξ»βˆ’1|β‰€βˆ‘jβ‰ i|Gij|β‰€βˆ‘jβ‰ iΞΌ=(Kβˆ’1)ΞΌ.
  3. Thus,

    βˆ’(Kβˆ’1)ΞΌβ‰€Ξ»βˆ’1≀(Kβˆ’1)μ⟺1βˆ’(Kβˆ’1)μ≀λ≀1+(Kβˆ’1)ΞΌ.
  4. This gives us a lower bound on the smallest eigen value.

    Ξ»min(G)β‰₯1βˆ’(Kβˆ’1)ΞΌ.
  5. Since G is positive definite (DΞ› is full-rank), hence its eigen values are positive. Thus, the above lower bound is useful only if

    1βˆ’(Kβˆ’1)ΞΌ>0⟺1>(Kβˆ’1)μ⟺μ<1Kβˆ’1.
  6. We also get an upper bound on the eigen values of G given by

    Ξ»max(G)≀1+(Kβˆ’1)ΞΌ.
  7. The bounds on singular values of DΞ› are obtained as a straight-forward extension by taking square roots on the expressions.

12.6.3.4. Embeddings using SubdictionariesΒΆ

Theorem 12.28 (Norm bounds for embeddings with real dictionaries)

Let D be a real dictionary and DΞ› be a subdictionary with K=|Ξ›|. Let ΞΌ be the coherence of D. Let v∈RK be an arbitrary vector. Then

|v|T[Iβˆ’ΞΌ(1βˆ’I)]|v|≀‖DΞ›vβ€–22≀|v|T[I+ΞΌ(1βˆ’I)]|v|

where 1 is a KΓ—K matrix of all ones. Moreover

(1βˆ’(Kβˆ’1)ΞΌ)β€–vβ€–22≀‖DΞ›vβ€–22≀(1+(Kβˆ’1)ΞΌ)β€–vβ€–22.

Proof. We can see that

β€–DΞ›vβ€–22=vTDΞ›TDΞ›v.
  1. Expanding we have

    vTDΞ›TDΞ›v=βˆ‘i=1Kβˆ‘j=1KvidΞ»iTdΞ»jvj.
  2. The terms in the R.H.S. for i=j are given by

    vidΞ»iTdΞ»ivi=|vi|2.
  3. Summing over i=1,…,K, we get

    βˆ‘i=1K|vi|2=β€–vβ€–22=vTv=|v|T|v|=|v|TI|v|.
  4. We are now left with K2βˆ’K off diagonal terms.

  5. Each of these terms is bounded by

    βˆ’ΞΌ|vi||vj|≀vidΞ»iTdΞ»jvj≀μ|vi||vj|.
  6. Summing over the K2βˆ’K off-diagonal terms we get:

    βˆ‘iβ‰ j|vi||vj|=βˆ‘i,j|vi||vj|βˆ’βˆ‘i=j|vi||vj|=|v|T(1βˆ’I)|v|.
  7. Thus,

    βˆ’ΞΌ|v|T(1βˆ’I)|v|β‰€βˆ‘iβ‰ jvidΞ»iTdΞ»jvj≀μ|v|T(1βˆ’I)|v|.
  8. Thus,

    |v|TI|v|βˆ’ΞΌ|v|T(1βˆ’I)|v|≀vTDΞ›TDΞ›v≀|v|TI|v|+ΞΌ|v|T(1βˆ’I)|v|.
  9. We get the result by slight reordering of terms:

    |v|T[Iβˆ’ΞΌ(1βˆ’I)]|v|≀‖DΞ›vβ€–22≀|v|T[I+ΞΌ(1βˆ’I)]|v|.
  10. We note that due to Theorem 12.14

    |v|T1|v|=β€–vβ€–12.
  11. Thus, the inequalities can be written as

    (1+ΞΌ)β€–vβ€–22βˆ’ΞΌβ€–vβ€–12≀‖DΞ›vβ€–22≀(1βˆ’ΞΌ)β€–vβ€–22+ΞΌβ€–vβ€–12.
  12. Alternatively,

    β€–vβ€–22βˆ’ΞΌ(β€–vβ€–12βˆ’β€–vβ€–22)≀‖DΞ›vβ€–22≀‖vβ€–22+ΞΌ(β€–vβ€–12βˆ’β€–vβ€–22).
  13. Finally, due to Theorem 12.10

    β€–vβ€–12≀Kβ€–vβ€–22βŸΉβ€–vβ€–12βˆ’β€–vβ€–22≀(Kβˆ’1)β€–vβ€–22.
  14. This gives us

    (1βˆ’(Kβˆ’1)ΞΌ)β€–vβ€–22≀‖DΞ›vβ€–22≀(1+(Kβˆ’1)ΞΌ)β€–vβ€–22.

We now present the above theorem for the complex case. The proof is based on singular values. This proof is simpler and more general than the one presented above.

Theorem 12.29 (Norm bounds for embeddings with complex dictionaries)

Let D be a dictionary and DΞ› be a sub-dictionary with K=|Ξ›|. Let ΞΌ be the coherence of D. Let v∈CK be an arbitrary vector. Then

(1βˆ’(Kβˆ’1)ΞΌ)β€–vβ€–22≀‖DΞ›vβ€–22≀(1+(Kβˆ’1)ΞΌ)β€–vβ€–22.

Proof. Recall that

Οƒmin2(DΞ›)β€–vβ€–22≀‖DΞ›vβ€–22≀σmax2(DΞ›)β€–vβ€–22.

The Theorem 12.27 tells us:

1βˆ’(Kβˆ’1)μ≀σ2(DΞ›)≀1+(Kβˆ’1)ΞΌ.

Thus,

Οƒmin2(DΞ›)β€–vβ€–22β‰₯(1βˆ’(Kβˆ’1)ΞΌ)β€–vβ€–22

and

Οƒmax2(DΞ›)β€–vβ€–22≀(1+(Kβˆ’1)ΞΌ)β€–vβ€–22.

This gives us the result

(1βˆ’(Kβˆ’1)ΞΌ)β€–vβ€–22≀‖DΞ›vβ€–22≀(1+(Kβˆ’1)ΞΌ)β€–vβ€–22.

12.6.4. Babel FunctionΒΆ

Recalling the definition of coherence, we note that it reflects only the extreme correlations between atoms of dictionary. If most of the inner products are small compared to one dominating inner product, then the value of coherence is highly misleading.

In [35], Tropp introduced Babel function, which measures the maximum total coherence between a fixed atom and a collection of other atoms. The Babel function quantifies an idea as to how much the atoms of a dictionary are β€œspeaking the same language”.

Definition 12.21 (Babel function)

The Babel function for a dictionary D is defined by

(12.32)ΒΆΞΌ1(p)β‰œmax|Ξ›|=pmaxΟˆβˆ‘Ξ»βˆˆΞ›|⟨ψ,dλ⟩|,

where the vector ψ ranges over the atoms indexed by Ξ©βˆ–Ξ›. We define

ΞΌ1(0)=0

for sparsity level p=0.

Let us dig deeper into what is going on here. For each value of p we consider all possible (Dp) subspaces by choosing p vectors from D.

Let the atoms spanning one such subspace be identified by an index set Ξ›βŠ‚Ξ©.

All other atoms are indexed by the index set Ξ“=Ξ©βˆ–Ξ›. Let

Ξ¨={ψγ|Ξ³βˆˆΞ“}

denote the atoms indexed by Ξ“. We pickup a vector ψ∈Ψ and compute its inner product with all atoms indexed by Ξ›. We compute the sum of absolute value of these inner products over all {dΞ»:Ξ»βˆˆΞ›}.

We run it for every ψ∈Ψ and then pickup the maximum value of above sum over all ψ.

We finally compute the maximum over all possible p-subspaces. This number is considered at the Babel number for sparsity level p.

We first make a few observations over the properties of Babel function. Babel function is a generalization of coherence.

Remark 12.7 (Babel function for p=1)

For p=1 we observe that

ΞΌ1(1)=ΞΌ(D)

the coherence of D.

Theorem 12.30 (Monotonicity of babel function)

ΞΌ1 is a non-decreasing function of p.

Proof. This is easy to see since the sum

βˆ‘Ξ»βˆˆΞ›|⟨ψ,dλ⟩|

cannot decrease as p=|Ξ›| increases. The following argument provides the details.

  1. For some value of p let Ξ›p and ψp denote the set and vector for which the maximum in (12.32) is attained.

  2. Now pick some column which is not ψp and is not indexed by Ξ›p and include it for Ξ›p+1.

  3. Note that Ξ›p+1 and ψp might not be the maximizers for ΞΌ1 for sparsity level p+1 in (12.32).

  4. Clearly

    βˆ‘Ξ»βˆˆΞ›p+1|⟨ψp,dλ⟩|β‰₯βˆ‘Ξ»βˆˆΞ›p|⟨ψp,dλ⟩|.
  5. Hence ΞΌ1(p+1) cannot be less than ΞΌ1(p).

Theorem 12.31 (An upper bound for Babel function)

Babel function is upper bounded by coherence as per

ΞΌ1(p)≀pΞΌ(D).

Proof. Note that

βˆ‘Ξ»βˆˆΞ›|⟨ψ,dλ⟩|≀pΞΌ(D).

This leads to

ΞΌ1(p)=max|Ξ›|=pmaxΟˆβˆ‘Ξ»βˆˆΞ›|⟨ψ,dλ⟩|≀max|Ξ›|=pmaxψ(pΞΌ(D))=pΞΌ(D).

12.6.4.1. Computation of Babel FunctionΒΆ

It might seem at first that computation of Babel function is combinatorial and hence prohibitively expensive. But it is not true.

Example 12.26 (Procedure for computing the Babel function)

We will demonstrate this through an example in this section. Our example synthesis matrix will be

D=[0.5000.653310.5βˆ’0.270600.5100.27060βˆ’0.50.653300.501βˆ’0.27060βˆ’0.5βˆ’0.653300.500βˆ’0.653300.50.27061]

From the synthesis matrix D we first construct its Gram matrix given by

G=DHD.

We then take absolute value of each entry in G to construct |G|. For the running example

|G|=[10.50.500.5000.50.5100.270600.50.653300.5010.270600.50.6533000.27060.270610.6533000.65330.5000.653310.50.2706000.50.500.5100.500.65330.653300.2706010.27060.5000.653300.50.27061]

We now sort every row in descending order to obtain a new matrix Gβ€².

Gβ€²=[10.50.50.50.500010.65330.50.50.270600010.65330.50.50.270600010.65330.65330.27060.270600010.65330.50.50.270600010.50.50.50.500010.65330.65330.27060.270600010.65330.50.50.2706000]

First entry in each row is now 1. This corresponds to ⟨di,di⟩ and it doesn’t appear in the calculation of ΞΌ1(p). Hence we disregard whole of first column.

Now look at column 2 in Gβ€². In the i-th row it is nothing but

maxjβ‰ i|⟨di,dj⟩|.

Thus,

ΞΌ(D)=ΞΌ1(1)=max1≀j≀DGβ€²j,2

i.e. the coherence is given by the maximum in the 2nd column of Gβ€². In the running example

ΞΌ(D)=ΞΌ1(1)=0.6533.

Looking carefully we can note that for ψ=di the maximum value of sum

βˆ‘Ξ›|⟨ψ,dλ⟩|

while |Ξ›|=p is given by the sum over elements from 2nd to (p+1)-th columns in i-th row. Thus

ΞΌ1(p)=max1≀i≀Dβˆ‘j=2p+1Gijβ€².

For the running example the Babel function values are given by

(0.65331.30661.65332222).

We see that Babel function stops increasing after p=4. Actually D is constructed by shuffling the columns of two orthonormal bases. Hence many of the inner products are 0 in G.

12.6.4.2. Babel Function and SparkΒΆ

We first note that Babel function tells something about linear independence of the columns of D.

Theorem 12.32 (Linear independence of atoms and Babel function)

Let ΞΌ1 be the Babel function for a dictionary D. If

ΞΌ1(p)<1

then all selections of p+1 columns from D are linearly independent.

Proof. We recall from the proof of Theorem 12.24 that if

p+1<1+1μ(D)⟹p<1μ(D)

then every set of (p+1) columns from D are linearly independent. We also know from Theorem 12.31 that

pΞΌ(D)β‰₯ΞΌ1(p)⟹μ(D)β‰₯ΞΌ1(p)p⟹1ΞΌ(D)≀pΞΌ1(p).

Thus if

p<pμ1(p)⟹1<1μ1(p)⟹μ1(p)<1

then all selections of p+1 columns from D are linearly independent.

This leads us to a lower bound on spark from Babel function.

Lemma 12.1 (Lower bound on spark based on Babel function)

A lower bound of spark of a dictionary D is given by

spark⁑(D)β‰₯min1≀p≀N{p|ΞΌ1(pβˆ’1)β‰₯1}.

Proof. For all j≀pβˆ’2 we are given that ΞΌ1(j)<1. Thus all sets of pβˆ’1 columns from D are linearly independent (using Theorem 12.32).

Finally ΞΌ1(pβˆ’1)β‰₯1, hence we cannot say definitively whether a set of p columns from D is linearly dependent or not. This establishes the lower bound on spark.

An earlier version of this result also appeared in [18] theorem 6.

12.6.4.3. Babel Function and Singular ValuesΒΆ

Theorem 12.33 (Subdictionary singular value bounds from Babel function)

Let D be a dictionary and Ξ› be an index set with |Ξ›|=K. The singular values of DΞ› are bounded by

1βˆ’ΞΌ1(Kβˆ’1)≀σ2≀1+ΞΌ1(Kβˆ’1).

Proof. Consider the Gram matrix

G=DΞ›HDΞ›.

G is a KΓ—K square matrix.

Also let

Ξ›={Ξ»1,Ξ»2,…,Ξ»K}

so that

DΞ›=[dΞ»1dΞ»2…dΞ»K].

The Gershgorin Disc Theorem states that every eigenvalue of G lies in one of the K discs

Ξ”k={z||zβˆ’Gkk|β‰€βˆ‘jβ‰ k|Gjk|}

Since di are unit norm, hence Gkk=1.

Also we note that

βˆ‘jβ‰ k|Gjk|=βˆ‘jβ‰ k|⟨dΞ»j,dΞ»k⟩|≀μ1(Kβˆ’1)

since there are Kβˆ’1 terms in sum and ΞΌ1(Kβˆ’1) is an upper bound on all such sums.

Thus if z is an eigen value of G then we have

|zβˆ’1|≀μ1(Kβˆ’1)βŸΉβˆ’ΞΌ1(Kβˆ’1)≀zβˆ’1≀μ1(Kβˆ’1)⟹1βˆ’ΞΌ1(Kβˆ’1)≀z≀1+ΞΌ1(Kβˆ’1).

This is OK since G is positive semi-definite, thus the eigen values of G are real.

But the eigen values of G are nothing but the squared singular values of DΞ›. Thus we get

1βˆ’ΞΌ1(Kβˆ’1)≀σ2≀1+ΞΌ1(Kβˆ’1).

Corollary 12.2

Let D be a dictionary and Ξ› be an index set with |Ξ›|=K. If ΞΌ1(Kβˆ’1)<1 then the squared singular values of DΞ› exceed (1βˆ’ΞΌ1(Kβˆ’1)).

Proof. From previous theorem we have

1βˆ’ΞΌ1(Kβˆ’1)≀σ2≀1+ΞΌ1(Kβˆ’1).

Since the singular values are always non-negative, the lower bound is useful only when ΞΌ1(Kβˆ’1)<1. When it holds we have

Οƒ(DΞ›)β‰₯1βˆ’ΞΌ1(Kβˆ’1).

Theorem 12.34 (Uncertainty principle : Babel function)

Let ΞΌ1(Kβˆ’1)<1. If a signal can be written as a linear combination of k atoms, then any other exact representation of the signal requires at least (Kβˆ’k+1) atoms.

Proof. If ΞΌ1(Kβˆ’1)<1, then the singular values of any sub-matrix of K atoms are non-zero. Thus, the minimum number of atoms required to form a linear dependent set is K+1. Let the number of atoms in any other exact representation of the signal be l. Then

k+lβ‰₯K+1⟹lβ‰₯Kβˆ’k+1.

12.6.4.4. Babel Function and Gram Matrix of SubdictionariesΒΆ

Let Ξ› index a subdictionary and let G=DΞ›HDΞ› denote the Gram matrix of the subdictionary DΞ›. Assume K=|Ξ›|.

Theorem 12.35 (A bound on the norms of Gram matrix)

β€–Gβ€–βˆž=β€–Gβ€–1≀1+ΞΌ1(Kβˆ’1).

Proof. Since G is Hermitian, hence the two norms are equal:

β€–Gβ€–βˆž=β€–GHβ€–1=β€–Gβ€–1.
  1. Now each row consists of a diagonal entry 1 and Kβˆ’1 off diagonal entries.

  2. The absolute sum of all the off-diagonal entries in a row is upper bounded by ΞΌ1(Kβˆ’1).

  3. Thus, the absolute sum of all the entries in a row is upper bounded by 1+ΞΌ1(Kβˆ’1).

  4. β€–Gβ€–βˆž is nothing but the maximum β„“1 norm of rows of G.

  5. Hence

    β€–Gβ€–βˆžβ‰€1+ΞΌ1(Kβˆ’1).

Theorem 12.36 (A bound on the norms of inverse Gram matrix)

Suppose that ΞΌ1(Kβˆ’1)<1. Then

β€–Gβˆ’1β€–βˆž=β€–Gβˆ’1β€–1≀11βˆ’ΞΌ1(Kβˆ’1).

Proof. Since G is Hermitian, hence the two operator norms are equal:

β€–Gβˆ’1β€–βˆž=β€–Gβˆ’1β€–1.
  1. We can write G as G=I+A where A consists of off-diagonal entries in G.

  2. Recall that since atoms are unit norm, hence diagonal entries in G are 1.

  3. Each row of A lists inner products between a fixed atom and Kβˆ’1 other atoms (leaving 0 at the diagonal entry).

  4. Therefore

    β€–Aβ€–βˆžβ‰€ΞΌ1(Kβˆ’1).

    since β„“1 norm of any row is upper bounded by the babel number ΞΌ1(Kβˆ’1).

  5. Now Gβˆ’1 can be written as a Neumann series

    Gβˆ’1=βˆ‘k=0∞(βˆ’A)k.
  6. Thus

    β€–Gβˆ’1β€–βˆž=β€–βˆ‘k=0∞(βˆ’A)kβ€–βˆžβ‰€βˆ‘k=0βˆžβ€–(βˆ’A)kβ€–βˆž=βˆ‘k=0βˆžβ€–Aβ€–βˆžk=11βˆ’β€–Aβ€–βˆž

    since β€–Aβ€–βˆž<1.

  7. Finally

    β€–Aβ€–βˆžβ‰€ΞΌ1(Kβˆ’1)⟺1βˆ’β€–Aβ€–βˆžβ‰₯1βˆ’ΞΌ1(Kβˆ’1)⟺11βˆ’β€–Aβ€–βˆžβ‰€11βˆ’ΞΌ1(Kβˆ’1).
  8. Thus

    β€–Gβˆ’1β€–βˆžβ‰€11βˆ’ΞΌ1(Kβˆ’1).

12.6.4.5. Quasi Incoherent DictionariesΒΆ

Definition 12.22 (Quasi incoherent dictionary)

When the Babel function of a dictionary grows slowly, we say that the dictionary is quasi-incoherent.