14.2. Basis Pursuit

14.2.1. Introduction

We recall the following sparse approximation problems. Given a signal $x \in \mathbb{C}^N$ which is known to have a sparse representation in a dictionary $D$, the exact-sparse recovery problem is:

(14.4) $\hat{a} = \arg\min_{a \in \mathbb{C}^D} \|a\|_0 \text{ subject to } x = Da.$

When $x \in \mathbb{C}^N$ doesn't have a sparse representation in $D$, a $K$-sparse approximation of $x$ in $D$ can be obtained by solving the following problem:

(14.5) $\hat{a} = \arg\min_{a \in \mathbb{C}^D} \|x - Da\|_2 \text{ subject to } \|a\|_0 \leq K.$

Here $x$ is modeled as $x = Da + e$, where $a$ denotes a sparse representation of $x$ and $e$ denotes the approximation error.

A different way to formulate the approximation problem is to provide an upper bound on the acceptable approximation error $\|e\|_2 \leq \epsilon$ and to seek the sparsest possible representation within this error bound:

(14.6) $\hat{a} = \arg\min_{a \in \mathbb{C}^D} \|a\|_0 \text{ subject to } \|x - Da\|_2 \leq \epsilon.$

Basis Pursuit (BP) [16] suggests the convex relaxation of (14.4) obtained by replacing the $\ell_0$-"norm" with the $\ell_1$-norm:

(14.7) $\hat{a} = \arg\min_{a \in \mathbb{C}^D} \|a\|_1 \text{ subject to } x = Da.$

For real signals, it can be implemented as a linear program. For complex signals, it can be implemented as a second-order cone program.

In the presence of approximation error (14.6), where $x = Da + e$ with $a$ being a $K$-sparse approximate representation of $x$ in $D$, we can formulate the corresponding $\ell_1$-minimization problem as:

(14.8) $\hat{a} = \arg\min_{a \in \mathbb{C}^D} \|a\|_1 \text{ subject to } \|x - Da\|_2 \leq \epsilon,$

where $\epsilon \geq \|e\|_2$ provides an upper bound on the approximation error. This version is known as basis pursuit with inequality constraints (BPIC).

The unconstrained problem obtained by moving the constraint into the objective with a Lagrange multiplier is

(14.9) $\hat{a} = \arg\min_{a \in \mathbb{C}^D} \|a\|_1 + \lambda \|x - Da\|_2^2.$

This is known as basis pursuit denoising (BPDN). With an appropriate choice of $\lambda$, the two problems BPIC and BPDN are equivalent. This formulation minimizes the $\ell_1$-norm subject to a penalty term on the approximation error. The Lagrangian constant $\lambda$ controls how large the penalty due to the approximation error will be.

Note that the constraint $\|x - Da\|_2 \leq \epsilon$ is equivalent to $\|x - Da\|_2^2 \leq \epsilon^2$. We have used the squared version to construct the BPDN problem since the term $\|x - Da\|_2^2$ is easier to differentiate and work with.

Efficient solvers are available for the BP, BPIC and BPDN problems using convex optimization techniques. They typically run in polynomial time but involve sophisticated algorithms. The good part is the guarantee that a global minimizer can be found (since the problems are convex). The not-so-good part is that convex optimization methods are still quite computationally intensive.

An alternative formulation of BPDN is as follows:

(14.10) $\hat{a} = \arg\min_{a \in \mathbb{C}^D} \frac{1}{2} \|x - Da\|_2^2 + \gamma \|a\|_1.$

The difference between the two formulations is essentially which term carries the Lagrangian constant ($\lambda$ or $\gamma$). By choosing $\lambda = 1/(2\gamma)$, the two formulations become the same up to a scale factor in the objective function. This formulation minimizes the approximation error subject to an $\ell_1$-norm penalty. Thus, the two formulations differ in which term is minimized and which term is treated as the penalty.

Basis pursuit is not an algorithm but a principle which says that for most real-life problems, the solution of the $\ell_0$-minimization problem is the same as the solution of the corresponding $\ell_1$-minimization problem. Actual algorithms for solving the basis pursuit formulation of the sparse recovery problem come from the convex optimization literature.
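To make the three formulations concrete, here is a minimal sketch using NumPy and CVXPY (the library choice, the random dictionary, and the values of $\epsilon$ and $\gamma$ are assumptions for illustration; the text does not prescribe a solver):

```python
# A minimal sketch of the three convex programs using NumPy + CVXPY.
# Real-valued data is assumed here; CVXPY also supports
# cp.Variable(D, complex=True), in which case the same formulations apply.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
N, D, K = 64, 128, 5                    # signal dim, number of atoms, sparsity
Dict = rng.standard_normal((N, D))
Dict /= np.linalg.norm(Dict, axis=0)    # normalize the atoms
a_true = np.zeros(D)
a_true[rng.choice(D, K, replace=False)] = rng.standard_normal(K)
x = Dict @ a_true

a = cp.Variable(D)

# (14.7) BP: minimize the l1 norm under an exact equality constraint
bp = cp.Problem(cp.Minimize(cp.norm1(a)), [Dict @ a == x])
bp.solve()

# (14.8) BPIC: inequality constraint with an error budget eps
eps = 1e-3
bpic = cp.Problem(cp.Minimize(cp.norm1(a)),
                  [cp.norm(Dict @ a - x, 2) <= eps])
bpic.solve()

# (14.10) BPDN: l1-penalized least squares with parameter gamma
gamma = 0.05
bpdn = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(Dict @ a - x)
                              + gamma * cp.norm1(a)))
bpdn.solve()
# The BPDN solution is close to a_true but slightly biased by the penalty.
print(np.linalg.norm(a.value - a_true))
```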

We start our discussion with the analysis of the exact-sparse case.

As part of our theoretical analysis, we would like to explore conditions under which the problems (14.4) and (14.7) are equivalent, i.e., there exists a unique solution to both of them and the solutions are identical. Under such conditions, the NP-hard problem (14.4) can be replaced with the tractable problem (14.7), which is convex and solvable in polynomial time.

14.2.2. Two-Ortho Case

Further simplifying, we consider the case where the dictionary $D$ is a two-ortho basis

$D = [\Psi \; \Phi]$

with $\Psi$ and $\Phi$ both being orthonormal bases for $\mathbb{C}^N$.

  1. Clearly, $D \in \mathbb{C}^{N \times 2N}$ and $D = 2N$ (we use $D$ for both the dictionary and the number of atoms).

  2. We denote

     $\Omega = \{1, 2, \dots, 2N\}$

     as the index set for the representation vectors $a$.

  3. The representation $a$ of a signal $x$ in $D$ can be written as

     $x = Da = [\Psi \; \Phi] \begin{bmatrix} a^p \\ a^q \end{bmatrix} = \Psi a^p + \Phi a^q.$

  4. We can assign

     $k_p = \|a^p\|_0 \quad \text{and} \quad k_q = \|a^q\|_0.$

  5. The total sparsity of $a$ is given by

     $K = \|a\|_0 = k_p + k_q.$

  6. Whenever $K \ll N$, we have a sparse representation.

  7. Further, let $S_p \subseteq \{1, \dots, N\}$ be the support corresponding to the $a^p$ part of $a$ (i.e. $S_p = \operatorname{supp}(a^p)$) and $S_q \subseteq \{1, \dots, N\}$ be the support corresponding to the $a^q$ part of $a$ (i.e. $S_q = \operatorname{supp}(a^q)$).

  8. Clearly, $|S_p| = k_p$ and $|S_q| = k_q$.

  9. Note that $S_p$ and $S_q$ need not be disjoint.

  10. But $S_p$ and $S_q + N$ are disjoint.

  11. In fact, $\operatorname{supp}(a) = S_p \cup (S_q + N)$.

  12. $\mathbf{1}_p \in \mathbb{C}^N$ will denote the indicator vector for $S_p$; i.e., $\mathbf{1}_p(i) = 0 \; \forall i \notin S_p$ and $\mathbf{1}_p(i) = 1 \; \forall i \in S_p$.

  13. Similarly, $\mathbf{1}_q \in \mathbb{C}^N$ will denote the indicator vector for $S_q$.

  14. $\mathbf{1} \in \mathbb{C}^N$ will denote the all-ones vector $(1, \dots, 1)$.

  15. Also, $\mathbf{1} \in \mathbb{C}^{N \times N}$ will denote the square matrix of all ones; which of the two is meant will be clear from context.

  16. Note that the all-ones matrix satisfies $\mathbf{1} = \mathbf{1} \cdot \mathbf{1}^T$.

We now state our main result for the equivalence of solutions of (14.4) and (14.7) for the two-ortho case. Going forward, we will simply use $\mu$ to refer to the coherence of $D$ (i.e. $\mu(D)$).
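As a concrete example of a two-ortho dictionary, the following sketch (an illustration, not taken from the text) pairs the identity basis (spikes) with a unitary DFT basis (sinusoids), for which the coherence works out to $\mu = 1/\sqrt{N}$:

```python
# Build the classic spikes + sinusoids two-ortho dictionary and compute
# its coherence numerically; for this pair mu = 1/sqrt(N).
import numpy as np

N = 64
Psi = np.eye(N)
Phi = np.fft.fft(np.eye(N)) / np.sqrt(N)   # unitary DFT matrix
Dict = np.hstack([Psi, Phi])               # D in C^{N x 2N}

# Coherence: largest |inner product| between distinct unit-norm atoms.
G = Dict.conj().T @ Dict
mu = np.max(np.abs(G - np.eye(2 * N)))
print(mu, 1 / np.sqrt(N))                  # both ~ 0.125 for N = 64
```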

Theorem 14.6

Let $D$ be a two-ortho basis dictionary $D = [\Psi \; \Phi]$. Let $x = Da$, where $x$ is known. If a $K$-sparse representation $a$ exists with $k_p \geq k_q$ such that $(k_p, k_q)$ obey

(14.11) $2\mu^2 k_p k_q + \mu k_p - 1 < 0;$

then $a$ is the unique solution of both problems (14.4) and (14.7).

A weaker condition is: If

(14.12) $\|a\|_0 = K = k_p + k_q < \frac{\sqrt{2} - 0.5}{\mu};$

then $a$ is the unique ($K$-sparse) solution to both (14.4) and (14.7).

Proof. We first show that $a$ is the unique solution of (14.4).

  1. Towards the end of this proof, we show that the simpler condition (14.12) guarantees (14.11); (14.12) is therefore the weaker of the two results.

  2. Due to (14.12),

     $\|a\|_0 = K = k_p + k_q < \frac{\sqrt{2} - 0.5}{\mu} \approx \frac{0.914}{\mu} < \frac{1}{\mu}.$

     (Condition (14.11) by itself also forces $K < 1/\mu$: writing $u = \mu k_p$, for $u \leq 1/2$ we have $K \leq 2 k_p \leq 1/\mu$, with equality ruled out by (14.11) itself; for $u > 1/2$, the bound $k_q < (1 - \mu k_p)/(2 \mu^2 k_p)$ derived below gives $K < \frac{1}{\mu} \cdot \frac{2u^2 - u + 1}{2u} \leq \frac{1}{\mu}$, since $\frac{2u^2 - u + 1}{2u} - 1 = \frac{(2u - 1)(u - 1)}{2u} \leq 0$ on $(1/2, 1)$.)

  3. Thus, if $a$ satisfies (14.11), then it is necessarily the sparsest possible representation of $x$ in $D$ due to Theorem 12.8.

  4. All other representations are denser (i.e. have more non-zero entries).

  5. Hence $a$ is the unique solution of (14.4).

We next show that $a$ is also the unique solution of (14.7).

  1. $a$ is a feasible vector for (14.7) since $x = Da$, though it need not be an optimal solution.

  2. We have to find criteria under which $a$ is optimal and no other feasible vector $b$ is optimal.

  3. Since $a$ is the unique solution to (14.4), we have $\|b\|_0 > \|a\|_0$ for every other feasible $b$ for (14.7).

  4. We consider the set of alternative feasible vectors for (14.7) given by

     $C = \{ b \mid b \neq a, \; \|b\|_1 \leq \|a\|_1, \; \|b\|_0 > \|a\|_0 \text{ and } D(a - b) = 0 \}.$

     This set contains all feasible vectors for (14.7) which

     1. are different from $a$;

     2. have larger support (larger $\ell_0$-"norm");

     3. satisfy the linear system of equations $x = Da$;

     4. have $\ell_1$ norm less than or equal to that of $a$.

  5. If this set is nonempty, then there exists a solution to basis pursuit which is not the same as $a$.

  6. If this set is empty, then the solutions of (14.4) and (14.7) coincide and are both unique.

  7. Writing $e = b - a \iff b = e + a$, we have

     $\|b\|_1 \leq \|a\|_1 \iff \|e + a\|_1 - \|a\|_1 \leq 0.$

  8. Thus, we can rewrite $C$ (in terms of $e$) as

     $C_s = \{ e \mid e \neq 0, \; \|e + a\|_1 - \|a\|_1 \leq 0 \text{ and } De = 0 \}.$

  9. In order to show that $C$ is empty, we will show that a larger set containing $C_s$ is empty.

  10. Essentially, we wish to consider a larger set whose emptiness can be checked easily against (14.11).

  11. If that larger set is empty due to (14.11), then $C$ would also be empty and we would have completed the proof.

Emptiness of $C_s$

  1. We start with the requirement $\|e + a\|_1 - \|a\|_1 \leq 0$.

  2. Let $a = \begin{bmatrix} a^p \\ a^q \end{bmatrix}$ and $e = \begin{bmatrix} e^p \\ e^q \end{bmatrix}$, where $p$ and $q$ refer to the parts corresponding to the orthonormal bases $\Psi$ and $\Phi$ respectively (as described at the beginning of this section).

  3. Note that even if $a^p$ and $a^q$ are sparse, $e^p$ and $e^q$ need not be.

  4. In fact, the supports of $e^p$ and $e^q$ could be very different from $S_p$ and $S_q$.

  5. We can now write

     $0 \geq \|e + a\|_1 - \|a\|_1 = \left( \sum_{i=1}^N |e^p_i + a^p_i| - |a^p_i| \right) + \left( \sum_{i=1}^N |e^q_i + a^q_i| - |a^q_i| \right) = \sum_{i \notin S_p} |e^p_i| + \sum_{i \notin S_q} |e^q_i| + \left( \sum_{i \in S_p} |e^p_i + a^p_i| - |a^p_i| \right) + \left( \sum_{i \in S_q} |e^q_i + a^q_i| - |a^q_i| \right).$

     We are splitting the sum as follows.

     1. We first split the sum into $e^p, a^p$ and $e^q, a^q$ parts.

     2. We split the sum on the $p$ part into a sum over indices in $S_p$ and indices not in $S_p$.

     3. We split the sum on the $q$ part into a sum over indices in $S_q$ and indices not in $S_q$.

     4. For $i \notin S_p$, $a^p_i = 0$, leading to $|e^p_i + a^p_i| - |a^p_i| = |e^p_i|$.

     5. Ditto for $i \notin S_q$.

  6. We recall from the triangle inequality on complex numbers that

     $|x + y| \geq |y| - |x| \quad \forall x, y \in \mathbb{C},$

     which implies $|x + y| - |y| \geq -|x|$.

  7. Thus,

     $|e^p_i + a^p_i| - |a^p_i| \geq -|e^p_i| \quad \forall i \in S_p$

     and

     $|e^q_i + a^q_i| - |a^q_i| \geq -|e^q_i| \quad \forall i \in S_q.$

  8. With this, the above condition can be relaxed as

     $0 \geq \|e + a\|_1 - \|a\|_1 \geq \sum_{i \notin S_p} |e^p_i| + \sum_{i \notin S_q} |e^q_i| - \sum_{i \in S_p} |e^p_i| - \sum_{i \in S_q} |e^q_i|.$

  9. Every $e$ satisfying the original condition $\|e + a\|_1 - \|a\|_1 \leq 0$ therefore also satisfies this relaxed inequality.

  10. To simplify notation we can write

      $\sum_{i \in S_p} |e^p_i| = \mathbf{1}_p^T |e^p| \quad \text{and} \quad \sum_{i \in S_q} |e^q_i| = \mathbf{1}_q^T |e^q|.$

  11. Then we have

      $\|e^p\|_1 = \sum_{i \in S_p} |e^p_i| + \sum_{i \notin S_p} |e^p_i| \iff \sum_{i \notin S_p} |e^p_i| = \|e^p\|_1 - \sum_{i \in S_p} |e^p_i| = \|e^p\|_1 - \mathbf{1}_p^T |e^p|.$

  12. Similarly,

      $\sum_{i \notin S_q} |e^q_i| = \|e^q\|_1 - \mathbf{1}_q^T |e^q|.$

  13. Thus,

      $\sum_{i \notin S_p} |e^p_i| + \sum_{i \notin S_q} |e^q_i| - \sum_{i \in S_p} |e^p_i| - \sum_{i \in S_q} |e^q_i| = \|e^p\|_1 - 2 \mathbf{1}_p^T |e^p| + \|e^q\|_1 - 2 \mathbf{1}_q^T |e^q|.$

  14. We can now define the set

      $C_{s1} = \{ e \mid e \neq 0, \; \|e^p\|_1 + \|e^q\|_1 - 2 \mathbf{1}_p^T |e^p| - 2 \mathbf{1}_q^T |e^q| \leq 0 \text{ and } De = 0 \}.$

  15. Clearly, $C_s \subseteq C_{s1}$, and if $C_{s1}$ is empty, then $C_s$ will also be empty.

  16. Note that this formulation of $C_{s1}$ depends only on the support of $a$ and not on the values in $a$.

  17. We now turn back to the requirement $De = 0$ and relax it further.

  18. We note that

      $De = [\Psi \; \Phi] \begin{bmatrix} e^p \\ e^q \end{bmatrix} = \Psi e^p + \Phi e^q = 0.$

  19. Multiplying by $\Psi^H$ we get

      $e^p + \Psi^H \Phi e^q = 0 \iff e^p = -\Psi^H \Phi e^q,$

      since $\Psi^H \Psi = I$ ($\Psi$ is unitary).

  20. Similarly, multiplying with $\Phi^H$, we obtain

      $\Phi^H \Psi e^p + e^q = 0 \iff e^q = -\Phi^H \Psi e^p.$

  21. Note that the entries in $\Psi^H \Phi$ and $\Phi^H \Psi$ are inner products between columns of $D$, hence their magnitudes are upper bounded by $\mu$ (the coherence).

  22. Denote $B = \Psi^H \Phi$ and consider the product $v = \Psi^H \Phi e^q = B e^q$.

  23. Then

      $v_i = \sum_{j=1}^N B_{ij} e^q_j.$

  24. Thus,

      $|v_i| = \left| \sum_{j=1}^N B_{ij} e^q_j \right| \leq \sum_{j=1}^N |B_{ij} e^q_j| \leq \mu \sum_{j=1}^N |e^q_j| = \mu \mathbf{1}^T |e^q|.$

  25. Applying this result to $e^p$ we get

      $|e^p| = |\Psi^H \Phi e^q| \preceq \mu \mathbf{1} |e^q|.$

  26. Similarly,

      $|e^q| = |\Phi^H \Psi e^p| \preceq \mu \mathbf{1} |e^p|.$

  27. Note that since $\mathbf{1} = \mathbf{1} \cdot \mathbf{1}^T$, the all-ones matrix is a rank-1 matrix.

  28. We now construct a set $C_{s2}$ as

      $C_{s2} = \{ e \mid e \neq 0, \; \|e^p\|_1 + \|e^q\|_1 - 2 \mathbf{1}_p^T |e^p| - 2 \mathbf{1}_q^T |e^q| \leq 0, \; |e^p| \preceq \mu \mathbf{1} |e^q| \text{ and } |e^q| \preceq \mu \mathbf{1} |e^p| \}.$

  29. Clearly, $C_{s1} \subseteq C_{s2}$, since for every $e \in C_{s1}$, $De = 0$ implies the two majorization constraints, so $e \in C_{s2}$.

  30. We now define $f^p = |e^p|$ and $f^q = |e^q|$ as the absolute value vectors.

  31. Correspondingly, let us define

      $f = |e| = \begin{bmatrix} f^p \\ f^q \end{bmatrix}.$

  32. Clearly, $\|e^p\|_1 = \mathbf{1}^T f^p$ and $\|e^q\|_1 = \mathbf{1}^T f^q$.

  33. Further, $f^p \succeq 0$; i.e., every entry in $f^p$ is nonnegative.

  34. Similarly, $f^q \succeq 0$.

  35. We can then introduce a set $C_f$ as

      $C_f = \{ f \mid f \neq 0, \; \mathbf{1}^T f^p + \mathbf{1}^T f^q - 2 \mathbf{1}_p^T f^p - 2 \mathbf{1}_q^T f^q \leq 0, \; f^p \preceq \mu \mathbf{1} f^q, \; f^q \preceq \mu \mathbf{1} f^p \text{ and } f^p \succeq 0, f^q \succeq 0 \}.$

  36. It is easy to see that if $e \in C_{s2}$ then $f = |e| \in C_f$.

  37. Thus, if $C_f$ is empty, then $C_{s2}$ must be empty too.

  38. We note that if $f \in C_f$, then $cf \in C_f$ for all $c > 0$.

  39. Thus, in order to study (the emptiness of) $C_f$, it is sufficient to study unit $\ell_1$-norm vectors $f \in C_f$; i.e., $C_f$ contains no unit $\ell_1$-norm vector if and only if $C_f$ is empty.

  40. Now

      $\|f\|_1 = \mathbf{1}^T f = \mathbf{1}^T f^p + \mathbf{1}^T f^q,$

      since $f \succeq 0$.

  41. This leads to:

      $\|f\|_1 = 1 \iff \mathbf{1}^T f^p + \mathbf{1}^T f^q = 1.$

  42. We construct the new set of unit $\ell_1$-norm vectors

      $C_r = \{ f \mid f \neq 0, \; 1 - 2 \mathbf{1}_p^T f^p - 2 \mathbf{1}_q^T f^q \leq 0, \; f^p \preceq \mu \mathbf{1} f^q, \; f^q \preceq \mu \mathbf{1} f^p, \; \mathbf{1}^T f^p + \mathbf{1}^T f^q = 1 \text{ and } f^p \succeq 0, f^q \succeq 0 \}.$

  43. We have $C_r = \emptyset \iff C_f = \emptyset$.

  44. Note that the constraint $1 - 2 \mathbf{1}_p^T f^p - 2 \mathbf{1}_q^T f^q \leq 0$ can be rewritten as

      $\mathbf{1}_p^T f^p + \mathbf{1}_q^T f^q \geq \frac{1}{2}.$

  45. The set $C_r$ is much easier to analyze since

      • It has no explicit dependency on $D$; the dictionary is represented only by a single parameter, its coherence $\mu$.

      • All constraints are simple linear constraints. Thus, finding the elements of $C_r$ can be formulated as a linear programming (feasibility) problem.

      • The order of the non-zero entries inside $f^p$ and $f^q$ has no influence on the requirements for $f$ to belong to $C_r$. Thus, without loss of generality, we can focus on vectors for which the first $k_p$ entries of $f^p$ and the first $k_q$ entries of $f^q$ are non-zero.

  46. We next verify that $C_r$ must be empty under (14.11).

Emptiness of $C_r$

  1. In order to look for vectors in $C_r$, we can solve the following linear program (a small numeric check of its optimal value appears after this derivation):

     (14.13) $\begin{aligned} \underset{f^p, f^q}{\text{maximize}} \quad & \mathbf{1}_p^T f^p + \mathbf{1}_q^T f^q \\ \text{subject to} \quad & f^p \preceq \mu \mathbf{1} f^q, \\ & f^q \preceq \mu \mathbf{1} f^p, \\ & \mathbf{1}^T (f^p + f^q) = 1, \\ & f^p \succeq 0, \; f^q \succeq 0. \end{aligned}$

  2. The feasible set of this linear program is nonempty (e.g., the uniform vectors $f^p = f^q = \frac{1}{2N} \mathbf{1}$ satisfy all constraints whenever $\mu \geq 1/N$, which holds for any two-ortho pair) and the objective is bounded, hence an optimal solution exists.

  3. What is interesting is the value of the objective function at the optimal solution.

  4. Let $f^p_*, f^q_*$ be (an) optimal solution of this linear program.

  5. If $\mathbf{1}_p^T f^p_* + \mathbf{1}_q^T f^q_* \geq \frac{1}{2}$, then $f_*$ satisfies all the requirements of $C_r$ and $C_r$ is indeed not empty.

  6. This doesn't guarantee that $C$ will also be non-empty though.

  7. On the contrary, if $\mathbf{1}_p^T f^p_* + \mathbf{1}_q^T f^q_* < \frac{1}{2}$, then $C_r$ is indeed empty (as one of the requirements cannot be met).

  8. Hence $C_f$ is also empty, and by the chain of inclusions $C_s \subseteq C_{s1} \subseteq C_{s2}$, the sets $C_s$ and $C$ are empty too.

  9. Thus, a condition which leads to

     $\mathbf{1}_p^T f^p_* + \mathbf{1}_q^T f^q_* < \frac{1}{2}$

     is a sufficient condition for the equivalence of (14.4) and (14.7).

  10. Consider a feasible $f$ for (14.13).

  11. Let $\|f^p\|_1 = \mathbf{1}^T f^p = c$.

  12. Since $\mathbf{1}^T (f^p + f^q) = 1$, we have $\|f^q\|_1 = \mathbf{1}^T f^q = 1 - c$.

  13. We note that

      $\mathbf{1} f^p = \mathbf{1} \cdot \mathbf{1}^T f^p = \|f^p\|_1 \mathbf{1} = c \mathbf{1}.$

  14. Similarly,

      $\mathbf{1} f^q = (1 - c) \mathbf{1}.$

  15. Thus, the first two constraints turn into

      $f^p \preceq (1 - c) \mu \mathbf{1}, \quad f^q \preceq c \mu \mathbf{1}.$

  16. Since the objective is to maximize $\mathbf{1}_p^T f^p + \mathbf{1}_q^T f^q$, it is natural to maximize the non-zero entries in $f^p$ and $f^q$ corresponding to $S_p$ and $S_q$.

  17. A straightforward option is to choose the first $k_p$ entries of $f^p$ to be $(1 - c)\mu$ and the first $k_q$ entries of $f^q$ to be $c\mu$.

  18. Other entries can be chosen arbitrarily to meet the requirement $\mathbf{1}^T (f^p + f^q) = 1$.

  19. With this choice, we have

      $\mathbf{1}_p^T f^p + \mathbf{1}_q^T f^q = k_p (1 - c) \mu + k_q c \mu = \mu (k_p - c (k_p - k_q)).$

  20. We recall that we have chosen $k_p \geq k_q$.

  21. Thus, the expression is maximized if $c$ is chosen as small as possible.

  22. The choice of $c$ must meet the following conditions on the $\ell_1$-norms. (Basically, the sum of the first $k_p$ terms of $f^p$ must not exceed the $\ell_1$ norm of $f^p$; same for $f^q$.)

      $\|f^p\|_1 = \mathbf{1}^T f^p = c \geq k_p (1 - c) \mu, \qquad \|f^q\|_1 = \mathbf{1}^T f^q = 1 - c \geq k_q c \mu.$

  23. Simplifying these inequalities we get

      $c \geq k_p (1 - c) \mu \implies c \geq \frac{k_p \mu}{1 + k_p \mu}, \qquad 1 - c \geq k_q c \mu \implies c \leq \frac{1}{1 + k_q \mu}.$

  24. Since these two conditions must be satisfied simultaneously, we require $k_p, k_q$ to meet

      $\frac{k_p \mu}{1 + k_p \mu} \leq \frac{1}{1 + k_q \mu} \implies k_p k_q \leq \frac{1}{\mu^2}.$

  25. We will verify later that this condition is met if (14.11) holds.

  26. Assuming the condition is met, the smallest possible value of $c$ is $\frac{k_p \mu}{1 + k_p \mu}$.

  27. The maximum value of the objective function then becomes

      $\mathbf{1}_p^T f^p + \mathbf{1}_q^T f^q = \mu (k_p - c (k_p - k_q)) = \mu \left( k_p - \frac{k_p \mu}{1 + k_p \mu} (k_p - k_q) \right) = \frac{k_p \mu + k_p k_q \mu^2}{1 + k_p \mu}.$

  28. Finally, for BP to succeed, we require this expression to be strictly less than half.

  29. This gives us

      $\frac{k_p \mu + k_p k_q \mu^2}{1 + k_p \mu} < \frac{1}{2} \implies 2 k_p k_q \mu^2 + k_p \mu - 1 < 0,$

      which is the sufficient condition for BP to succeed in the theorem.
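The following sketch (assuming SciPy's `linprog` and small example values of $N$, $k_p$, $k_q$, $\mu$, none of which are prescribed by the text) checks the optimal value of (14.13) against the closed-form expression derived above:

```python
# Numeric check that the LP (14.13) attains (kp*mu + kp*kq*mu^2)/(1 + kp*mu);
# supports are taken as the leading entries, which is WLOG per the derivation.
import numpy as np
from scipy.optimize import linprog

N, kp, kq, mu = 8, 2, 1, 0.25

c = np.zeros(2 * N)
c[:kp] = 1.0          # 1_p^T f^p
c[N:N + kq] = 1.0     # 1_q^T f^q

# f^p <= mu * (1^T f^q) * 1  and  f^q <= mu * (1^T f^p) * 1
A_ub = np.zeros((2 * N, 2 * N))
A_ub[:N, :N] = np.eye(N);  A_ub[:N, N:] = -mu
A_ub[N:, N:] = np.eye(N);  A_ub[N:, :N] = -mu
b_ub = np.zeros(2 * N)

A_eq = np.ones((1, 2 * N))   # 1^T (f^p + f^q) = 1
b_eq = np.array([1.0])

res = linprog(-c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (2 * N))
closed_form = (kp * mu + kp * kq * mu**2) / (1 + kp * mu)
# Both should print ~0.4167 here; BP succeeds for this (kp, kq) since < 1/2.
print(-res.fun, closed_form)
```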

We now show how the weaker condition (14.12) arises: minimizing the bound implied by (14.11) over all admissible splits $K = k_p + k_q$ shows that any $K$ below the minimum satisfies (14.11) regardless of the split, so (14.12) implies (14.11).

  1. The boundary of (14.11) expresses $k_q$ in terms of $k_p$:

     $2 k_p k_q \mu^2 + k_p \mu - 1 < 0 \implies 2 k_p k_q \mu^2 < 1 - k_p \mu \implies k_q < \frac{1 - k_p \mu}{2 k_p \mu^2}.$

  2. Thus, the largest total sparsity compatible with (14.11) for a given $k_p$ obeys

     $\|a\|_0 = k_p + k_q < k_p + \frac{1 - k_p \mu}{2 k_p \mu^2} = \frac{2 \mu^2 k_p^2 + 1 - \mu k_p}{2 \mu^2 k_p} = \frac{1}{\mu} \cdot \frac{2 \mu^2 k_p^2 + 1 - \mu k_p}{2 \mu k_p}.$

  3. We define $u = \mu k_p$ and rewrite the above as

     $\|a\|_0 < \frac{1}{\mu} \cdot \frac{2u^2 - u + 1}{2u}.$

  4. The weaker condition is obtained by minimizing the R.H.S. over $u$: if $K$ is below the minimum, then $K < k_p + \frac{1 - k_p \mu}{2 k_p \mu^2}$ for the actual $k_p$, which rearranges back to (14.11).

  5. We define

     $f(u) = \frac{2u^2 - u + 1}{2u}.$

  6. Differentiating and equating to 0, we get

     $f'(u) = \frac{2u^2 - 1}{2u^2} = 0.$

  7. The optimal value is obtained when $u = \pm\sqrt{0.5}$.

  8. Since both $\mu$ and $k_p$ are positive quantities, the negative value of $u$ is rejected and we get $u = \sqrt{0.5}$.

  9. This gives us

     $\|a\|_0 < \frac{1}{\mu} \cdot \frac{2 \cdot 0.5 - \sqrt{0.5} + 1}{2\sqrt{0.5}} = \frac{\sqrt{2} - 0.5}{\mu}.$

  10. Lastly, the property that the arithmetic mean is greater than or equal to the geometric mean gives us

      $k_p k_q \leq \frac{(k_p + k_q)^2}{4} < \frac{(\sqrt{2} - 0.5)^2}{4 \mu^2} < \frac{1}{\mu^2},$

      so the requirement $k_p k_q \leq 1/\mu^2$ used earlier is met as well. (Under (14.11) directly, $2 \mu^2 k_p k_q < 1 - \mu k_p < 1$ gives $k_p k_q < 1/(2\mu^2)$, which also suffices.)

14.2.3. General Case

We now consider the case where $D \in \mathbb{C}^{N \times D}$ is an arbitrary (redundant) dictionary. We will require that $D$ has full row rank, i.e., its atoms span $\mathbb{C}^N$; if they do not, we can restrict attention to the subspace spanned by the atoms.

We develop sufficient conditions under which the solutions of (14.4) and (14.7) match for the general case [18, 19].

Theorem 14.7

Let $D$ be an arbitrary full row rank redundant dictionary. Let $x = Da$, where $x$ is known. If a sparse representation $a$ exists obeying

(14.14) $\|a\|_0 < \frac{1}{2} \left( 1 + \frac{1}{\mu} \right),$

then $a$ is the unique solution of both (14.4) and (14.7).

Proof. Due to Theorem 12.26, $a$ is the unique solution of (14.4) since $a$ satisfies (14.14). We need to show that it is also the unique solution of (14.7).

  1. For any other feasible $b$ for (14.7), we have $\|b\|_0 > \|a\|_0$ since $a$ is the unique sparsest solution of $x = Da$.

  2. We start by defining a set of alternative feasible vectors for (14.7):

     $C = \{ b \mid b \neq a, \; \|b\|_1 \leq \|a\|_1, \; \|b\|_0 > \|a\|_0 \text{ and } D(b - a) = 0 \}.$

     This set contains all possible representations that

     1. are different from $a$;

     2. have larger support;

     3. satisfy $Db = x$;

     4. have a better (or at least as good) $\ell_1$-norm.

  3. We need to show that if (14.14) holds, then the set $C$ is empty.

  4. Otherwise, BP could choose a solution different from $a$.

  5. The condition $\|b\|_0 > \|a\|_0$ is redundant under the assumption (14.14).

  6. Following the proof of Theorem 14.6, we define

     $e = b - a.$

  7. We can then rewrite $C$ as

     $C_s = \{ e \mid e \neq 0, \; \|e + a\|_1 - \|a\|_1 \leq 0 \text{ and } De = 0 \}.$
  8. Again, we will enlarge the set Cs and show that even the larger set is empty when (14.14) holds.

  9. We start with the requirement β€–e+aβ€–1βˆ’β€–aβ€–1≀0.

  10. A simple permutation of columns of D can bring the nonzero entries in a to the beginning.

  11. Thus, without loss of generality, we assume that first K entries in a are nonzero and the rest are zero.

  12. We can now rewrite the requirement as

    β€–e+aβ€–1βˆ’β€–aβ€–1=βˆ‘j=1K(|ej+aj|βˆ’|aj|)+βˆ‘j>K|ej|≀0.
  13. Using the inequality |x+y|βˆ’|y|β‰₯βˆ’|x|, we can relax above condition as

    βˆ’βˆ‘j=1K|ej|+βˆ‘j>K|ej|≀0.
  14. Let 1K denote a vector with K ones at the beginning and rest zeros.

  15. Then,

    βˆ‘j=1K|ej|=1KT|e|.
  16. Further,

    βˆ‘j>K|ej|=β€–eβ€–1βˆ’βˆ‘j=1K|ej|=1T|e|βˆ’1KT|e|.
  17. Thus, we can rewrite above inequality as

    1T|e|βˆ’21KT|e|≀0.
  18. We can now define

    Cs1={e|eβ‰ 0,1T|e|βˆ’21KT|e|≀0, and De=0}.
  19. Clearly CsβŠ†Cs1.

  20. We will now relax the requirement of De=0.

  21. Multiplying by DH, we get

    DHDe=0.
  22. If e∈Cs1, it will also satisfy this equation.

  23. Moreover, if e satisfies this, then e belongs to the null space of DHD.

  24. Since D is full rank, hence e has to be in the null space of D also.

  25. Thus the two conditions De=0 and DHDe=0 are equivalent.

  26. We note that off-diagonal entries in DHD are bounded by ΞΌ while the main diagonal consists of all ones.

  27. So, we can write

    DHDe=0⟺(DHDβˆ’I+I)e=0βŸΊβˆ’e=(DHDβˆ’I)e.
  28. Suppose v=Gu. Then vi=βˆ‘jGijuj.

  29. Thus

    |vi|=|βˆ‘jGijuj|β‰€βˆ‘j|Gijuj|=βˆ‘j|Gij||uj|.
  30. This gives us |v|βͺ―|G||v| where βͺ― indicates component wise inequality.

  31. Taking an entry-wise absolute value on both sides, we get

    |e|=|(DHDβˆ’I)e|βͺ―|DHDβˆ’I||e|βͺ―ΞΌ(1βˆ’I)|e|.
  32. The last part is due to the fact that all entries in the vector |e| and the matrix |DHDβˆ’I| are non-negative and the entries in |DHDβˆ’I| are dominated by ΞΌ.

  33. Further,

    |e|βͺ―ΞΌ(1βˆ’I)|e|⟺(1+ΞΌ)|e|βͺ―ΞΌ1|e|=ΞΌβ€–eβ€–11⟺|e|βͺ―ΞΌβ€–eβ€–11+ΞΌ1.
  34. In the above we used the fact that 1|e|=11T|e|=1β€–eβ€–1.

  35. We can now define a new set

    Cs2={e|eβ‰ 0,1T|e|βˆ’21KT|e|≀0 and |e|βͺ―ΞΌβ€–eβ€–11+ΞΌ1}.
  36. Clearly, $C_{s1} \subseteq C_{s2}$.

  37. We note that $C_{s2}$ is unbounded, since if $e \in C_{s2}$, then $ce \in C_{s2}$ for all $c \neq 0$.

  38. Thus, in order to study its emptiness, it is sufficient to consider the vectors with unit $\ell_1$ norm, $\|e\|_1 = 1$.

  39. We construct the new set

      $C_r = \left\{ e \mid \|e\|_1 = 1, \; 1 - 2 \mathbf{1}_K^T |e| \leq 0 \text{ and } |e| \preceq \frac{\mu}{1 + \mu} \mathbf{1} \right\}.$

  40. Note that we substituted $\mathbf{1}^T |e| = \|e\|_1 = 1$ in formulating the description of $C_r$, and the condition $e \neq 0$ is automatically enforced since $\|e\|_1 = 1$.

  41. Clearly $C_{s2} = \emptyset \iff C_r = \emptyset$.

  42. In order to satisfy the requirement $1 - 2 \mathbf{1}_K^T |e| \leq 0$, we need $\mathbf{1}_K^T |e|$ to be as large as possible.

  43. Since this quantity only considers the first $K$ entries of $e$, the mass of $e$ should be concentrated in the first $K$ entries to maximize it.

  44. However, the entries of $e$ are restricted by the third requirement in $C_r$.

  45. We can maximize the quantity by choosing

      $|e_j| = \frac{\mu}{1 + \mu}$

      for the first $K$ entries of $e$.

  46. We then get

      $1 - 2 \mathbf{1}_K^T |e| = 1 - \frac{2 K \mu}{1 + \mu} \leq 0.$

  47. This gives us

      $1 - \frac{2 K \mu}{1 + \mu} \leq 0 \iff 1 + \mu \leq 2 K \mu \iff 2K \geq \frac{1 + \mu}{\mu} \iff K \geq \frac{1}{2} \left( 1 + \frac{1}{\mu} \right).$

  48. This is a necessary condition for $C_r$ to be non-empty.

  49. Thus, if

      $K < \frac{1}{2} \left( 1 + \frac{1}{\mu} \right),$

      then the requirement $1 - 2 \mathbf{1}_K^T |e| \leq 0$ cannot be satisfied and $C_r$ is empty.

  50. Consequently, $C$ is empty and the theorem is proved.
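A quick numeric illustration (the Gaussian dictionary here is an assumption for the demo) shows how restrictive the coherence threshold in (14.14) is in practice:

```python
# Compute the coherence of a random dictionary and the resulting
# guaranteed sparsity level K < (1 + 1/mu) / 2 from Theorem 14.7.
import numpy as np

rng = np.random.default_rng(1)
N, D = 128, 256
Dict = rng.standard_normal((N, D))
Dict /= np.linalg.norm(Dict, axis=0)

G = Dict.T @ Dict
mu = np.max(np.abs(G - np.eye(D)))
print(mu, 0.5 * (1 + 1 / mu))   # e.g. mu ~ 0.4 => guarantee only for K = 1
```

For unstructured random dictionaries of moderate size the coherence is fairly large, so the guarantee typically covers only very small $K$; the bound is a worst-case sufficient condition, not a sharp description of BP's practical behavior.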

We present another result based on the $\mu_{1/2}(G)$ measure (Definition 12.31) of the Gram matrix of the dictionary. This result is due to [18].

Theorem 14.8

Let $x = Da$ with $\|a\|_0 < \mu_{1/2}(G)$. Then $a$ is the unique solution of both (14.4) and (14.7).

Proof. Unique solution of (14.4).

  1. Let $\Lambda = \operatorname{supp}(a)$ and $K = |\Lambda|$.

  2. We have $K = \|a\|_0 < \mu_{1/2}(G)$.

  3. By Theorem 12.72,

     $\operatorname{spark}(D) \geq 2 \mu_{1/2}(G) + 1 \implies \operatorname{spark}(D) > 2K + 1 \implies K < \frac{1}{2} \operatorname{spark}(D).$

  4. By Theorem 12.22, $a$ is the unique sparsest solution.

Unique solution of (14.7)

  1. We need to show that for any $a'$ satisfying $x = Da'$, we must have $\|a'\|_1 > \|a\|_1$.

  2. We will show that any vector in the null space of $D$ exhibits less than 50% concentration on $\Lambda$; i.e., for every $h \in \mathcal{N}(D)$,

     $\sum_{k \in \Lambda} |h_k| < \frac{1}{2} \|h\|_1.$

  3. Now

     $Dh = 0 \implies Gh = D^H D h = 0.$

  4. Subtracting $h$ from both sides we get

     $Gh - h = (G - I) h = -h.$

  5. Let $F$ denote the $K \times D$ matrix formed from the rows of $G - I$ corresponding to the indices in $\Lambda$.

  6. Then

     $(G - I) h = -h \implies \|F h\|_1 = \sum_{k \in \Lambda} |h_k|,$

     since $-h_k$ for each $k \in \Lambda$ is the inner product of the corresponding row of $F$ with $h$.

  7. We know that

     $\|F h\|_1 \leq \|F\|_1 \|h\|_1,$

     where $\|F\|_1$ is the max-column-sum norm of $F$.

  8. This gives us

     $\|F\|_1 \|h\|_1 \geq \sum_{k \in \Lambda} |h_k|.$

  9. Any column of $F$ has $K$ entries.

  10. In a column indexed by $\Lambda$, one of them is 0 (corresponding to a diagonal entry of $G$); so each column contains at most $K$ off-diagonal entries of $G$.

  11. By assumption, $\mu_{1/2}(G) > K$.

  12. Thus, any set of at most $K$ off-diagonal entries in a column cannot have an absolute sum reaching $\frac{1}{2}$.

  13. This gives an upper bound on the max-column-sum norm of $F$; i.e.,

      $\|F\|_1 < \frac{1}{2}.$

  14. Thus we get

      $\sum_{k \in \Lambda} |h_k| \leq \|F\|_1 \|h\|_1 < \frac{1}{2} \|h\|_1$

      for every $h \in \mathcal{N}(D)$.

  15. The rest follows: for any other $a'$ with $x = Da' = Da$, set $h = a - a'$ (so that $Dh = 0$) and expand,

      $\|a'\|_1 = \|a - h\|_1 = \sum_{k \in \Lambda} |a_k - h_k| + \sum_{k \notin \Lambda} |h_k| \geq \|a\|_1 - \sum_{k \in \Lambda} |h_k| + \left( \|h\|_1 - \sum_{k \in \Lambda} |h_k| \right) > \|a\|_1,$

      where the last step uses $\sum_{k \in \Lambda} |h_k| < \frac{1}{2} \|h\|_1$.
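The null-space concentration property at the heart of this proof can be observed empirically; the sketch below (the sizes and the support size are assumptions for the demo) samples a random null-space vector and measures its $\ell_1$ mass on its own largest entries, which is a worst case over supports of that size:

```python
# Empirical look at null-space concentration: vectors h with Dh = 0 tend
# to spread their l1 mass, so small supports carry well under half of it.
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(5)
Dict = rng.standard_normal((32, 48))
Dict /= np.linalg.norm(Dict, axis=0)
Z = null_space(Dict)                     # orthonormal basis of N(D)
h = Z @ rng.standard_normal(Z.shape[1])  # a random null-space vector
Lam = np.argsort(-np.abs(h))[:3]         # worst-case support of size 3
print(np.abs(h[Lam]).sum() / np.abs(h).sum())  # typically well below 0.5
```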

14.2.4. BPIC

In this subsection, we present a stability guarantee for BPIC.

Theorem 14.9

Consider an instance of the problem (14.8) defined by the triplet $(D, x, \epsilon)$. Suppose that a vector $a \in \mathbb{C}^D$ is a feasible solution of (14.8) satisfying the sparsity constraint

$\|a\|_0 < \frac{1}{4} \left( 1 + \frac{1}{\mu(D)} \right).$

The solution $\hat{a}$ of (14.8) must satisfy

$\|\hat{a} - a\|_2^2 \leq \frac{4 \epsilon^2}{1 - \mu(D)(4 \|a\|_0 - 1)}.$

Proof. As before, we define $b = \hat{a} - a$.

  1. Then, by the triangle inequality (both $\hat{a}$ and $a$ are feasible for (14.8)),

     $\|D b\|_2 = \|D(\hat{a} - a)\|_2 = \|D \hat{a} - x + x - D a\|_2 \leq 2\epsilon.$

  2. We now rewrite the inequality in terms of the Gram matrix $G = D^H D$:

     $4 \epsilon^2 \geq \|D b\|_2^2 = b^H G b = b^H (G - I + I) b = \|b\|_2^2 + b^H (G - I) b.$

  3. It is easy to show that

     $-|b|^T |A| |b| \leq b^H A b \leq |b|^T |A| |b|$

     whenever $A$ is Hermitian.

     1. To see this, just notice that $b^H A b$ is a real quantity.

     2. Hence $b^H A b = \pm |b^H A b|$.

     3. Now, using the triangle inequality we can easily show that $|b^H A b| \leq |b|^T |A| |b|$.

  4. Since $G - I$ is Hermitian,

     $b^H (G - I) b \geq -|b|^T |G - I| |b|.$

  5. Now

     $|b|^T |G - I| |b| = \sum_{i,j} |b_i| |d_i^H d_j - \delta_{ij}| |b_j| \leq \mu(D) \sum_{i \neq j} |b_i| |b_j| = \mu(D) |b|^T (\mathbf{1} - I) |b|.$

  6. Only the off-diagonal terms of $G$ remain in the sum, and they are all dominated by $\mu(D)$.

  7. Thus we get

     $4 \epsilon^2 \geq \|b\|_2^2 - \mu(D) |b|^T (\mathbf{1} - I) |b| = (1 + \mu(D)) \|b\|_2^2 - \mu(D) |b|^T \mathbf{1} |b| = (1 + \mu(D)) \|b\|_2^2 - \mu(D) \|b\|_1^2.$

     This is valid since $|b|^T \mathbf{1} |b| = (\mathbf{1}^T |b|)^2 = \|b\|_1^2$.

  8. Since $\hat{a}$ is an optimal solution of (14.8) and $a$ is feasible,

     $\|\hat{a}\|_1 = \|b + a\|_1 \leq \|a\|_1 \implies \|b + a\|_1 - \|a\|_1 \leq 0.$

  9. Let $\Lambda = \operatorname{supp}(a)$ and $K = |\Lambda|$.

  10. By a simple permutation of the columns of $D$, we can bring the nonzero entries of $a$ to the first $K$ positions, making $\Lambda = \{1, \dots, K\}$.

  11. We will make this assumption going forward without loss of generality.

  12. Let $\mathbf{1}_K$ be the corresponding support vector (ones in the first $K$ places and 0 elsewhere).

  13. From our previous analysis, we recall that

      $\|b + a\|_1 - \|a\|_1 \geq \|b\|_1 - 2 \mathbf{1}_K^T |b|.$

  14. Thus,

      $\|b\|_1 - 2 \mathbf{1}_K^T |b| \leq 0 \implies \|b\|_1 \leq 2 \mathbf{1}_K^T |b|.$

  15. $\mathbf{1}_K^T |b|$ is the sum of the first $K$ terms of $|b|$.

  16. Considering $b_\Lambda$ as a vector in $\mathbb{C}^K$ and using the $\ell_1$-$\ell_2$ norm relation $\|v\|_1 \leq \sqrt{K} \|v\|_2 \; \forall v \in \mathbb{C}^K$, we get

      $\mathbf{1}_K^T |b| = \|b_\Lambda\|_1 \leq \sqrt{K} \|b_\Lambda\|_2 \leq \sqrt{K} \|b\|_2.$

  17. Thus,

      $\|b\|_1 \leq 2 \mathbf{1}_K^T |b| \leq 2 \sqrt{K} \|b\|_2.$

  18. Putting this back into the earlier inequality,

      $4 \epsilon^2 \geq (1 + \mu(D)) \|b\|_2^2 - \mu(D) \|b\|_1^2 \geq (1 + \mu(D)) \|b\|_2^2 - 4 K \mu(D) \|b\|_2^2 = (1 - (4K - 1) \mu(D)) \|b\|_2^2.$

  19. We note that this inequality is useful only if

      $1 - (4K - 1) \mu(D) > 0.$

  20. This condition can be reformulated as

      $\|a\|_0 = K < \frac{1}{4} \left( 1 + \frac{1}{\mu(D)} \right).$

  21. Rewriting the bound on $\|b\|_2^2$ we get

      $\|b\|_2^2 \leq \frac{4 \epsilon^2}{1 - (4K - 1) \mu(D)},$

      which is the desired result.
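A tiny worked example (the values of $\mu$, $K$ and $\epsilon$ are assumed for illustration) evaluates the stability bound of Theorem 14.9:

```python
# Evaluate the BPIC stability bound ||a_hat - a||_2^2 <= 4 eps^2 / (1 - mu(4K-1)).
mu, K, eps = 0.1, 2, 0.01
assert K < 0.25 * (1 + 1 / mu)          # sparsity condition: K < 2.75
bound = 4 * eps**2 / (1 - mu * (4 * K - 1))
print(bound ** 0.5)                     # ~ 0.0365: error at most ~3.7x eps
```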

14.2.5. BPDN

In this subsection we examine the $\ell_1$ penalty problem (14.10) more closely:

(14.15) $\hat{a} = \arg\min_{a \in \mathbb{C}^D} \frac{1}{2} \|x - Da\|_2^2 + \gamma \|a\|_1.$

We will focus on the following issues:

  • Some results from convex analysis useful for our study

  • Conditions for the minimization of (14.15) over coefficients $a$ supported on a subdictionary $D_\Lambda$

  • Conditions under which the unique minimizer for a subdictionary is also the global minimizer of (14.15)

  • Application of (14.15) for sparse signal recovery

  • Application of (14.15) for identification of sparse signals in the presence of noise

  • Application of (14.15) for identification of sparse signals in the presence of Gaussian noise

We recall some definitions and results from convex analysis which will help us understand the minimizers of problem (14.15).

Convex analysis for real-valued functions on the complex vector space $\mathbb{C}^n$ (viewed as a vector space over $\mathbb{R}$) is developed using the bilinear inner product defined as

$\langle x, y \rangle_B = \operatorname{Re}(y^H x).$

The subscript $B$ distinguishes it from the standard inner product for the complex coordinate space, $\langle x, y \rangle = y^H x$. The two inner products are related as

$\langle x, y \rangle_B = \operatorname{Re}(\langle x, y \rangle).$

We consider real-valued functions over the inner product space $X = (\mathbb{C}^D, \langle \cdot, \cdot \rangle_B)$. Note that the dimension of $X$ as a real vector space is $2D$.

A real-valued convex function $f : X \to \mathbb{R}$ satisfies the standard convexity inequality

$f(\theta x + (1 - \theta) y) \leq \theta f(x) + (1 - \theta) f(y) \quad \forall \, 0 \leq \theta \leq 1.$

The objective function of the problem (14.15) is

(14.16) $L(a) = \frac{1}{2} \|x - Da\|_2^2 + \gamma \|a\|_1.$

Clearly, $L$ is a real-valued function over $X$, and it is easy to see that it is a convex function. Moreover, $L(a) \geq 0$ always.

We suggest that readers review the material in Subgradients. For any function $f : X \to \mathbb{R}$, its subdifferential set is defined as

$\partial f(x) \triangleq \{ g \in X \mid f(y) \geq f(x) + \langle y - x, g \rangle_B \; \forall y \in X \}.$

The elements of the subdifferential set are called subgradients. If $f$ possesses a gradient at $x$, then it is the unique subgradient at $x$; i.e.,

$\partial f(x) = \{ \nabla f(x) \},$

where $\nabla f(x)$ is the gradient of $f$ at $x$.

For convex functions, the subdifferential of a sum is the (Minkowski) sum of the subdifferentials (Theorem 8.222); i.e.,

$\partial (f(x) + g(x)) = \partial f(x) + \partial g(x) = \{ h_1 + h_2 \mid h_1 \in \partial f(x), \; h_2 \in \partial g(x) \}.$

By Fermat's optimality condition (Theorem 8.233), if $f$ is a closed, proper convex function, then $x$ is a global minimizer of $f$ if and only if $0 \in \partial f(x)$.

We are specifically interested in the subdifferential of the function $\|a\|_1$.

Theorem 14.10

Let $z \in X$. The vector $g \in X$ lies in the subdifferential $\partial \|z\|_1$ if and only if

  • $|g_k| \leq 1$ whenever $z_k = 0$;

  • $g_k = \operatorname{sgn}(z_k)$ whenever $z_k \neq 0$.

The proof is skipped.

We recall that the signum function for complex numbers is defined as

$\operatorname{sgn}(r e^{i\theta}) = \begin{cases} e^{i\theta} & \text{if } r > 0; \\ 0 & \text{if } r = 0. \end{cases}$

The subdifferential of the $\ell_1$ norm function on $\mathbb{R}^n$ is developed in Example 8.67. We note that the $\ell_1$ norm is differentiable exactly at vectors all of whose entries are nonzero. Also, we can see that for $g \in \partial \|z\|_1$:

  1. $\|g\|_\infty = 1$ when $z \neq 0$.

  2. $\|g\|_\infty \leq 1$ when $z = 0$.
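The characterization in Theorem 14.10 is easy to test numerically; below is a small helper (a sketch, not from the text) implementing the complex signum and the membership test:

```python
# Test the subgradient characterization of Theorem 14.10 for the l1 norm.
import numpy as np

def csgn(z):
    """Complex signum: z/|z| for z != 0, and 0 at 0."""
    out = np.zeros_like(z, dtype=complex)
    nz = z != 0
    out[nz] = z[nz] / np.abs(z[nz])
    return out

def in_l1_subdifferential(g, z, tol=1e-12):
    """g in the subdifferential of ||.||_1 at z iff g matches csgn(z) on
    the support of z and has magnitude <= 1 off the support."""
    on = z != 0
    return (np.allclose(g[on], csgn(z)[on], atol=tol)
            and np.all(np.abs(g[~on]) <= 1 + tol))

z = np.array([1 + 1j, 0.0, -2.0])
g = np.array([(1 + 1j) / np.sqrt(2), 0.3 - 0.4j, -1.0])
print(in_l1_subdifferential(g, z))   # True: |0.3 - 0.4j| = 0.5 <= 1
```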

14.2.5.1. Restricted Minimizers

  1. Suppose $\Lambda$ indexes a subdictionary $D_\Lambda$.

  2. $D_\Lambda$ is a linearly independent collection of atoms.

  3. Hence a unique $\ell_2$-best approximation $\hat{x}_\Lambda$ of $x$ using the atoms in $D_\Lambda$ can be obtained by least squares.

  4. We define the orthogonal projection operator

     $P_\Lambda = D_\Lambda D_\Lambda^\dagger.$

  5. And we get

     $\hat{x}_\Lambda = P_\Lambda x.$

  6. The approximation is orthogonal to the residual; i.e., $(x - \hat{x}_\Lambda) \perp \hat{x}_\Lambda$.

  7. There is a unique coefficient vector $c_\Lambda$ supported on $\Lambda$ that synthesizes the approximation $\hat{x}_\Lambda$:

     $c_\Lambda = D_\Lambda^\dagger x = D_\Lambda^\dagger \hat{x}_\Lambda.$

  8. We also have

     $\hat{x}_\Lambda = P_\Lambda x = D_\Lambda c_\Lambda.$

Theorem 14.11 (Minimization of $L$ over vectors supported on $\Lambda$)

Let $\Lambda$ index a linearly independent collection of atoms in $D$ and let $a^*$ minimize the objective function $L(a)$ in (14.16) over all coefficient vectors supported on $\Lambda$ (i.e. $\operatorname{supp}(a) \subseteq \Lambda$). A necessary and sufficient condition on such a minimizer is that

(14.17) $c_\Lambda - a^* = \gamma (D_\Lambda^H D_\Lambda)^{-1} g,$

where $c_\Lambda = D_\Lambda^\dagger x$ and the vector $g$ is drawn from $\partial \|a^*\|_1$. Moreover, the minimizer $a^*$ is unique.

Proof. Since $a$ is restricted to be supported on $\Lambda$ (i.e. $a \in \mathbb{C}^\Lambda$), we have

$Da = D_\Lambda a_\Lambda.$

The objective function simplifies to

$L(a) = \frac{1}{2} \|x - D_\Lambda a_\Lambda\|_2^2 + \gamma \|a_\Lambda\|_1.$
  1. We define $\hat{x}_\Lambda = P_\Lambda x$.

  2. Now, both $\hat{x}_\Lambda$ and $D_\Lambda a_\Lambda$ belong to the column space of $D_\Lambda$, while $x - \hat{x}_\Lambda$ is orthogonal to it.

  3. Hence

     $x - \hat{x}_\Lambda \perp \hat{x}_\Lambda - Da.$

  4. Thus, using the Pythagorean theorem, we get

     $\|x - Da\|_2^2 = \|x - \hat{x}_\Lambda + \hat{x}_\Lambda - Da\|_2^2 = \|x - \hat{x}_\Lambda\|_2^2 + \|\hat{x}_\Lambda - Da\|_2^2.$

  5. We can rewrite $L(a)$ as

     $L(a) = \frac{1}{2} \|x - \hat{x}_\Lambda\|_2^2 + \frac{1}{2} \|\hat{x}_\Lambda - Da\|_2^2 + \gamma \|a\|_1.$

  6. Define

     $F(a) = \frac{1}{2} \|\hat{x}_\Lambda - Da\|_2^2 + \gamma \|a\|_1.$

  7. Then

     $L(a) = \frac{1}{2} \|x - \hat{x}_\Lambda\|_2^2 + F(a).$

  8. Note that the term $\|x - \hat{x}_\Lambda\|_2^2$ is constant: it is the squared norm of the least squares error.

  9. Thus, minimizing $L(a)$ over the coefficient vectors supported on $\Lambda$ is equivalent to minimizing $F(a)$ over the same support set.

  10. Note that

      $Da = D_\Lambda a_\Lambda \quad \text{and} \quad \|a\|_1 = \|a_\Lambda\|_1.$

  11. We can write $F(a)$ as

      $F(a) = \frac{1}{2} \|\hat{x}_\Lambda - D_\Lambda a_\Lambda\|_2^2 + \gamma \|a_\Lambda\|_1.$

  12. Note that $F(a)$ depends only on the entries of $a$ that are part of the support $\Lambda$.

  13. We can replace the variable $a_\Lambda$ with $a \in \mathbb{C}^\Lambda$ and rewrite $F(a)$ as

      $F(a) = \frac{1}{2} \|\hat{x}_\Lambda - D_\Lambda a\|_2^2 + \gamma \|a\|_1 \quad \forall a \in \mathbb{C}^\Lambda.$

  14. Since the atoms indexed by $\Lambda$ are linearly independent, $D_\Lambda$ has full column rank.

  15. Thus, the quadratic term $\|\hat{x}_\Lambda - D_\Lambda a\|_2^2$ is strictly convex.

  16. Since $\|a\|_1$ is also convex, $F(a)$ is strictly convex and its minimizer is unique.

  17. Since $F$ is strictly convex and unconstrained, $0 \in \partial F(a^*)$ is a necessary and sufficient condition for the coefficient vector $a^*$ to minimize $F(a)$.

  18. The gradient of the first (quadratic) term is

      $(D_\Lambda^H D_\Lambda) a - D_\Lambda^H \hat{x}_\Lambda.$

  19. For the second term we have to consider its subdifferential $\partial \|a\|_1$.

  20. Thus, at $a^*$ it follows that

      $(D_\Lambda^H D_\Lambda) a^* - D_\Lambda^H \hat{x}_\Lambda + \gamma g = 0,$

      where $g$ is some subgradient in $\partial \|a^*\|_1$.

  21. Premultiplying with $(D_\Lambda^H D_\Lambda)^{-1}$ and using $(D_\Lambda^H D_\Lambda)^{-1} D_\Lambda^H = D_\Lambda^\dagger$, we get

      $a^* - D_\Lambda^\dagger \hat{x}_\Lambda + \gamma (D_\Lambda^H D_\Lambda)^{-1} g = 0 \implies D_\Lambda^\dagger \hat{x}_\Lambda - a^* = \gamma (D_\Lambda^H D_\Lambda)^{-1} g.$

  22. Finally, we recall that $D_\Lambda^\dagger \hat{x}_\Lambda = c_\Lambda$.

  23. Thus, we get the desired result

      $c_\Lambda - a^* = \gamma (D_\Lambda^H D_\Lambda)^{-1} g.$
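Condition (14.17) can be verified numerically; the sketch below (assuming CVXPY, with a random subdictionary; none of the values are from the text) solves the restricted problem and recovers the subgradient $g$ from (14.17):

```python
# Sanity check of (14.17): solve the restricted problem, recover g,
# and compare it with sgn(a*) on the support (per Theorem 14.10).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
N, k, gamma = 32, 4, 0.1
D_sub = rng.standard_normal((N, k))          # D_Lambda: independent atoms
x = D_sub @ rng.standard_normal(k) + 0.01 * rng.standard_normal(N)

a = cp.Variable(k)
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(x - D_sub @ a)
                       + gamma * cp.norm1(a))).solve()

c = np.linalg.pinv(D_sub) @ x                # c_Lambda
G = D_sub.T @ D_sub
g = G @ (c - a.value) / gamma                # g from (14.17)
print(np.round(g, 4), np.sign(a.value))      # g ~ sgn(a*) on the support
```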

Some bounds follow as a result of this theorem. Readers are encouraged to review the material in Matrix Norms.

Theorem 14.12

Suppose that $\Lambda$ indexes a subdictionary $D_\Lambda$ and let $a^*$ minimize the function $L$ over all coefficient vectors supported on $\Lambda$. Then the following bounds are in force:

$\|c_\Lambda - a^*\|_\infty \leq \gamma \|(D_\Lambda^H D_\Lambda)^{-1}\|_\infty,$
$\|D_\Lambda (c_\Lambda - a^*)\|_2 \leq \gamma \|D_\Lambda^\dagger\|_{2 \to 1}.$

Proof. We start with

(14.18) $c_\Lambda - a^* = \gamma (D_\Lambda^H D_\Lambda)^{-1} g.$

  1. We take the $\ell_\infty$ norm on both sides and apply standard norm bounds:

     $\|c_\Lambda - a^*\|_\infty = \|\gamma (D_\Lambda^H D_\Lambda)^{-1} g\|_\infty \leq \gamma \|(D_\Lambda^H D_\Lambda)^{-1}\|_\infty \|g\|_\infty \leq \gamma \|(D_\Lambda^H D_\Lambda)^{-1}\|_\infty.$

     The last inequality is valid since, from Theorem 14.10, $\|g\|_\infty \leq 1$.

  2. Now let us premultiply (14.18) with $D_\Lambda$ and apply the $\ell_2$ norm:

     $\|D_\Lambda (c_\Lambda - a^*)\|_2 = \|\gamma D_\Lambda (D_\Lambda^H D_\Lambda)^{-1} g\|_2 = \gamma \|(D_\Lambda^\dagger)^H g\|_2 \leq \gamma \|(D_\Lambda^\dagger)^H\|_{\infty \to 2} \|g\|_\infty = \gamma \|D_\Lambda^\dagger\|_{2 \to 1} \|g\|_\infty \leq \gamma \|D_\Lambda^\dagger\|_{2 \to 1}.$

  3. In this derivation we used facts like $\|A\|_{p \to q} = \|A^H\|_{q' \to p'}$, $\|Ax\|_q \leq \|A\|_{p \to q} \|x\|_p$ and $\|g\|_\infty \leq 1$.

14.2.5.2. The Correlation Condition

So far we have established a condition which ensures that $a^*$ is the unique minimizer of $L$ over vectors supported on $\Lambda$. We now establish a sufficient condition under which $a^*$ is also the global minimizer of $L(a)$.

Theorem 14.13 (Correlation condition for global minimizer)

Assume that $\Lambda$ indexes a subdictionary. Let $a^*$ minimize the function $L$ over all coefficient vectors supported on $\Lambda$. Suppose that

(14.19) $\|D^H (x - \hat{x}_\Lambda)\|_\infty \leq \gamma \left[ 1 - \max_{\omega \notin \Lambda} |\langle D_\Lambda^\dagger d_\omega, g \rangle| \right],$

where $g \in \partial \|a^*\|_1$ is determined by (14.17). Then $a^*$ is the unique global minimizer of $L$. Moreover, the condition

(14.20) $\|D^H (x - \hat{x}_\Lambda)\|_\infty \leq \gamma \left[ 1 - \max_{\omega \notin \Lambda} \|D_\Lambda^\dagger d_\omega\|_1 \right]$

guarantees that $a^*$ is the unique global minimizer of $L$.

Proof. Let $a^*$ be the unique minimizer of $L$ over coefficient vectors supported on $\Lambda$. Then, the value of the objective function $L(a)$ increases if we change any coordinate of $a^*$ indexed in $\Lambda$.

What we need is a condition which ensures that the value of the objective function also increases if we change any other component of $a^*$ (not indexed by $\Lambda$).

  1. If this happens, then $a^*$ will be a local minimizer of $L$.

  2. Further, since $L$ is convex, $a^*$ will also be a global minimizer.

Towards this, let $\omega$ be some index not in $\Lambda$ and let $e_\omega \in \mathbb{C}^D$ be the corresponding unit vector. Let $\delta e_\omega$ be a small perturbation introduced in the $\omega$-th coordinate ($\delta \in \mathbb{C}$ is a small scalar, though it need not be positive real). We need to find a condition which ensures

$L(a^* + \delta e_\omega) - L(a^*) > 0 \quad \forall \omega \notin \Lambda.$

  1. Let us expand the L.H.S. of this inequality:

     $L(a^* + \delta e_\omega) - L(a^*) = \left[ \frac{1}{2} \|x - D a^* - \delta d_\omega\|_2^2 - \frac{1}{2} \|x - D a^*\|_2^2 \right] + \gamma \left[ \|a^* + \delta e_\omega\|_1 - \|a^*\|_1 \right].$

     Here we used the fact that $D e_\omega = d_\omega$.

  2. Note that since $a^*$ is supported on $\Lambda$ and $\omega \notin \Lambda$,

     $\|a^* + \delta e_\omega\|_1 = \|a^*\|_1 + \|\delta e_\omega\|_1.$

  3. Thus,

     $\|a^* + \delta e_\omega\|_1 - \|a^*\|_1 = |\delta|.$

  4. We should also simplify the first bracket:

     $\|x - D a^*\|_2^2 = (x - D a^*)^H (x - D a^*) = x^H x + a^{*H} D^H D a^* - x^H D a^* - a^{*H} D^H x.$

  5. Similarly,

     $\|x - D a^* - \delta d_\omega\|_2^2 = (x - D a^* - \delta d_\omega)^H (x - D a^* - \delta d_\omega) = x^H x + a^{*H} D^H D a^* - x^H D a^* - a^{*H} D^H x - (x - D a^*)^H \delta d_\omega - \overline{\delta} d_\omega^H (x - D a^*) + \|\delta d_\omega\|_2^2.$

  6. Canceling the like terms, the difference of the two squared norms is

     $\|\delta d_\omega\|_2^2 - 2 \operatorname{Re}(\langle x - D a^*, \delta d_\omega \rangle).$

  7. Thus,

     $L(a^* + \delta e_\omega) - L(a^*) = \frac{1}{2} \|\delta d_\omega\|_2^2 - \operatorname{Re}(\langle x - D a^*, \delta d_\omega \rangle) + \gamma |\delta|.$

  8. Recall that since $a^*$ is supported on $\Lambda$, $D a^* = D_\Lambda a^*$.

  9. We can further split the middle term by adding and subtracting $D_\Lambda c_\Lambda$:

     $\operatorname{Re}(\langle x - D_\Lambda a^*, \delta d_\omega \rangle) = \operatorname{Re}(\langle x - D_\Lambda c_\Lambda + D_\Lambda c_\Lambda - D_\Lambda a^*, \delta d_\omega \rangle) = \operatorname{Re}(\langle x - D_\Lambda c_\Lambda, \delta d_\omega \rangle) + \operatorname{Re}(\langle D_\Lambda (c_\Lambda - a^*), \delta d_\omega \rangle).$

  10. Thus, we can write

      $L(a^* + \delta e_\omega) - L(a^*) = \frac{1}{2} \|\delta d_\omega\|_2^2 - \operatorname{Re}(\langle x - D_\Lambda c_\Lambda, \delta d_\omega \rangle) - \operatorname{Re}(\langle D_\Lambda (c_\Lambda - a^*), \delta d_\omega \rangle) + \gamma |\delta|.$

  11. The term $\frac{1}{2} \|\delta d_\omega\|_2^2$ is strictly positive, giving us

      $L(a^* + \delta e_\omega) - L(a^*) > -\operatorname{Re}(\langle x - D_\Lambda c_\Lambda, \delta d_\omega \rangle) - \operatorname{Re}(\langle D_\Lambda (c_\Lambda - a^*), \delta d_\omega \rangle) + \gamma |\delta|.$

  12. Using the lower triangle inequality ($-\operatorname{Re}(z) \geq -|z|$), we can write

      $L(a^* + \delta e_\omega) - L(a^*) > \gamma |\delta| - |\langle x - D_\Lambda c_\Lambda, \delta d_\omega \rangle| - |\langle D_\Lambda (c_\Lambda - a^*), \delta d_\omega \rangle|.$
  13. Using the linearity of the inner product, we can take out $|\delta|$:

      (14.21) $L(a^* + \delta e_\omega) - L(a^*) > |\delta| \left[ \gamma - |\langle x - D_\Lambda c_\Lambda, d_\omega \rangle| - |\langle D_\Lambda (c_\Lambda - a^*), d_\omega \rangle| \right].$

  14. Let us simplify this expression. Since $a^*$ is the unique minimizer over coefficients in $\mathbb{C}^\Lambda$, using Theorem 14.11,

      $c_\Lambda - a^* = \gamma (D_\Lambda^H D_\Lambda)^{-1} g \iff D_\Lambda (c_\Lambda - a^*) = \gamma D_\Lambda (D_\Lambda^H D_\Lambda)^{-1} g = \gamma (D_\Lambda^\dagger)^H g,$

      where $g \in \partial \|a^*\|_1$.

  15. Thus,

      $|\langle D_\Lambda (c_\Lambda - a^*), d_\omega \rangle| = \gamma |\langle (D_\Lambda^\dagger)^H g, d_\omega \rangle| = \gamma |\langle D_\Lambda^\dagger d_\omega, g \rangle|,$

      using the fact that $\langle A x, y \rangle = \langle x, A^H y \rangle$.

  16. Also, we recall that $\hat{x}_\Lambda = D_\Lambda c_\Lambda$.

  17. Putting this back in (14.21) we obtain:

      (14.22) $L(a^* + \delta e_\omega) - L(a^*) > |\delta| \left[ \gamma - \gamma |\langle D_\Lambda^\dagger d_\omega, g \rangle| - |\langle x - \hat{x}_\Lambda, d_\omega \rangle| \right].$

  18. In (14.22), the L.H.S. is strictly positive (our real goal) whenever the bracketed term on the R.H.S. is non-negative, since $|\delta|$ is positive and the inequality is strict.

  19. Therefore we want

      $\gamma - \gamma |\langle D_\Lambda^\dagger d_\omega, g \rangle| - |\langle x - \hat{x}_\Lambda, d_\omega \rangle| \geq 0.$

  20. This can be rewritten as

      $|\langle x - \hat{x}_\Lambda, d_\omega \rangle| \leq \gamma \left[ 1 - |\langle D_\Lambda^\dagger d_\omega, g \rangle| \right].$

  21. Since this condition should hold for every $\omega \notin \Lambda$, we maximize the L.H.S. and minimize the R.H.S. over $\omega \notin \Lambda$.

  22. We get

      $\max_{\omega \notin \Lambda} |\langle x - \hat{x}_\Lambda, d_\omega \rangle| \leq \min_{\omega \notin \Lambda} \gamma \left[ 1 - |\langle D_\Lambda^\dagger d_\omega, g \rangle| \right] = \gamma \left[ 1 - \max_{\omega \notin \Lambda} |\langle D_\Lambda^\dagger d_\omega, g \rangle| \right].$

  23. Recall that $x - \hat{x}_\Lambda$ is orthogonal to the space spanned by the atoms in $D_\Lambda$, so the inner products for $\omega \in \Lambda$ vanish.

  24. Hence

      $\max_{\omega \notin \Lambda} |\langle x - \hat{x}_\Lambda, d_\omega \rangle| = \max_{\omega} |\langle x - \hat{x}_\Lambda, d_\omega \rangle| = \|D^H (x - \hat{x}_\Lambda)\|_\infty.$

  25. This gives us the desired sufficient condition

      $\|D^H (x - \hat{x}_\Lambda)\|_\infty \leq \gamma \left[ 1 - \max_{\omega \notin \Lambda} |\langle D_\Lambda^\dagger d_\omega, g \rangle| \right].$

  26. This condition still involves $g$. We know that $\|g\|_\infty \leq 1$.

  27. By Hölder's inequality,

      $|\langle D_\Lambda^\dagger d_\omega, g \rangle| = |g^H (D_\Lambda^\dagger d_\omega)| \leq \|D_\Lambda^\dagger d_\omega\|_1 \|g\|_\infty \leq \|D_\Lambda^\dagger d_\omega\|_1.$

  28. Another way to see this: for any vector $v \in \mathbb{C}^D$,

      $|\langle v, g \rangle| = \left| \sum_{i=1}^D \overline{g_i} v_i \right| \leq \sum_{i=1}^D |g_i| |v_i| \leq \left[ \sum_{i=1}^D |v_i| \right] \|g\|_\infty \leq \|v\|_1.$

  29. Thus

      $|\langle D_\Lambda^\dagger d_\omega, g \rangle| \leq \|D_\Lambda^\dagger d_\omega\|_1.$

  30. Hence it is also sufficient that

      $\|D^H (x - \hat{x}_\Lambda)\|_\infty \leq \gamma \left[ 1 - \max_{\omega \notin \Lambda} \|D_\Lambda^\dagger d_\omega\|_1 \right].$

14.2.5.3. Exact Recovery Coefficient

We recall that the Exact Recovery Coefficient of a subdictionary is defined as

$\operatorname{ERC}(\Lambda) = 1 - \max_{\omega \notin \Lambda} \|D_\Lambda^\dagger d_\omega\|_1.$

Thus, the sufficient condition can be rewritten as

$\|D^H (x - \hat{x}_\Lambda)\|_\infty \leq \gamma \operatorname{ERC}(\Lambda).$
  1. Note that the L.H.S. in both sufficient conditions is always non-negative.

  2. Hence, if the R.H.S. is negative (i.e. $\operatorname{ERC}(\Lambda) < 0$), the sufficient condition can never be met and is useless.

  3. On the other hand, if $\operatorname{ERC}(\Lambda) > 0$, then a sufficiently large $\gamma$ can always be chosen to satisfy the condition in (14.20). (A direct computation of $\operatorname{ERC}(\Lambda)$ is sketched after this list.)

  4. At the same time, as $\gamma \to \infty$, the optimal minimizer tends to $a^* = 0$.
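Computing $\operatorname{ERC}(\Lambda)$ directly from its definition is straightforward; here is a sketch (the random dictionary and the support are assumptions for the demo):

```python
# Direct computation of ERC(Lambda) = 1 - max_w ||pinv(D_Lambda) d_w||_1
# over atoms w outside Lambda.
import numpy as np

def erc(Dict, Lam):
    pinv = np.linalg.pinv(Dict[:, Lam])
    rest = [w for w in range(Dict.shape[1]) if w not in set(Lam)]
    return 1.0 - max(np.linalg.norm(pinv @ Dict[:, w], 1) for w in rest)

rng = np.random.default_rng(3)
Dict = rng.standard_normal((64, 128))
Dict /= np.linalg.norm(Dict, axis=0)
print(erc(Dict, [0, 1, 2]))   # positive ERC => the guarantee is usable
```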

How do we interpret the L.H.S. $\|D^H (x - \hat{x}_\Lambda)\|_\infty$?

Definition 14.2

Given a non-zero signal $v$ and a dictionary $D$, define the function

$\operatorname{maxcor}(v) \triangleq \max_{\omega \in \Omega} \frac{|\langle v, d_\omega \rangle|}{\|v\|_2}.$

If $v = 0$, then define $\operatorname{maxcor}(v) = 0$.

This is known as the maximum correlation [37] of a signal with a dictionary.

Essentially, for any signal we normalize it and then find its maximum inner product (in absolute value) with the atoms of the dictionary $D$. Clearly $0 \leq \operatorname{maxcor}(v) \leq 1$.

We can see that

$\|D^H v\|_\infty = \operatorname{maxcor}(v) \|v\|_2.$

We can now interpret

$\|D^H (x - \hat{x}_\Lambda)\|_\infty = \operatorname{maxcor}(x - \hat{x}_\Lambda) \|x - \hat{x}_\Lambda\|_2.$

Therefore, the sufficient condition in Theorem 14.13 is easiest to meet when both the norm of the residual $(x - \hat{x}_\Lambda)$ and its maximum correlation with the dictionary are small.
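A direct implementation of Definition 14.2 (a sketch, assuming unit-norm atoms) is equally short:

```python
# maxcor(v): maximum absolute correlation of a signal with the atoms of D.
import numpy as np

def maxcor(v, Dict):
    """Maximum |<v, d_w>| / ||v||_2 over all atoms; 0 for the zero signal."""
    nv = np.linalg.norm(v)
    if nv == 0:
        return 0.0
    return np.max(np.abs(Dict.conj().T @ v)) / nv
```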

Since the maximum correlation of the residual never exceeds one, we obtain the following (much weaker) result.

Corollary 14.1

Let $\Lambda$ index a subdictionary and let $x$ be an input signal. Suppose that the residual vector $x - \hat{x}_\Lambda$ satisfies

$\|x - \hat{x}_\Lambda\|_2 \leq \gamma \operatorname{ERC}(\Lambda).$

Then any coefficient vector $a^*$ that minimizes the function $L$ must be supported inside $\Lambda$.

14.2.5.4. Applications of $\ell_1$ Penalization

Having set up the basic results, we can now study applications of (14.15).

Theorem 14.14 (BPDN reconstruction guarantees using ERC)

Let $\Lambda$ index a subdictionary $D_\Lambda$ for which $\operatorname{ERC}(\Lambda) \geq 0$. Suppose that $x$ is an input signal whose $\ell_2$-best approximation over $\Lambda$ satisfies the correlation condition

$\|D^H (x - \hat{x}_\Lambda)\|_\infty \leq \gamma \operatorname{ERC}(\Lambda).$

Let $a^*$ solve the convex program (14.15) with parameter $\gamma$. We may conclude that:

  1. The support of $a^*$ is contained in $\Lambda$.

  2. The error between $a^*$ and the optimal coefficient vector $c_\Lambda$ satisfies

     $\|a^* - c_\Lambda\|_\infty \leq \gamma \|(D_\Lambda^H D_\Lambda)^{-1}\|_\infty.$

  3. In particular, $\operatorname{supp}(a^*)$ contains every index $\lambda$ in $\Lambda$ for which

     $|c_\Lambda(\lambda)| > \gamma \|(D_\Lambda^H D_\Lambda)^{-1}\|_\infty.$

  4. Moreover, the minimizer $a^*$ is unique.

Proof.

  1. Since the correlation condition of Theorem 14.13 is satisfied, the vector $a^*$ which minimizes $L$ over coefficient vectors in $\mathbb{C}^\Lambda$ is also a global minimizer of $L$.

  2. Since $a^* \in \mathbb{C}^\Lambda$, $\operatorname{supp}(a^*) \subseteq \Lambda$.

  3. For claim 2, an application of Theorem 14.12 gives us

     $\|c_\Lambda - a^*\|_\infty \leq \gamma \|(D_\Lambda^H D_\Lambda)^{-1}\|_\infty.$

  4. Since every minimizer of $L$ is supported on $\Lambda$, and $L$ restricted to vectors supported on $\Lambda$ is strictly convex, $a^*$ is the unique global minimizer.

  5. For claim 3, suppose $a^*(\lambda) = 0$ for some index $\lambda \in \Lambda$ for which

     $|c_\Lambda(\lambda)| > \gamma \|(D_\Lambda^H D_\Lambda)^{-1}\|_\infty.$

  6. Then

     $|a^*(\lambda) - c_\Lambda(\lambda)| = |c_\Lambda(\lambda)| > \gamma \|(D_\Lambda^H D_\Lambda)^{-1}\|_\infty.$

  7. But

     $\|a^* - c_\Lambda\|_\infty \geq |a^*(\lambda) - c_\Lambda(\lambda)|.$

  8. This violates the bound

     $\|a^* - c_\Lambda\|_\infty \leq \gamma \|(D_\Lambda^H D_\Lambda)^{-1}\|_\infty.$

  9. Thus, $\operatorname{supp}(a^*)$ contains every index $\lambda \in \Lambda$ for which

     $|c_\Lambda(\lambda)| > \gamma \|(D_\Lambda^H D_\Lambda)^{-1}\|_\infty.$

We can formulate a simpler condition in terms of the coherence of the dictionary.

Theorem 14.15 (BPDN reconstruction guarantees using coherence)

Suppose that $K \mu \leq \frac{1}{2}$ and that $|\Lambda| \leq K$, i.e., $\Lambda$ contains at most $K$ indices. Suppose that $x$ is an input signal whose $\ell_2$-best approximation over $\Lambda$, denoted by $\hat{x}_\Lambda$, satisfies the correlation condition

$\|D^H (x - \hat{x}_\Lambda)\|_\infty \leq \gamma \, \frac{1 - (2K - 1)\mu}{1 - (K - 1)\mu}.$

Let $a^*$ solve the convex program (14.15) with parameter $\gamma$. We may conclude that:

  1. The support of $a^*$ is contained in $\Lambda$.

  2. The distance between $a^*$ and the optimal coefficient vector $c_\Lambda$ satisfies

     $\|a^* - c_\Lambda\|_\infty \leq \gamma \, \frac{1}{1 - (K - 1)\mu}.$

  3. In particular, $\operatorname{supp}(a^*)$ contains every index $\lambda$ in $\Lambda$ for which

     $|c_\Lambda(\lambda)| > \gamma \, \frac{1}{1 - (K - 1)\mu}.$

  4. Moreover, the minimizer $a^*$ is unique.

Proof.

  1. We recall from Theorem 12.74 the coherence-based lower bound on the ERC:

     $\operatorname{ERC}(\Lambda) \geq \frac{1 - (2K - 1)\mu}{1 - (K - 1)\mu}.$

  2. Thus,

     $\|D^H (x - \hat{x}_\Lambda)\|_\infty \leq \gamma \, \frac{1 - (2K - 1)\mu}{1 - (K - 1)\mu} \leq \gamma \operatorname{ERC}(\Lambda).$

  3. A direct application of Theorem 14.14 validates claims 1 and 4.

  4. We recall from Theorem 12.36 the upper bound on the norm of the inverse Gram matrix of a subdictionary:

     $\|G^{-1}\|_\infty = \|G^{-1}\|_1 \leq \frac{1}{1 - \mu_1(K - 1)} \leq \frac{1}{1 - (K - 1)\mu}.$

  5. Putting this into Theorem 14.14 validates claims 2 and 3.
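The following sketch (random dictionary and $K = 2$, both assumptions for the demo) compares the exact ERC with its coherence-based lower bound from Theorem 12.74; the exact value is typically much larger, which is why the coherence condition is the weaker of the two guarantees:

```python
# Compare ERC(Lambda) with the bound (1 - (2K-1) mu) / (1 - (K-1) mu).
import numpy as np

rng = np.random.default_rng(4)
Dict = rng.standard_normal((256, 320))
Dict /= np.linalg.norm(Dict, axis=0)
mu = np.max(np.abs(Dict.T @ Dict - np.eye(320)))

K = 2
Lam = [0, 1]
pinv = np.linalg.pinv(Dict[:, Lam])
erc = 1 - max(np.linalg.norm(pinv @ Dict[:, w], 1)
              for w in range(320) if w not in Lam)
print(erc, (1 - (2 * K - 1) * mu) / (1 - (K - 1) * mu))
```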

14.2.5.5. Exact Sparse Reconstruction Problem

We now show how one can reconstruct an exactly sparse signal by solving the convex program (14.15).

Theorem 14.16 (BPDN exact sparse recovery guarantee)

Assume that $\Lambda$ indexes a subdictionary for which $\operatorname{ERC}(\Lambda) \geq 0$. Choose an arbitrary coefficient vector $c_{\mathrm{opt}}$ supported on $\Lambda$. Fix an input signal $x = D c_{\mathrm{opt}}$. Let $a^*(\gamma)$ denote the unique minimizer of (14.15) with parameter $\gamma$. We may conclude that:

  1. There is a positive number $\gamma_0$ such that $\gamma < \gamma_0$ implies $\operatorname{supp}(a^*(\gamma)) = \Lambda$.

  2. In the limit as $\gamma \to 0$, we have $a^*(\gamma) \to c_{\mathrm{opt}}$.

Proof.

  1. Since there is no noise, the best $\ell_2$ approximation of $x$ over $\Lambda$ is

     $\hat{x}_\Lambda = x$

     itself, and the corresponding coefficient vector is

     $c_\Lambda = c_{\mathrm{opt}}.$

  2. Therefore

     $\|D^H (x - \hat{x}_\Lambda)\|_\infty = 0 \leq \gamma \operatorname{ERC}(\Lambda).$

  3. Thus, the correlation condition is in force for every positive value of $\gamma$.

  4. Thus, as per Theorem 14.14, the minimizer $a^*(\gamma)$ of the convex program (14.15) must be supported inside $\Lambda$.

  5. Moreover, we have

     $\|a^*(\gamma) - c_{\mathrm{opt}}\|_\infty \leq \gamma \|(D_\Lambda^H D_\Lambda)^{-1}\|_\infty.$

  6. Clearly, as $\gamma \to 0$, we have $a^*(\gamma) \to c_{\mathrm{opt}}$.

  7. Finally, recall that $\operatorname{supp}(a^*(\gamma))$ contains every index $\lambda$ in $\Lambda$ for which

     $|c_{\mathrm{opt}}(\lambda)| > \gamma \|(D_\Lambda^H D_\Lambda)^{-1}\|_\infty.$

  8. In order for every index in $\Lambda$ to be part of $\operatorname{supp}(a^*(\gamma))$, we require

     $\min_{\lambda \in \Lambda} \frac{|c_{\mathrm{opt}}(\lambda)|}{\|(D_\Lambda^H D_\Lambda)^{-1}\|_\infty} > \gamma.$

  9. Choosing the L.H.S. to be $\gamma_0$, we get an explicit upper bound on $\gamma$ such that $\gamma < \gamma_0$ leads to complete recovery of the support.
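The behavior described in Theorem 14.16 can be observed numerically; the sketch below (assuming CVXPY; the sizes and the $\gamma$ grid are illustrative, not prescribed by the text) solves (14.15) for a decreasing sequence of $\gamma$ and reports the recovered support and the $\ell_\infty$ error:

```python
# As gamma shrinks, the BPDN minimizer's support settles on Lambda and
# the coefficients approach c_opt, consistent with Theorem 14.16.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(7)
N, D, K = 64, 96, 4
Dict = rng.standard_normal((N, D))
Dict /= np.linalg.norm(Dict, axis=0)
c_opt = np.zeros(D)
c_opt[:K] = 1.0 + rng.random(K)        # supported on Lambda = {0,...,K-1}
x = Dict @ c_opt

a = cp.Variable(D)
for gamma in [1.0, 0.1, 0.01, 0.001]:
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(x - Dict @ a)
                           + gamma * cp.norm1(a))).solve()
    support = np.flatnonzero(np.abs(a.value) > 1e-6)
    print(gamma, support, np.max(np.abs(a.value - c_opt)))
# The error bound gamma * ||(D_L^H D_L)^{-1}||_inf shrinks linearly in gamma.
```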
  9. Choosing the L.H.S. to be Ξ³0, we get an explicit value of upper bound on Ξ³ such that Ξ³<Ξ³0 leads to complete discovery of support.