14.3. Orthogonal Matching Pursuit

Orthogonal Matching Pursuit is a greedy algorithm for computing sparse approximations of a signal in a suitable dictionary. The goal of this algorithm is to select a small number of atoms from the dictionary which can provide a representation for the signal with low error. The algorithm is iterative in structure. In each step we greedily select one new atom from the dictionary which can maximally reduce the representation error. Once we have enough atoms selected such that the representation error is below an acceptable bound, we stop. The algorithm is presented in Algorithm 14.1.

Algorithm 14.1 (Orthogonal matching pursuit for sparse approximation)

Inputs:

  • Signal dictionary D∈CN×D with spark(D)>2K≪D

  • Threshold ϵ0

  • Signal x∈CN

Outputs:

  • K-sparse approximate representation a∈ΣK⊆CD satisfying ‖x−Da‖2≤ϵ0

  • S, the support of the sparse solution identified by the algorithm

Initialization:

  1. k←0 # Iteration counter

  2. a0←0 # Solution vector a∈CD

  3. r0←x−Da0=x # Residual r∈CN

  4. S0←∅ # Solution support S=supp(a)

Algorithm:

  1. If ‖rk‖2≤ϵ0, break.

  2. k←k+1 # Next iteration

  3. z←DHrk−1 # Sweep

  4. Update support

    1. Find j0 that maximizes |zj| over all j∉Sk−1

    2. Sk←Sk−1∪{j0}

  5. Update provisional solution

    ak←arg mina ‖Da−x‖22 subject to supp(a)=Sk.
  6. rk←x−Dak=x−DSkaSkk # Update residual

  7. Go to step 1.
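The listing below is a minimal NumPy sketch of Algorithm 14.1. The function name omp, its signature, and the dense least squares solve at every iteration are our own illustrative choices; practical implementations usually update the least squares solution incrementally (e.g., via QR or Cholesky updates) instead of re-solving from scratch.

```python
import numpy as np

def omp(D, x, eps0, max_iters=None):
    """Greedy sparse approximation of x over the dictionary D (Algorithm 14.1).

    D        : (N, M) array whose columns are unit-norm atoms
    x        : (N,) signal vector
    eps0     : stop once the residual norm drops below this threshold
    Returns the coefficient vector a and the selected support S.
    """
    N, M = D.shape
    a = np.zeros(M, dtype=np.result_type(D, x))
    r = x.copy()                                  # r^0 = x - D a^0 = x
    S = []                                        # S^0 = empty support
    if max_iters is None:
        max_iters = M
    for _ in range(max_iters):
        if np.linalg.norm(r) <= eps0:             # step 1: stopping criterion
            break
        z = D.conj().T @ r                        # step 3: sweep
        z[S] = 0                                  # skip atoms already selected
        j0 = int(np.argmax(np.abs(z)))            # step 4: best matching atom
        S.append(j0)
        DS = D[:, S]                              # subdictionary on the support
        aS, *_ = np.linalg.lstsq(DS, x, rcond=None)  # step 5: least squares update
        a[:] = 0
        a[S] = aS
        r = x - DS @ aS                           # step 6: residual update
    return a, S
```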

14.3.1. The Algorithm

  • We start with the initial estimate of solution a=0.

  • We also maintain the support of a; i.e., the set of indices for which a is non-zero.

  • We start with an empty support.

  • In the k-th iteration we attempt to reduce the difference between the actual signal x and its approximation based on the current solution ak−1, i.e., the residual rk−1=x−Dak−1.

  • We do this by choosing a new index j0, corresponding to the column dj0 which most closely matches the current residual.

  • We add this index to the support of a and estimate the new solution vector ak.

  • We then compute the new residual rk.

  • We stop when the residual norm falls below the threshold ϵ0 defined by us.

Each iteration of the algorithm consists of the following stages:

  1. [Sweep] We find the atom in the dictionary which best matches the current residual.

    1. The best matching atom is selected using the least squares error principle.

    2. For each column dj in our dictionary, we measure the projection of the residual from the previous iteration rk−1 on the column.

    3. We compute the magnitude of error between the projection and residual.

    4. The square of the minimum error for dj is given by:

      ϵ2(j)=‖rk−1‖22−|djHrk−1|2.
    5. We can also note that minimizing ϵ(j) is equivalent to maximizing the magnitude of the inner product of dj with rk−1 (a short derivation of this identity appears after this list).

  2. [Update support] Ignoring the columns which have already been included in the support, we pick the column which most closely resembles the residual of the previous stage; i.e., for which the error magnitude is minimum. We include the index j0 of this column in the support set Sk.

  3. [Update provisional solution]

    1. In this step we find the minimizer of ‖Da−x‖2 over vectors supported on Sk as our next candidate solution vector.

    2. By keeping ai=0 for i∉Sk, we leave the corresponding columns di out of our calculations.

    3. Thus we pick only the columns of D specified by Sk.

    4. Let us call this subdictionary DSk.

    5. The size of this subdictionary is N×|Sk|=N×k.

    6. Let us call the corresponding subvector aSk.

      Suppose D=4, then D=[d1d2d3d4]. Let Sk={1,4}. Then DSk=[d1d4] and aSk=(a1,a4).

    7. Our minimization problem then reduces to minimizing ‖DSkaSk−x‖2.

    8. We use the standard least squares estimate to obtain the coefficients aSk over these indices.

    9. We put aSk back into the full vector to obtain our new solution estimate ak.

      In the running example after obtaining the values a1 and a4, we will have ak=(a1,0,0,a4).

    10. The solution to this minimization problem is given by

      DSkH(DSkaSk−x)=0⟹aSk=(DSkHDSk)−1DSkHx.
    11. Interestingly we note that rk=x−Dak=x−DSkaSk, thus

      DSkHrk=0

      which means that the columns in DSk, i.e., those indexed by the support Sk, are necessarily orthogonal to the residual rk.

    12. This implies that these columns will not be considered in the coming iterations for extending the support.

    13. This orthogonality is the reason behind the name Orthogonal Matching Pursuit (OMP).

  4. [Update residual] We finally update the residual vector to rk based on the new solution estimate.
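As a side note on the sweep stage, the error expression used there follows from a one-dimensional least squares argument. A short derivation, assuming dj is unit norm, is sketched below in LaTeX notation:

```latex
% Best approximation of the residual r^{k-1} by a single atom d_j (unit norm):
\min_{z \in \mathbb{C}} \; \| r^{k-1} - z\, d_j \|_2^2
  = \| r^{k-1} \|_2^2 - | d_j^H r^{k-1} |^2 ,
  \qquad \text{attained at } z^{*} = d_j^H r^{k-1}.
```

Since ‖rk−1‖2 is fixed during the sweep, minimizing ϵ2(j) over j is the same as maximizing |djHrk−1|, which is exactly what the sweep step z←DHrk−1 followed by the maximization achieves.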

Example 14.1 ((D,K)-EXACT-SPARSE reconstruction with OMP)

Let us consider a dictionary of size 10×20. Thus N=10 and D=20. In order to fit into the display, we present the matrix in two parts of 10 columns each.

Da=(1/√10)[−1−1−11−1−111−1111111−1−11−1−1−1−1−1−1−1111111−1−1111−1111111−1−11−1−1111−11−1−1−11−11−1−1−111−1−1−1−11−11−111−11−1−1−11−11−111−1−1−1111111−11−11−11]

Db=(1/√10)[1−1−1−1111−1−1−1111−1−1−1−1−1−11−111111−1−1−1−11−11−1111−1−1−11−1−1111−111−1−1111−11−11−11−111−11−1−1−1111−1−111−1−11−111111−1−1111−1−1−111−111−1−11]

with D=[Da Db].

  1. You may verify that each column is unit norm.

  2. It is known that rank(D)=10 and spark(D)=6.

  3. Thus if a signal x has a 2-sparse representation in D, then the representation is necessarily unique.

  4. We now consider a signal x given by

    x=(4.74342, −4.74342, 1.58114, −4.74342, −1.58114, 1.58114, −4.74342, −1.58114, −4.74342, −4.74342).

    To save space, we have written it as an n-tuple. You should treat it as a column vector of size 10×1.

  5. It is known that the vector has a two sparse representation in D.

  6. Let us go through the steps of OMP and see how it works.

  7. In step 0, r0=x, a0=0, and S0=∅.

  8. We now compute the absolute value of the inner product of r0 with each of the columns. They are given by

    (4, 4, 4, 7, 3, 1, 11, 1, 2, 1, 2, 1, 7, 0, 2, 4, 0, 2, 1, 3).
  9. We quickly note that the maximum occurs at index 7 with value 11.

  10. We modify our support to S1={7}.

  11. We now solve the least squares problem

    minimize ‖x−[d7]a7‖22.
  12. The solution gives us a7=11.00.

  13. Thus we get

    a1=(0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0).

    Again note that to save space we have presented a as a tuple. You should consider it as a 20×1 column vector.

  14. This leaves us the residual as

    r1=x−Da1=(1.26491, −1.26491, −1.89737, −1.26491, 1.89737, −1.89737, −1.26491, 1.89737, −1.26491, −1.26491).
  15. We can cross check that the residual is indeed orthogonal to the columns already selected, for

    ⟨r1,d7⟩=0.
  16. Next we compute the inner product of r1 with all the columns in D and take absolute values.

  17. They are given by

    (0.4, 0.4, 0.4, 0.4, 0.8, 1.2, 0, 1.2, 2, 1.2, 2.4, 3.2, 4.8, 0, 2, 0.4, 0, 2, 1.2, 0.8).
  18. We quickly note that the maximum occurs at index 13 with value 4.8.

  19. We modify our support to S2={7,13}.

  20. We now solve the least squares problem

    minimize ‖x−[d7 d13][a7 a13]‖22.
  21. This gives us a7=10 and a13=−5.

  22. Thus we get

    a2=(0, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, −5, 0, 0, 0, 0, 0, 0, 0).
  23. Finally the residual we get at step 2 is

    r2=x−Da2=10^(−14)×(0, 0, −0.111022, 0, 0.111022, −0.111022, 0, 0.111022, 0, 0).
  24. The magnitude of the residual is negligibly small.

  25. We conclude that the OMP algorithm has converged and we have been able to recover the exact 2-sparse representation of x in D.
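The computations in this example are easy to reproduce. The sketch below uses the hypothetical omp function from the earlier listing on a randomly drawn ±1 dictionary of the same 10×20 shape; it does not reproduce the exact matrix printed above, so recovery of a 2-sparse signal is typical but not guaranteed for every random draw.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 10, 20

# Random dictionary with +-1 entries; columns scaled to unit norm,
# mimicking the structure of the dictionary used in the example.
D = rng.choice([-1.0, 1.0], size=(N, M)) / np.sqrt(N)

# Synthetic 2-sparse representation with a_7 = 10 and a_13 = -5
# (0-based indices 6 and 12, as in the example).
a_true = np.zeros(M)
a_true[6], a_true[12] = 10.0, -5.0
x = D @ a_true

a_hat, support = omp(D, x, eps0=1e-10)
print(sorted(support))                    # recovered support, e.g. [6, 12]
print(np.allclose(a_hat, a_true))         # True when exact recovery succeeds
```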

14.3.2. Exact Recovery Conditions

In this subsection, following [35], we look closely at some conditions under which OMP is guaranteed to recover the solution of the (D,K)-EXACT-SPARSE problem.

  1. It is given that x=Da where a contains at most K non-zero entries.

  2. Both the support and entries of a are known and can be used to verify the correctness of OMP. Note that a itself won’t be given to OMP.

  3. Let Λopt=supp(a); i.e., the set of indices at which the optimal representation a has non-zero entries.

  4. Then we can write

    x=∑i∈Λopt aidi.
  5. From the dictionary D we can extract an N×K matrix Dopt whose columns are indexed by Λopt.

    Dopt≜[dλ1…dλK]

    where λi∈Λopt.

  6. Thus we can also write

    x=Doptaopt

    where aopt∈CK is a vector of K complex entries.

  7. The columns of Dopt are linearly independent. Hence Dopt has full column rank.

  8. We define another matrix Hopt whose columns are the remaining D−K columns of D.

  9. Thus Hopt consists of atoms or columns which do not participate in the optimum representation of x.

  10. OMP starts with an empty support.

  11. At every step, it picks one column from D and adds it to the support of the approximation.

  12. If we can ensure that it never selects any column from Hopt, we are guaranteed that the correct K-sparse representation is recovered.

  13. We will use mathematical induction and assume that OMP has succeeded in its first k steps and has chosen k columns from Dopt so far.

  14. At this point it is left with the residual rk.

  15. In the (k+1)-th iteration, we compute the inner product of rk with all columns in D and choose the column with the highest inner product magnitude.

  16. We note that the maximum value of the (absolute) inner product of rk with any of the columns in Hopt is given by

    ‖HoptHrk‖∞.
  17. Correspondingly, the maximum value of the (absolute) inner product of rk with any of the columns in Dopt is given by

    ‖DoptHrk‖∞.
  18. Since rk is orthogonal to the columns already chosen, those columns do not contribute to this term.

  19. In order to make sure that none of the columns in Hopt is selected, we need

    ‖HoptHrk‖∞<‖DoptHrk‖∞.

Definition 14.3 (Greedy selection ratio)

We define a ratio

(14.23) ρ(rk)≜‖HoptHrk‖∞ / ‖DoptHrk‖∞.

This ratio is known as the greedy selection ratio.

  1. We can see that as long as ρ(rk)<1, OMP will make the right decision at the (k+1)-th stage.

  2. If ρ(rk)=1, then there is no guarantee that OMP will make the right decision.

  3. We will pessimistically assume that OMP makes the wrong decision in such situations.

We note that this definition of ρ(rk) looks very similar to matrix p-norms defined in Definition 4.163. It is suggested to review the properties of p-norms for matrices at this point.
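When the optimal support is known (as in a controlled experiment), the greedy selection ratio (14.23) can be evaluated directly. The helper below is a small sketch; the function name and argument conventions are ours.

```python
import numpy as np

def greedy_selection_ratio(D, opt_support, r):
    """rho(r) = ||H_opt^H r||_inf / ||D_opt^H r||_inf, as in (14.23)."""
    mask = np.zeros(D.shape[1], dtype=bool)
    mask[list(opt_support)] = True
    D_opt = D[:, mask]            # atoms participating in the optimal representation
    H_opt = D[:, ~mask]           # all remaining atoms
    num = np.max(np.abs(H_opt.conj().T @ r))
    den = np.max(np.abs(D_opt.conj().T @ r))
    return num / den
```

A value below 1 confirms that, at this residual, the sweep stage prefers some atom of Dopt over every atom of Hopt.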

We now present a condition which guarantees that ρ(rk)<1 is always satisfied.

Theorem 14.17 (A sufficient condition for exact recovery using OMP)

A sufficient condition for Orthogonal Matching Pursuit to resolve x completely in K steps is that

(14.24) maxh ‖Dopt†h‖1<1,

where h ranges over columns in Hopt.

Moreover, Orthogonal Matching Pursuit is a correct algorithm for the (D,K)-EXACT-SPARSE problem whenever the condition holds for every superposition of K atoms from D.

Proof.

  1. In (14.24), Dopt† is the pseudo-inverse of Dopt:

    Dopt†=(DoptHDopt)−1DoptH.
  2. We need to show that if (14.24) holds, then ρ(rk) is always less than 1.

  3. We note that the projection operator onto the column span of Dopt is given by

    Dopt(DoptHDopt)−1DoptH=(Dopt†)HDoptH.
  4. We also note that, by assumption, x∈C(Dopt), and the approximation at the k-th step satisfies xk=Dak∈C(Dopt).

  5. Hence rk=x−xk also belongs to C(Dopt).

  6. Thus

    rk=(Dopt†)HDoptHrk;

    i.e., applying the projection operator for Dopt to rk does not change it.

  7. Using this we can rewrite ρ(rk) as

    ρ(rk)=‖HoptHrk‖∞ / ‖DoptHrk‖∞=‖HoptH(Dopt†)HDoptHrk‖∞ / ‖DoptHrk‖∞.
  8. We see the term DoptHrk appearing in both the numerator and the denominator.

  9. Now consider the matrix HoptH(Dopt†)H and recall the definition of the matrix ∞-norm from Definition 4.163:

    ‖A‖∞=maxx≠0 ‖Ax‖∞/‖x‖∞ ≥ ‖Ax‖∞/‖x‖∞ ∀x≠0.
  10. Thus

    ‖HoptH(Dopt†)H‖∞ ≥ ‖HoptH(Dopt†)HDoptHrk‖∞ / ‖DoptHrk‖∞

    which gives us

    ρ(rk)≤‖HoptH(Dopt†)H‖∞=‖(Dopt†Hopt)H‖∞.
  11. We recall that ‖A‖∞ is the maximum row sum norm while ‖A‖1 is the maximum column sum norm.

  12. Hence

    ‖A‖∞=‖AT‖1=‖AH‖1

    which means

    ‖(Dopt†Hopt)H‖∞=‖Dopt†Hopt‖1.
  13. Thus we have

    ρ(rk)≤‖Dopt†Hopt‖1.
  14. Now the columns of Dopt†Hopt are nothing but Dopt†h where h ranges over columns of Hopt.

  15. Thus, in terms of the maximum column sum norm,

    ρ(rk)≤maxh ‖Dopt†h‖1.
  16. Thus, assuming that OMP has made k correct decisions and rk lies in C(Dopt), ρ(rk)<1 whenever

    maxh ‖Dopt†h‖1<1.
  17. The initial residual is r0=x, which always lies in the column space of Dopt.

  18. By the above argument, OMP will select an optimal column in each step.

  19. Since the residual is always orthogonal to the columns already selected, OMP will never select the same column twice.

  20. Thus in K steps it will retrieve all K atoms which comprise x.
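For a given support Λopt, the quantity appearing in (14.24) is just the maximum column sum of Dopt†Hopt and can be computed directly. A small sketch (function name ours):

```python
import numpy as np

def exact_recovery_coefficient(D, opt_support):
    """max_h ||pinv(D_opt) h||_1 over the columns h of H_opt.

    By Theorem 14.17, OMP recovers every signal supported on opt_support
    whenever this value is strictly less than 1.
    """
    mask = np.zeros(D.shape[1], dtype=bool)
    mask[list(opt_support)] = True
    D_opt_pinv = np.linalg.pinv(D[:, mask])   # (D_opt^H D_opt)^{-1} D_opt^H
    H_opt = D[:, ~mask]
    return float(np.max(np.sum(np.abs(D_opt_pinv @ H_opt), axis=0)))
```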

14.3.3. Babel Function Estimates

There is a small problem with Theorem 14.17. Since we do not know the support of a a priori, it is not possible to verify that

maxh ‖Dopt†h‖1<1

holds. Verifying this for all K-column submatrices is computationally prohibitive.

It turns out that the Babel function (recall Definition 12.21) can be used to relax the sufficient condition so that it becomes computationally tractable. We now show how the Babel function guarantees that the exact recovery condition for OMP holds.

Theorem 14.18 (Babel function based guarantee for exact recovery using OMP)

Suppose that μ1 is the Babel function for a dictionary D. The exact recovery condition holds whenever

(14.25) μ1(K−1)+μ1(K)<1.

Thus, Orthogonal Matching Pursuit is a correct algorithm for the (D,K)-EXACT-SPARSE problem whenever (14.25) holds.

In other words, for sufficiently small K for which (14.25) holds, OMP will recover any arbitrary superposition of K atoms from D.

Proof.

  1. We can write

    maxh ‖Dopt†h‖1=maxh ‖(DoptHDopt)−1DoptHh‖1.
  2. We recall from Lemma 4.91 that the operator norm is subordinate; i.e.,

    ‖Ax‖1≤‖A‖1‖x‖1.
  3. Thus with A=(DoptHDopt)−1 we have

    ‖(DoptHDopt)−1DoptHh‖1≤‖(DoptHDopt)−1‖1 ‖DoptHh‖1.
  4. With this we have

    maxh ‖Dopt†h‖1≤‖(DoptHDopt)−1‖1 maxh ‖DoptHh‖1.
  5. Now let us look at ‖DoptHh‖1 closely.

  6. There are K columns in Dopt.

  7. For each column we compute its inner product with h; ‖DoptHh‖1 is then the sum of the absolute values of these inner products.

  8. Also recall the definition of the Babel function:

    μ1(K)=max|Λ|=K maxh ∑λ∈Λ |⟨h,dλ⟩|.
  9. Clearly

    maxh ‖DoptHh‖1=maxh ∑λi∈Λopt |⟨h,dλi⟩|≤μ1(K).
  10. We also need to provide a bound on ‖(DoptHDopt)−1‖1, which requires more work.

  11. First note that since all columns in D are unit norm, the diagonal of DoptHDopt contains unit entries.

  12. Thus we can write

    DoptHDopt=IK+A

    where A contains the off diagonal terms in DoptHDopt.

  13. Looking carefully, each column of A lists the inner products between one atom of Dopt and the remaining K−1 atoms.

  14. By the definition of the Babel function,

    ‖A‖1=max1≤k≤K ∑j≠k |⟨dλk,dλj⟩|≤μ1(K−1).
  15. Whenever ‖A‖1<1, the Neumann series ∑k(−A)k converges to the inverse (IK+A)−1.

  16. Thus we have

    ‖(DoptHDopt)−1‖1=‖(IK+A)−1‖1=‖∑k=0..∞(−A)k‖1 ≤ ∑k=0..∞ ‖A‖1k = 1/(1−‖A‖1) ≤ 1/(1−μ1(K−1)).
  17. Putting things together we get

    maxh ‖Dopt†h‖1 ≤ μ1(K)/(1−μ1(K−1)).
  18. Thus whenever μ1(K−1)+μ1(K)<1 holds, we have

    maxh ‖Dopt†h‖1<1.
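Unlike the support-dependent condition of Theorem 14.17, condition (14.25) only involves the Babel function, which is cheap to evaluate from the Gram matrix of the dictionary. The sketch below (assuming unit-norm columns; names ours) computes μ1(k) and checks (14.25):

```python
import numpy as np

def babel(D, k):
    """mu_1(k): for each atom, sum the k largest |<d_i, d_j>| over j != i,
    then take the maximum over atoms (Definition 12.21, unit-norm columns)."""
    if k <= 0:
        return 0.0
    G = np.abs(D.conj().T @ D)
    np.fill_diagonal(G, 0.0)        # exclude the <d_i, d_i> = 1 terms
    G.sort(axis=1)                  # ascending along each row
    return float(np.max(G[:, -k:].sum(axis=1)))

def omp_exact_recovery_guaranteed(D, K):
    """Sufficient condition (14.25): mu_1(K-1) + mu_1(K) < 1."""
    return babel(D, K - 1) + babel(D, K) < 1.0
```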

14.3.4. Sparse Approximation Conditions

We now remove the assumption that x is K-sparse in D. This is indeed the situation for real-life signals, as they are rarely exactly sparse.

In this subsection we will look at conditions under which OMP can successfully solve the (D,K)-SPARSE approximation problem as described in Definition 12.8.

  1. Let x be an arbitrary signal.

  2. Suppose that aopt is an optimal K-term representation of x; i.e., aopt is a solution to (12.19) and the optimal K-term approximation of x is given by

    xopt=Daopt.

    We note that aopt may not be unique.

  3. Let Λopt be the support of aopt, which identifies the atoms in D that participate in the K-term approximation of x.

  4. Let Dopt be the subdictionary consisting of the columns of D indexed by Λopt.

  5. We assume that columns in Dopt are linearly independent. This is easily established since if any atom in this set were linearly dependent on other atoms, we could always find a sparser solution which would contradict the fact that aopt is optimal.

  6. Again let Hopt be the matrix of the (D−K) columns which are not indexed by Λopt.

  7. We note that if Λopt is identified, then finding aopt is a straightforward least squares problem.

We now present a condition under which Orthogonal Matching Pursuit is able to recover the optimal atoms.

Theorem 14.19

Assume that μ1(K)<1/2, and suppose that at the k-th iteration, the support Sk for ak consists only of atoms from an optimal K-term approximation of the signal x. At step (k+1), Orthogonal Matching Pursuit will recover another atom indexed by Λopt whenever

(14.26) ‖x−Dak‖2 > √(1+K(1−μ1(K))/(1−2μ1(K))^2) ‖x−Daopt‖2.

A few remarks are in order.

  1. ‖x−Dak‖2 is the approximation error norm at the k-th iteration.

  2. ‖x−Daopt‖2 is the optimum K-term approximation error.

  3. The theorem says that OMP makes absolute progress whenever the current approximation error is larger than the optimum error by the stated factor.

  4. As a result of this theorem, we note that every optimal K-term approximation of x contains the same kernel of atoms.

  5. The optimum error is always independent of the choice of atoms in the K-term approximation (since it is optimum).

  6. Initial error is also independent of choice of atoms (since initial support is empty).

  7. OMP always selects the same set of atoms by design.

Proof.

  1. Let us assume that after k steps, OMP has recovered an approximation xk given by

    xk=DSkak

    where Sk=supp(ak) chooses k columns from D all of which belong to Dopt.

  2. Let the residual at the k-th stage be

    rk=x−xk=x−DSkak.
  3. Recalling from the previous section, a sufficient condition for recovering another optimal atom is

    ρ(rk)=‖HoptHrk‖∞ / ‖DoptHrk‖∞<1.
  4. One difference from the previous section is that, in general, rk∉C(Dopt).

  5. We can write

    rk=x−xk=(x−xopt)+(xopt−xk).
  6. Note that (x−xopt) is the error of the optimal K-term approximation.

  7. We also note that xopt is the least squares approximation of x over the columns of Dopt; hence x−xopt is orthogonal to C(Dopt) and

    DoptH(x−xopt)=0.
  8. We will now use these expressions to simplify ρ(rk):

    ρ(rk)=‖HoptHrk‖∞ / ‖DoptHrk‖∞
    =‖HoptH(x−xopt)+HoptH(xopt−xk)‖∞ / ‖DoptH(x−xopt)+DoptH(xopt−xk)‖∞
    =‖HoptH(x−xopt)+HoptH(xopt−xk)‖∞ / ‖DoptH(xopt−xk)‖∞
    ≤‖HoptH(x−xopt)‖∞ / ‖DoptH(xopt−xk)‖∞ + ‖HoptH(xopt−xk)‖∞ / ‖DoptH(xopt−xk)‖∞.
  9. We now define two new terms:

    ρerr(rk)≜‖HoptH(x−xopt)‖∞ / ‖DoptH(xopt−xk)‖∞

    and

    ρopt(rk)≜‖HoptH(xopt−xk)‖∞ / ‖DoptH(xopt−xk)‖∞.
  10. With these we have

    (14.27) ρ(rk)≤ρopt(rk)+ρerr(rk).
  11. Now xopt has an exact K-term representation in D given by aopt.

  12. Hence ρopt(rk) is nothing but ρ(rk) for the corresponding EXACT-SPARSE problem.

  13. From the proof of Theorem 14.18 we recall

    ρopt(rk) ≤ μ1(K)/(1−μ1(K−1)) ≤ μ1(K)/(1−μ1(K))

    since

    μ1(K−1)≤μ1(K) ⟹ 1−μ1(K−1)≥1−μ1(K).
  14. The remaining problem is ρerr(rk).

  15. Let us look at its numerator and denominator one by one.

  16. ‖HoptH(x−xopt)‖∞ is the maximum (absolute) inner product between any column in Hopt and x−xopt.

  17. We can write

    ‖HoptH(x−xopt)‖∞ ≤ maxh |hH(x−xopt)| ≤ maxh ‖h‖2‖x−xopt‖2=‖x−xopt‖2

    since all columns in D are unit norm. In the middle step we used the Cauchy-Schwarz inequality.

  18. Now look at the denominator ‖DoptH(xopt−xk)‖∞ where (xopt−xk)∈CN and Dopt∈CN×K.

  19. Thus

    DoptH(xopt−xk)∈CK.
  20. Now for every v∈CK we have

    ‖v‖2≤√K ‖v‖∞.
  21. Hence

    ‖DoptH(xopt−xk)‖∞ ≥ K^(−1/2) ‖DoptH(xopt−xk)‖2.
  22. Since Dopt has full column rank, its singular values are non-zero.

  23. Thus

    ‖DoptH(xopt−xk)‖2 ≥ σmin(Dopt)‖xopt−xk‖2.
  24. From Corollary 12.2 we have

    σmin(Dopt) ≥ √(1−μ1(K−1)) ≥ √(1−μ1(K)).
  25. Combining these observations we get

    ρerr(rk) ≤ √K ‖x−xopt‖2 / (√(1−μ1(K)) ‖xopt−xk‖2).
  26. Now from (14.27) ρ(rk)<1 whenever ρopt(rk)+ρerr(rk)<1.

  27. Thus a sufficient condition for ρ(rk)<1 can be written as

    μ1(K)/(1−μ1(K)) + √K ‖x−xopt‖2 / (√(1−μ1(K)) ‖xopt−xk‖2) < 1.
  28. We need to simplify this expression a bit. Multiplying by (1−μ1(K)) on both sides we get

    μ1(K) + √(K(1−μ1(K))) ‖x−xopt‖2/‖xopt−xk‖2 < 1−μ1(K)
    ⟹ √(K(1−μ1(K))) ‖x−xopt‖2/‖xopt−xk‖2 < 1−2μ1(K)
    ⟹ ‖xopt−xk‖2 > (√(K(1−μ1(K)))/(1−2μ1(K))) ‖x−xopt‖2.

    We assumed μ1(K)<1/2, thus 1−2μ1(K)>0, which validates the steps above.

  29. We recall that (x−xopt)⊥C(Dopt) and (xopt−xk)∈C(Dopt); thus (x−xopt) and (xopt−xk) are orthogonal to each other.

  30. Thus by applying the Pythagorean theorem we have

    ‖x−xk‖22=‖x−xopt‖22+‖xopt−xk‖22.
  31. Thus we have

    ‖x−xk‖22 > (K(1−μ1(K))/(1−2μ1(K))^2) ‖x−xopt‖22 + ‖x−xopt‖22.
  32. This gives us a sufficient condition

    (14.28) ‖x−xk‖2 > √(1+K(1−μ1(K))/(1−2μ1(K))^2) ‖x−xopt‖2.
  33. In other words, whenever (14.28) holds true, we have ρ(rk)<1 which leads to OMP making a correct choice and choosing an atom from the optimal set.

  34. Putting xk=Dak and xopt=Daopt we get back (14.26) which is the desired result.

Theorem 14.19 establishes that as long as (14.26) holds at each of the steps from 1 to K, OMP will recover a K-term optimal approximation xopt. If x∈CN is completely arbitrary, it may not be the case that (14.26) holds for all K iterations. In this situation, the question arises: what is the worst K-term approximation error that OMP will incur if (14.26) does not hold all the way?

This is answered in the following corollary of Theorem 14.19.

Corollary 14.2 (An estimate for the worst case K-term approximation error by OMP)

Assume that μ1(K)<1/2 and let x∈CN be a completely arbitrary signal. Orthogonal Matching Pursuit produces a K-term approximation xK which satisfies

(14.29) ‖x−xK‖2 ≤ √(1+C(D,K)) ‖x−xopt‖2

where xopt is the optimum K-term approximation of x in the dictionary D (i.e., xopt=Daopt where aopt is an optimal solution of (12.19)). C(D,K) is a constant depending upon the dictionary D and the desired sparsity level K. An estimate of C(D,K) is given by

C(D,K)≤K(1−μ1(K))/(1−2μ1(K))^2.

Proof.

  1. Suppose that OMP runs fine for the first p steps, where p<K.

  2. Thus (14.26) keeps holding for the first p steps.

  3. We now assume that (14.26) breaks at step p+1 and OMP is no longer guaranteed to make an optimal choice of column from Dopt.

  4. Thus at step p+1 we have

    ‖x−xp‖2 ≤ √(1+K(1−μ1(K))/(1−2μ1(K))^2) ‖x−xopt‖2.
  5. Any further iterations of OMP will only reduce the error further (although not necessarily in an optimal way).

  6. This gives us

    ‖x−xK‖2 ≤ ‖x−xp‖2 ≤ √(1+K(1−μ1(K))/(1−2μ1(K))^2) ‖x−xopt‖2.
  7. Choosing

    C(D,K)=K(1−μ1(K))/(1−2μ1(K))^2

    we can rewrite this as

    ‖x−xK‖2 ≤ √(1+C(D,K)) ‖x−xopt‖2.

This is a very useful result. It establishes that even if OMP is not able to recover the optimum K-term representation of x, it always constructs an approximation whose error lies within a constant factor of the optimum approximation error, where the constant factor is given by √(1+C(D,K)).

  1. If the optimum approximation error ‖x−xopt‖2 is small, then ‖x−xK‖2 will also be small.

  2. If ‖x−xopt‖2 is moderate, then OMP may inflate the approximation error to a noticeably higher value. But in that case, sparse approximation is probably not the right tool for representing the signal over this dictionary.
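The worst-case inflation factor in Corollary 14.2 depends on the dictionary only through μ1(K), so it can be computed up front. A small sketch, reusing the babel helper assumed above:

```python
import numpy as np

def omp_error_inflation_factor(D, K):
    """Upper bound sqrt(1 + C(D, K)) on ||x - x_K||_2 / ||x - x_opt||_2 from (14.29),
    using the estimate C(D, K) <= K (1 - mu_1(K)) / (1 - 2 mu_1(K))^2."""
    mu = babel(D, K)                          # Babel function, as sketched earlier
    if mu >= 0.5:
        raise ValueError("Corollary 14.2 requires mu_1(K) < 1/2")
    C = K * (1.0 - mu) / (1.0 - 2.0 * mu) ** 2
    return float(np.sqrt(1.0 + C))
```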