15.3. Orthogonal Matching Pursuit

In this section, we consider the application of Orthogonal Matching Pursuit (OMP) for the problem of recovering a signal from its compressive measurements. This section is largely based on [38].

Please review the notation of the compressive sensing process from Definition 12.23.

We will restrict our attention to the N dimensional Euclidean space as our signal space for the moment.

  1. x ∈ R^N is a signal vector.

  2. Φ ∈ R^{M×N} is the real sensing matrix with M ≪ N.

  3. The measurement vector y ∈ R^M is given by

    y = Φx

    where R^M is our measurement space.

  4. y and Φ are known, while x is unknown to the recovery algorithm.

  5. x is assumed to be either K-sparse or K-compressible.

The sparse recovery problem can be written as

(15.5) minimize_x ‖y − Φx‖_2 subject to ‖x‖_0 ≤ K.

Though the problem looks similar to the (D,K)-SPARSE approximation problem, there are differences since Φ is not a dictionary (see Sensing Matrices). In particular, we don't require each column to be unit norm.
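
As a quick illustration of the measurement model above, here is a minimal numpy sketch (the dimensions and the Gaussian choice of Φ are our own, purely for illustration) which builds a K-sparse x and its compressive measurement y = Φx.

    import numpy as np

    # Illustrative sizes only: M << N.
    N, M, K = 1000, 100, 10
    rng = np.random.default_rng(0)

    # A Gaussian sensing matrix; its columns are not forced to be unit norm,
    # in line with the remark that Phi need not be a dictionary.
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)

    # A K-sparse signal: K nonzero entries at random locations.
    x = np.zeros(N)
    support = rng.choice(N, size=K, replace=False)
    x[support] = rng.standard_normal(K)

    # Compressive measurements: far fewer numbers than the N entries of x.
    y = Phi @ x
    print(y.shape)  # (100,)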

We will adapt the OMP algorithm studied in Orthogonal Matching Pursuit to the problem of sparse recovery in the compressed sensing framework. In the analysis of OMP for CS we will address the following questions:

  • How many measurements are required to recover x from y exactly if x is K-sparse?

  • What kind of sensing matrices are admissible for OMP to work in the CS framework?

  • If x is not K-sparse, then what is the maximum error incurred?

15.3.1. The Algorithm

The OMP algorithm adapted to CS is presented in Algorithm 15.1.

Algorithm 15.1 (Orthogonal matching pursuit for sparse recovery from compressive measurements)

Inputs:

  • Sensing matrix Φ ∈ R^{M×N}

  • Measurement vector y ∈ R^M

  • Desired sparsity level K

Outputs:

  • A K-sparse estimate x̂ ∈ Σ_K ⊆ R^N for the ideal signal x

  • An index set Λ^K ⊂ {1, …, N} identifying the support of x̂

  • An approximation y^K ∈ R^M of y

  • A residual r^K = y − y^K ∈ R^M

Initialization:

  1. k ← 0 # Iteration counter

  2. x^0 ← 0 # Initial estimate of x ∈ R^N

  3. y^0 ← Φ x^0 = 0 # Approximation of y

  4. r^0 ← y − y^0 = y # Residual r ∈ R^M

  5. Λ^0 ← ∅ # Solution support Λ = supp(x̂)

Algorithm:

  1. If k == K or r^k == 0: break.

  2. Increase the iteration count

    k ← k + 1.
  3. Sweep (find column with largest inner product)

    λ_k = argmax_{1 ≤ j ≤ N} |⟨r^{k−1}, ϕ_j⟩|.
  4. Update support

    Λ^k ← Λ^{k−1} ∪ {λ_k}.
  5. Update provisional solution

    x^k ← argmin_x ‖Φx − y‖_2^2

    subject to supp(x) = Λ^k.

  6. Update residual

    y^k ← Φ x^k; r^k ← y − y^k.
  7. Go to step 1.

Finalization:

  • x̂ ← x^k.

Some remarks are in order.

  1. The algorithm returns a K-term approximation of x given by x̂.

  2. Each step of the algorithm is identified by the iteration counter k, which runs from 0 to K.

  3. At each step x^k, y^k and r^k are computed, where x^k is the k-term estimate of x, y^k is the corresponding measurement vector and r^k is the residual between the actual measurement vector y and the estimated measurement vector y^k.

  4. The support for x̂ is maintained in an index set Λ.

  5. At each iteration we add one new index λ_k to Λ^{k−1}, giving us Λ^k.

  6. We will use Φ_{Λ^k} ∈ R^{M×k} to denote the submatrix constructed from the columns of Φ indexed by Λ^k. In other words, if Λ^k = {λ_1, …, λ_k}, then

    Φ_{Λ^k} = [ϕ_{λ_1} … ϕ_{λ_k}].
  7. Similarly, x^k_{Λ^k} ∈ R^k will denote the vector formed by the k entries of x^k indexed by Λ^k.

  8. We note that r^k is orthogonal to the columns of Φ_{Λ^k}. This follows from x^k being the least squares solution in the provisional solution update step.

  9. This also ensures that in each iteration a new column from Φ, indexed by λ_k, is chosen. OMP never chooses the same column twice.

  10. If the sparsity level of x is less than K, then r^k becomes zero before k reaches K. At that point we halt; there is no point in proceeding further.

  11. An equivalent formulation of the least squares step is

    z ← argmin_{v ∈ R^k} ‖Φ_{Λ^k} v − y‖_2^2

    followed by

    x^k_{Λ^k} ← z.
  12. We solve the least squares problem over the columns of Φ indexed by Λ^k and then assign the k entries of the resulting z to the entries of x^k indexed by Λ^k, while keeping the other entries as 0.

  13. Least squares can be accelerated by using x^{k−1} as the starting estimate of x^k and carrying out a descent method like Richardson's iteration from there.
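
To make the steps above concrete, here is a minimal numpy sketch of Algorithm 15.1 (the function name omp_cs and its interface are our own, not part of the text). np.linalg.lstsq plays the role of the provisional least squares solve over the currently selected columns.

    import numpy as np

    def omp_cs(Phi, y, K):
        """Sketch of Algorithm 15.1: K-sparse recovery of x from y = Phi x."""
        M, N = Phi.shape
        x_hat = np.zeros(N)
        r = y.copy()                       # r^0 = y
        support = []                       # Lambda^0 = empty set
        for _ in range(K):
            if np.linalg.norm(r) < 1e-12:  # r^k == 0: halt early
                break
            # Sweep: column with the largest |inner product| with the residual.
            j = int(np.argmax(np.abs(Phi.T @ r)))
            # Update support.
            support.append(j)
            # Update provisional solution: least squares over the chosen columns.
            z, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
            x_hat = np.zeros(N)
            x_hat[support] = z
            # Update residual.
            r = y - Phi @ x_hat
        return x_hat, support, r

With the toy data generated earlier, omp_cs(Phi, y, K) typically returns x exactly; the guarantees discussed below make this precise.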

15.3.2. Exact Recovery using OMP with Random Sensing Matrices

The objective of this subsection is to prove theoretically that OMP can recover sparse signals from a small set of random linear measurements. In this subsection we discuss the conditions on the random sensing matrix Φ under which it is suitable for signal recovery through OMP.

Definition 15.1 (Admissible sensing matrix)

An admissible sensing matrix for K-sparse signals in R^M is an M×N random matrix Φ with the following properties.

  • [M0] Independence: The columns of Φ are stochastically independent.

  • [M1] Normalization: E(‖ϕ_j‖_2^2) = 1 for j = 1, …, N.

  • [M2] Joint correlation:

    • Let {u_k} be a sequence of K vectors whose ℓ_2 norms do not exceed one.

    • Let ϕ be a column of Φ that is independent of this sequence.

    • Then

      P(max_k |⟨ϕ, u_k⟩| ≤ ϵ) ≥ 1 − 2K exp(−c ϵ^2 M).
  • [M3] Smallest singular value: Given any M×K (K < M) submatrix Z of Φ, the smallest (K-th largest) singular value σ_min(Z) satisfies

    P(σ_min(Z) ≥ 0.5) ≥ 1 − exp(−cM).

Here c > 0 is a positive constant.

It can be shown that Rademacher sensing matrices (Rademacher Sensing Matrices) and Gaussian sensing matrices (Gaussian Sensing Matrices) satisfy all the requirements of admissible sensing matrices for sparse recovery using OMP. Some of the proofs are included in the book; you may want to review the corresponding sections.

Some remarks are in order to further explain the definition of admissible sensing matrices.

  1. Typically all the columns of a sensing matrix are drawn from the same distribution, but (M0) doesn't require this. It allows different columns of Φ to be drawn from different distributions.

  2. The joint correlation property (M2) depends on the decay of the random variables ‖ϕ_j‖_2; i.e., it needs the tails of ‖ϕ_j‖_2 to be small.

  3. A bound on the smallest (non-zero) singular value of M×K submatrices (M3) controls how much the sensing matrix can shrink K-sparse vectors.

  4. The notion of admissible matrices probably arose as follows. First, OMP signal recovery guarantees were developed for Gaussian and Rademacher sensing matrices. Then the proofs were analyzed to identify the minimum requirements they imposed on the structure of the random sensing matrices. These requirements were extracted in the form of the notion of admissible matrices. Finally, the proof was reorganized to work for all random matrices which satisfy the admissibility criteria. It is important to understand this process of abstraction; otherwise ideas like admissible matrices seem to come out of the blue.
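
As a small empirical illustration (our own construction, not part of the definition), a Gaussian matrix with i.i.d. N(0, 1/M) entries has independent columns (M0), satisfies the normalization (M1) in expectation, and we can spot-check the singular value condition (M3) on a few random M×K submatrices.

    import numpy as np

    rng = np.random.default_rng(0)
    M, N, K = 200, 1000, 10

    # Gaussian sensing matrix with i.i.d. N(0, 1/M) entries, so that
    # E ||phi_j||_2^2 = M * (1/M) = 1 for every column (property M1).
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)
    print("mean column energy:", np.mean(np.sum(Phi**2, axis=0)))  # close to 1

    # Spot-check (M3): smallest singular value of random M x K submatrices.
    smin = []
    for _ in range(100):
        cols = rng.choice(N, size=K, replace=False)
        smin.append(np.linalg.svd(Phi[:, cols], compute_uv=False)[-1])
    print("smallest sigma_min over trials:", min(smin))  # typically well above 0.5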

15.3.3. Signal Recovery Guarantees with OMP

We now show that OMP can be used to recover the original signal with high probability if the random measurements are taken using an admissible sensing matrix as described above.

Here we consider the case where x is known to be K-sparse.

Theorem 15.2

Fix some δ ∈ (0,1), and choose M ≥ C K ln(N/δ) where C is an absolute constant. Suppose that x is an arbitrary K-sparse signal in R^N and draw an M×N admissible sensing matrix Φ independent from x.

Given the measurement vector y = Φx ∈ R^M, Orthogonal Matching Pursuit can reconstruct the signal with probability exceeding 1 − δ.

Some remarks are in order. Specifically we compare OMP here with basis pursuit (BP).

  1. The theorem provides probabilistic guarantees.

  2. The theorem actually requires more measurements than the results for BP.

  3. The biggest advantage is that OMP is a much simpler algorithm than BP and works very fast.

  4. Results for BP show that a single random sensing matrix can be used for recovering all sparse signals.

  5. This theorem only guarantees recovery of a sparse signal that is drawn independently of the sensing matrix.

  6. Thus this theorem is weaker than the results for BP. It can be argued that for practical situations this limitation doesn't matter much.
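
A small Monte Carlo sketch of this guarantee (the parameters and the constant in M are our own choices): in every trial we draw a fresh Gaussian (hence admissible) sensing matrix and a fresh K-sparse signal independent of it, run OMP for K iterations and count exact recoveries.

    import numpy as np

    def omp(Phi, y, K):
        """Minimal OMP as sketched in Algorithm 15.1."""
        N = Phi.shape[1]
        support, r = [], y.copy()
        for _ in range(K):
            support.append(int(np.argmax(np.abs(Phi.T @ r))))
            z, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
            x_hat = np.zeros(N)
            x_hat[support] = z
            r = y - Phi @ x_hat
        return x_hat

    rng = np.random.default_rng(1)
    N, K = 256, 8
    M = int(np.ceil(3 * K * np.log(N)))   # M on the order of K ln N; constant chosen ad hoc

    trials, successes = 200, 0
    for _ in range(trials):
        Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # fresh admissible (Gaussian) matrix
        x = np.zeros(N)
        x[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
        successes += np.allclose(omp(Phi, Phi @ x, K), x, atol=1e-8)

    print("empirical success rate:", successes / trials)  # high for these sizes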

Proof. The main challenge here is to handle the issues that arise due to the random nature of Φ. We start by setting up some notation for this proof.

  1. We note that the columns that OMP chooses do not depend on the order in which they are stacked in Ξ¦.

  2. Thus without loss of generality we can assume that the first K entries of x are non-zero and rest are zero.

  3. If OMP picks up the first K columns, then OMP has succeeded otherwise it has failed.

  4. With this, the support of x is given by Λ_opt = {1, …, K}.

We now partition the sensing matrix Φ as

Φ = [Φ_opt | Ψ]

where Φ_opt consists of the first K columns of Φ which correspond to Λ_opt, and Ψ consists of the remaining (N − K) columns of Φ.

We recall from the proof of Theorem 14.17 that in order for OMP to make absolute progress at step k+1 we require the greedy selection ratio ρ(r^k) < 1 where

ρ(r^k) = ‖Ψ^H r^k‖_∞ / ‖Φ_opt^H r^k‖_∞ = max_{ψ ∈ Ψ} |⟨ψ, r^k⟩| / ‖Φ_opt^H r^k‖_∞.

The proof is organized as follows:

  1. We first construct a thought experiment in which Ψ is not present and OMP is run only with y and Φ_opt.

  2. We then run OMP with Ψ present under the condition ρ(r^k) < 1.

  3. We show that the sequence of columns chosen and the residuals obtained in both cases are exactly the same.

  4. We show that the residuals obtained in the thought experiment are stochastically independent of the columns of Ψ.

  5. We then describe the success of OMP as an event in terms of these residuals.

  6. We compute a lower bound on the probability of the event of OMP success.

For a moment suppose that there is no Ψ and OMP is run with y and Φ_opt as input for K iterations. Naturally OMP will choose K columns of Φ_opt one by one.

  1. Let the columns it picks in each step be indexed by ω_1, ω_2, …, ω_K.

  2. Let the residuals before each step be q^0, q^1, q^2, …, q^{K−1}.

  3. Since y ∈ C(Φ_opt), the residual after K iterations is q^K = 0.

  4. Since OMP is a deterministic algorithm, the two sequences are simply functions of x and Φ_opt.

  5. Clearly, the residuals q^k are stochastically independent of the columns of Ψ (since the columns of Ψ are independent of the columns of Φ_opt).

  6. We also know that q^k ∈ C(Φ_opt).

In this thought experiment we made no assumptions about ρ(q^k) since Ψ is not present. We now consider the full matrix Φ and execute OMP with y.

  1. The actual sequence of residuals before each step is r^0, r^1, …, r^{K−1}.

  2. The actual sequence of column indices is λ_1, …, λ_K.

  3. OMP succeeds in recovering x in K steps if and only if it selects the first K columns of Φ in some order.

  4. This can happen if and only if ρ(r^k) < 1 holds at each step.

  5. We are going to show inductively that this happens if and only if λ_k = ω_k and q^k = r^k.

  6. At the beginning of step 1, we have r^0 = q^0 = y.

  7. Now OMP selects a column from Φ_opt if and only if ρ(r^0) < 1, which is identical to ρ(q^0) < 1.

  8. So it remains to show at step 1 that λ_1 = ω_1.

  9. Because ρ(r^0) < 1, the algorithm selects the index λ_1 of the column of Φ_opt whose inner product with r^0 is the largest (in absolute value).

  10. Also, since ρ(q^0) < 1 with r^0 = q^0, ω_1 is the index of the column of Φ_opt whose inner product with q^0 is the largest.

  11. Thus ω_1 = λ_1.

  12. We now assume that for the first k iterations, the real OMP chooses the same columns as our imaginary thought experiment.

  13. Thus we have

    λ_j = ω_j ∀ 1 ≤ j ≤ k

    and

    r^j = q^j ∀ 0 ≤ j ≤ k.

    This is valid since the residuals at each step depend solely on the set of columns chosen so far and the input y, which are the same in both cases.

  14. OMP chooses a column from Φ_opt at the (k+1)-th step if and only if ρ(r^k) < 1, which is the same as ρ(q^k) < 1.

  15. Moreover, since r^k = q^k, the column chosen by maximizing the inner product is the same in both situations.

  16. Thus

    λ_{k+1} = ω_{k+1}.
  17. Therefore the criterion for the success of OMP can be stated as ρ(q^k) < 1 for all 0 ≤ k ≤ K − 1.

We now recall that q^k is actually a random vector (depending upon the random vectors which comprise the columns of Φ_opt).

  1. Thus the event on which the algorithm succeeds in sparse recovery of x from y is given by

    E_succ ≜ {max_{0 ≤ k < K} ρ(q^k) < 1}.
  2. In a particular instance of OMP execution, if the event E_succ happens, then OMP successfully recovers x from y.

  3. Otherwise OMP fails.

  4. Hence the probability of success of OMP is the same as the probability of the event E_succ.

  5. We will be looking for some sort of a lower bound on P(E_succ).

  6. We note that {q^k} is a sequence of random vectors in the column span of Φ_opt and they are stochastically independent of the columns of Ψ.

  7. It is difficult to compute P(E_succ) directly.

  8. We consider another event

    Γ = {σ_min(Φ_opt) ≥ 0.5}.
  9. Clearly

    P(E_succ) ≥ P(max_{0 ≤ k < K} ρ(q^k) < 1 and Γ).
  10. Using conditional probability we can rewrite this as

    P(E_succ) ≥ P(max_{0 ≤ k < K} ρ(q^k) < 1 | Γ) P(Γ).
  11. Since Φ is an admissible matrix, it satisfies (M3), which gives us

    P(Γ) ≥ 1 − exp(−cM).
  12. We just need a lower bound on the conditional probability.

  13. We assume that Γ occurs.

  14. For each step index k = 0, 1, …, K − 1, we have

    ρ(q^k) = max_ψ |⟨ψ, q^k⟩| / ‖Φ_opt^H q^k‖_∞.
  15. Since Φ_opt^H q^k ∈ R^K, we have

    √K ‖Φ_opt^H q^k‖_∞ ≥ ‖Φ_opt^H q^k‖_2.
  16. This gives us

    ρ(q^k) ≤ √K max_ψ |⟨ψ, q^k⟩| / ‖Φ_opt^H q^k‖_2.
  17. To simplify this expression, we define a vector

    u^k ≜ 0.5 q^k / ‖Φ_opt^H q^k‖_2.
  18. This lets us write

    ρ(q^k) ≤ 2√K max_ψ |⟨ψ, u^k⟩|.
  19. Thus

    P(ρ(q^k) < 1 | Γ) ≥ P(2√K max_ψ |⟨ψ, u^k⟩| < 1 | Γ) = P(max_ψ |⟨ψ, u^k⟩| < 1/(2√K) | Γ).
  20. From the basic properties of singular values we recall that

    ‖Φ_opt^H q‖_2 / ‖q‖_2 ≥ σ_min(Φ_opt) ≥ 0.5

    for all vectors q in the range of Φ_opt.

  21. This gives us

    0.5 ‖q‖_2 / ‖Φ_opt^H q‖_2 ≤ 1.
  22. Since q^k is in the column space of Φ_opt, for u^k defined above we have

    ‖u^k‖_2 ≤ 1.

    This is valid under the assumption that the event Γ has happened.

  23. From the above we get

    P(max_k ρ(q^k) < 1 | Γ) ≥ P(max_k max_ψ |⟨ψ, u^k⟩| < 1/(2√K) | Γ).
  24. On the R.H.S. we can exchange the order of the two maxima. This gives us

    P(max_k max_ψ |⟨ψ, u^k⟩| < 1/(2√K) | Γ) = P(max_ψ max_k |⟨ψ, u^k⟩| < 1/(2√K) | Γ).
  25. We also note that the columns of Ψ are independent.

  26. Thus in the above we require that for each column ψ of Ψ, max_k |⟨ψ, u^k⟩| < 1/(2√K) should hold independently.

  27. Hence we can say

    P(max_ψ max_k |⟨ψ, u^k⟩| < 1/(2√K) | Γ) = ∏_ψ P(max_k |⟨ψ, u^k⟩| < 1/(2√K) | Γ).
  28. We recall that the event Γ depends only on the columns of Φ_opt.

  29. Hence the columns of Ψ are independent of Γ.

  30. Thus

    ∏_ψ P(max_k |⟨ψ, u^k⟩| < 1/(2√K) | Γ) = ∏_ψ P(max_k |⟨ψ, u^k⟩| < 1/(2√K)).
  31. Since the sequence {u^k} depends only on the columns of Φ_opt, the columns of Ψ are independent of {u^k}.

  32. Thus we can invoke (M2) with ϵ = 1/(2√K) to get

    ∏_ψ P(max_k |⟨ψ, u^k⟩| < 1/(2√K)) ≥ (1 − 2K exp(−cM/(4K)))^{N−K}.
  33. This gives us the lower bound

    P(max_k ρ(q^k) < 1 | Γ) ≥ (1 − 2K exp(−cM/(4K)))^{N−K}.
  34. Finally, plugging in the lower bound for P(Γ), we get

    (15.6) P(E_succ) ≥ (1 − 2K exp(−cM/(4K)))^{N−K} (1 − exp(−cM)).

    All that remains now is to simplify this expression.

  35. We recall that we assumed in the theorem statement

    M ≥ C K ln(N/δ) ⟹ M/(CK) ≥ ln(N/δ) ⟹ exp(M/(CK)) ≥ N/δ ⟹ δ/N ≥ exp(−M/(CK)) ⟹ δ ≥ N exp(−M/(CK)).
  36. But we assumed that 0 < δ < 1.

  37. Thus

    N exp(−M/(CK)) < 1.
  38. If we choose C ≥ 4/c then

    (15.7) −1/C ≥ −c/4 ⟹ −M/(CK) ≥ −cM/(4K) ⟹ exp(−M/(CK)) ≥ exp(−cM/(4K)) ⟹ N exp(−M/(CK)) ≥ 2K exp(−cM/(4K)) ⟹ 1 > δ ≥ 2K exp(−cM/(4K))

    where we assumed that N ≫ 2K.

  39. We recall that

    (1 − x)^k ≥ 1 − kx if k ≥ 1 and x ≤ 1.
  40. Applying this to (15.6) we get

    (15.8) P(E_succ) ≥ 1 − 2K(N − K) exp(−cM/(4K)) − exp(−cM).

    We dropped the fourth (positive cross) term arising in this expansion; doing so only decreases the right hand side, so the bound remains valid.

  41. Now K(N − K) ≤ N^2/4 always holds, giving us

    P(E_succ) ≥ 1 − (N^2/2) exp(−cM/(4K)) − exp(−cM).
  42. If we choose C ≥ 8/c then, following (15.7), we can get

    N exp(−M/(CK)) ≥ N exp(−cM/(8K)) ⟹ δ ≥ N exp(−cM/(8K)) ⟹ δ^2 ≥ N^2 exp(−cM/(4K)) ⟹ 1 − δ^2/2 ≤ 1 − (N^2/2) exp(−cM/(4K)).
  43. Thus

    P(E_succ) ≥ 1 − δ^2/2 − exp(−cM).
  44. Some further simplification gives us

    P(E_succ) ≥ 1 − δ.
  45. Thus, with a suitable choice of the constant C, a choice of M ≥ C K ln(N/δ) with δ ∈ (0,1) is sufficient to reduce the failure probability below δ.

15.3.4. Analysis of OMP using Restricted Isometry Property

In this subsection we present an alternative analysis of the OMP algorithm using the Restricted Isometry Property of the matrix Φ [17].

15.3.4.1. A re-look at the OMP Algorithm

Before we get into the RIP based analysis of OMP, it is useful to develop some new insights into the behavior of the OMP algorithm. These insights will help us considerably in the analysis later.

  1. We will assume throughout that whenever |Λ| ≤ K, Φ_Λ is full rank.

  2. The pseudo-inverse is given by

    Φ_Λ^† = (Φ_Λ^H Φ_Λ)^{−1} Φ_Λ^H.
  3. The orthogonal projection operator onto the column space of Φ_Λ is given by

    P_Λ = Φ_Λ Φ_Λ^†.
  4. The orthogonal projection operator onto the orthogonal complement of C(Φ_Λ) (the column space of Φ_Λ) is given by

    P_Λ^⊥ = I − P_Λ.
  5. Both P_Λ and P_Λ^⊥ satisfy the standard properties P = P^H and P^2 = P.

  6. We further define

    Ψ_Λ = P_Λ^⊥ Φ.
  7. We are orthogonalizing the atoms of Φ against C(Φ_Λ), i.e. taking the component of each atom which is orthogonal to the column space of Φ_Λ.

  8. The atoms of Ψ_Λ corresponding to the index set Λ are 0.

  9. We will make some further observations on the behavior of the OMP algorithm [17].

  10. Recall that the approximation after the k-th iteration is given by

    x^k_{Λ^k} = Φ_{Λ^k}^† y and x^k_{(Λ^k)^c} = 0.
  11. The residual after the k-th iteration is given by

    r^k = y − Φ x^k

    and by construction r^k is orthogonal to the columns of Φ_{Λ^k}.

  12. We can write

    Φ x^k = Φ_{Λ^k} x^k_{Λ^k} + Φ_{(Λ^k)^c} x^k_{(Λ^k)^c} = Φ_{Λ^k} x^k_{Λ^k}.
  13. Thus,

    r^k = y − Φ_{Λ^k} x^k_{Λ^k} = y − Φ_{Λ^k} Φ_{Λ^k}^† y = (I − P_{Λ^k}) y = P_{Λ^k}^⊥ y.
  14. In summary

    r^k = P_{Λ^k}^⊥ y.
  15. This shows that it is not actually necessary to compute x^k in order to find r^k. An equivalent way of writing the OMP algorithm could be as in Algorithm 15.2.

Algorithm 15.2 (Sketch of OMP without intermediate x^k computation)

Algorithm:

  1. If the halting criterion is satisfied, then break.

  2. h^{k+1} ← Φ^H r^k # Match

  3. λ_{k+1} ← argmax_{j ∉ Λ^k} |h^{k+1}_j| # Identify

  4. Λ^{k+1} ← Λ^k ∪ {λ_{k+1}} # Update support

  5. r^{k+1} ← P_{Λ^{k+1}}^⊥ y # Update residual

  6. k ← k + 1.

  7. Go back to step 1.

Finalization:

  1. x̂_{Λ^k} ← Φ_{Λ^k}^† y.

  2. x̂_{(Λ^k)^c} ← 0.
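
A quick numerical check of this reformulation (a toy sketch with our own dimensions): for a fixed index set Λ, the residual obtained from the least squares solution, y − Φ_Λ Φ_Λ^† y, coincides with the projection P_Λ^⊥ y, and it is orthogonal to the selected columns.

    import numpy as np

    rng = np.random.default_rng(0)
    M, N = 30, 80
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)
    y = rng.standard_normal(M)

    Lam = [3, 17, 42]                          # an arbitrary index set Lambda
    Phi_L = Phi[:, Lam]

    # Residual via the least squares (pseudo-inverse) route.
    r_ls = y - Phi_L @ (np.linalg.pinv(Phi_L) @ y)

    # Residual via the projector P_Lambda_perp = I - Phi_L Phi_L_dagger.
    P = Phi_L @ np.linalg.pinv(Phi_L)
    r_proj = (np.eye(M) - P) @ y

    print(np.allclose(r_ls, r_proj))           # True: r^k = P_{Lambda^k}^perp y
    print(np.allclose(Phi_L.T @ r_proj, 0))    # True: residual orthogonal to chosen columns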

  1. In the matching step, we are correlating r^k with the columns of Φ.

  2. Since r^k is orthogonal to the column space of Φ_{Λ^k}, this correlation is identical to correlating r^k with Ψ_{Λ^k}.

  3. To see this, observe that

    r^k = P_{Λ^k}^⊥ y = P_{Λ^k}^⊥ P_{Λ^k}^⊥ y = (P_{Λ^k}^⊥)^H P_{Λ^k}^⊥ y.
  4. Thus,

    h^{k+1} = Φ^H r^k = Φ^H (P_{Λ^k}^⊥)^H P_{Λ^k}^⊥ y = (P_{Λ^k}^⊥ Φ)^H P_{Λ^k}^⊥ y = (Ψ_{Λ^k})^H r^k.
  5. On similar lines, we can also see that

    h^{k+1} = Φ^H r^k = Φ^H P_{Λ^k}^⊥ y = Φ^H (P_{Λ^k}^⊥)^H y = (Ψ_{Λ^k})^H y.
  6. In other words, we have

    (15.9) h^{k+1} = (Ψ_{Λ^k})^H r^k = (Ψ_{Λ^k})^H y.
  7. Thus, we can observe that OMP can be further simplified: we don't even need to compute r^k in order to compute h^{k+1}.

  8. There is one catch though. If the halting criterion depends on the residual energy, then we certainly need to compute r^k. If the halting criterion is simply running K iterations, then we don't need to compute r^k.

  9. The revised OMP algorithm sketch is presented in Algorithm 15.3.

Algorithm 15.3 (Sketch of OMP without intermediate x^k and r^k computation)

Algorithm:

  1. If the halting criterion is satisfied, then break.

  2. h^{k+1} ← (Ψ_{Λ^k})^H y # Match

  3. λ_{k+1} ← argmax_{i ∉ Λ^k} |h^{k+1}_i| # Identify

  4. Λ^{k+1} ← Λ^k ∪ {λ_{k+1}} # Update support

  5. k ← k + 1

  6. Go back to step 1.

Finalization:

  1. x̂_{Λ^k} ← Φ_{Λ^k}^† y.

  2. x̂_{(Λ^k)^c} ← 0.

With this the OMP algorithm is considerably simplified from the perspective of analyzing its recovery guarantees.

  1. Coming back to h^{k+1}, note that the columns of Ψ_{Λ^k} indexed by Λ^k are all 0s.

  2. Thus

    h^{k+1}_j = 0 ∀ j ∈ Λ^k.
  3. This makes it obvious that λ_{k+1} ∉ Λ^k and consequently |Λ^k| = k (inductively).

  4. Lastly, for the case of the noise free model y = Φx, we may write

    r^k = P_{Λ^k}^⊥ y = P_{Λ^k}^⊥ Φ x = Ψ_{Λ^k} x.
  5. Since the columns of Ψ_{Λ^k} indexed by Λ^k are 0, when supp(x) ⊆ Λ^k we get r^k = 0.

  6. In this case x^k = x exactly, since it is the least squares estimate over Φ_{Λ^k}.

  7. For the same reason, if we construct a vector x̃^k by zeroing out the entries of x indexed by Λ^k, i.e.

    (15.10) x̃^k_{Λ^k} = 0 and x̃^k_{(Λ^k)^c} = x_{(Λ^k)^c},

    then

    (15.11) r^k = Ψ_{Λ^k} x̃^k.
  8. If ‖x‖_0 = K and Λ^k ⊆ supp(x), then ‖x̃^k‖_0 = K − k.

  9. Lastly, substituting r^k back into (15.9), we obtain

    (15.12) h^{k+1} = (Ψ_{Λ^k})^H Ψ_{Λ^k} x̃^k.
  10. In this version, we see that h^{k+1} is computed by applying the matrix (Ψ_{Λ^k})^H Ψ_{Λ^k} to the (K − k)-sparse vector x̃^k.

  11. We are now ready to carry out the RIP based analysis of OMP.
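
The identities (15.11) and (15.12) are easy to verify numerically. The sketch below (our own toy dimensions; we simply pretend that OMP has already discovered k correct indices, so Λ^k ⊂ supp(x)) checks both in the noise free model.

    import numpy as np

    rng = np.random.default_rng(0)
    M, N, K, k = 40, 100, 6, 3
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)

    # Noise free model y = Phi x with a K-sparse x.
    x = np.zeros(N)
    support = rng.choice(N, size=K, replace=False)
    x[support] = rng.standard_normal(K)
    y = Phi @ x

    Lam = list(support[:k])                    # assume k correct indices already found
    P = Phi[:, Lam] @ np.linalg.pinv(Phi[:, Lam])
    Psi = (np.eye(M) - P) @ Phi                # Psi_{Lambda^k} = P_{Lambda^k}^perp Phi

    x_tilde = x.copy()
    x_tilde[Lam] = 0.0                         # x~^k: zero out the discovered entries

    r = (np.eye(M) - P) @ y                    # r^k = P_{Lambda^k}^perp y
    print(np.allclose(r, Psi @ x_tilde))                   # (15.11)
    print(np.allclose(Phi.T @ r, Psi.T @ Psi @ x_tilde))   # (15.12), since h^{k+1} = Phi^T r^k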

15.3.4.2. RIP based Analysis of OMP

Our analysis here will focus on the case of real signals and real matrices, i.e. Φ ∈ R^{M×N} and x ∈ R^N. We will attack the noise free case.

Some results for matrices that satisfy RIP will be useful in the upcoming analysis. Please refer to Restricted Isometry Property for an extensive treatment of RIP based results.

Theorem 12.63 applies to the approximate preservation of the inner product of sparse signals u, v ∈ R^N.

Let u, v ∈ R^N and K ≥ max(‖u + v‖_0, ‖u − v‖_0). Then

(15.13) |⟨Φu, Φv⟩ − ⟨u, v⟩| ≤ δ_K ‖u‖_2 ‖v‖_2.

Theorem 12.65 shows that the matrix Ψ_Λ also satisfies a modified version of the RIP. Let |Λ| < K. Then

(15.14) (1 − δ_K/(1 − δ_K)) ‖y‖_2^2 ≤ ‖Ψ_Λ y‖_2^2 ≤ (1 + δ_K) ‖y‖_2^2

whenever ‖y‖_0 ≤ K − |Λ| and supp(y) ∩ Λ = ∅.

If Φ satisfies the RIP of order K, then Ψ_Λ acts as an approximate isometry on every (K − |Λ|)-sparse vector supported on Λ^c.

From (15.11), recall that the residual vector r^k is formed by applying Ψ_{Λ^k} to x̃^k, which is a (K − k)-sparse vector supported on (Λ^k)^c.

Our interest is in combining the above two results to obtain a bound on the inner products h^{k+1}_j. Exactly what kind of bound? When Λ^k has been identified, our interest is in ensuring that the next index is chosen from the set supp(x) ∖ Λ^k. A useful way to ensure this is to verify that the entries of h^{k+1} are close to those of x̃^k. If they are, then they will be 0 over Λ^k, fairly large over supp(x) ∖ Λ^k and very small over supp(x)^c, which is what we want.

The next result develops these bounds around (15.12).

Lemma 15.1

Let Λ ⊂ {1, …, N} and suppose x̃ ∈ R^N with supp(x̃) ∩ Λ = ∅. Define

(15.15) h = Ψ_Λ^T Ψ_Λ x̃.

Then if Φ satisfies the RIP of order K ≥ ‖x̃‖_0 + |Λ| + 1 with isometry constant δ_K, we have

(15.16) |h_j − x̃_j| ≤ (δ_K/(1 − δ_K)) ‖x̃‖_2 ∀ j ∉ Λ.

Note that |Λ| is the number of entries in the discovered part of the support at any iteration of OMP, and ‖x̃‖_0 is the number of entries in the not yet discovered part of the support.

Proof.

  1. We have |Λ| < K and ‖x̃‖_0 < K − |Λ|.

  2. Thus, from (15.14), we obtain

    (1 − δ_K/(1 − δ_K)) ‖x̃‖_2^2 ≤ ‖Ψ_Λ x̃‖_2^2 ≤ (1 + δ_K) ‖x̃‖_2^2.
  3. In other words, Ψ_Λ satisfies a RIP of order

    (‖x̃‖_0 + |Λ| + 1) − |Λ| = ‖x̃‖_0 + 1

    with RIP constant δ_K/(1 − δ_K).

  4. By the definition of h, we have

    h_j = ⟨Ψ_Λ x̃, Ψ_Λ e_j⟩

    where h_j is the j-th entry of h and e_j denotes the j-th vector of the identity basis.

  5. We already know that h_j = 0 for all j ∈ Λ.

  6. Consider j ∉ Λ and take the two vectors x̃ and e_j.

  7. We can see that

    ‖x̃ ± e_j‖_0 ≤ ‖x̃‖_0 + 1

    and

    supp(x̃ ± e_j) ∩ Λ = ∅.
  8. Applying (15.13) to the two vectors with Ψ_Λ as our RIP matrix, we see that

    |⟨Ψ_Λ x̃, Ψ_Λ e_j⟩ − ⟨x̃, e_j⟩| ≤ (δ_K/(1 − δ_K)) ‖x̃‖_2 ‖e_j‖_2.
  9. But

    |⟨Ψ_Λ x̃, Ψ_Λ e_j⟩ − ⟨x̃, e_j⟩| = |h_j − x̃_j|.
  10. Noting that ‖e_j‖_2 = 1, we get the desired result.
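
Lemma 15.1 can also be checked numerically on a small problem where the RIP constant δ_K is computable exactly by brute force enumeration of supports (a toy setup of our own; the bound is meaningful only when the drawn matrix happens to have δ_K < 1, which is typically the case at these sizes).

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(3)
    M, N = 50, 60
    Phi = rng.standard_normal((M, N))
    Phi /= np.linalg.norm(Phi, axis=0)          # unit-norm columns keep delta_K small

    def rip_constant(Phi, K):
        """Exact delta_K by enumerating all supports of size K (small N only)."""
        delta = 0.0
        for S in combinations(range(Phi.shape[1]), K):
            G = Phi[:, S].T @ Phi[:, S]
            delta = max(delta, np.linalg.norm(G - np.eye(K), 2))
        return delta

    Lam = [0]                                   # |Lambda| = 1
    x_t = np.zeros(N)
    x_t[7] = 1.5                                # supp(x~) disjoint from Lambda
    K = int(np.count_nonzero(x_t)) + len(Lam) + 1   # RIP order required by Lemma 15.1

    delta_K = rip_constant(Phi, K)
    P = Phi[:, Lam] @ np.linalg.pinv(Phi[:, Lam])
    Psi = (np.eye(M) - P) @ Phi
    h = Psi.T @ Psi @ x_t                       # h as in (15.15)

    lhs = max(abs(h[j] - x_t[j]) for j in range(N) if j not in Lam)
    print(f"delta_K = {delta_K:.3f}")
    if delta_K < 1:
        rhs = delta_K / (1 - delta_K) * np.linalg.norm(x_t)
        print(f"max_j |h_j - x~_j| = {lhs:.3f}, bound (15.16) = {rhs:.3f}, holds: {lhs <= rhs}")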

With this bound in place, we can develop a sufficient condition under which the identification step of OMP (which identifies the new index λ_{k+1}) will succeed.

The following corollary establishes a lower bound on the largest entry of x̃ which ensures that OMP indeed chooses the next index from the support of x̃.

Corollary 15.1

Suppose that Λ, Φ, x̃ meet the assumptions in Lemma 15.1, and let h be as defined in (15.15). If

(15.17) ‖x̃‖_∞ > (2δ_K/(1 − δ_K)) ‖x̃‖_2,

we are guaranteed that

argmax_{j ∉ Λ} |h_j| ∈ supp(x̃).

Proof.

  1. If (15.16) is satisfied, then for indices j ∉ supp(x̃) (where x̃_j = 0), we have

    |h_j| ≤ (δ_K/(1 − δ_K)) ‖x̃‖_2.
  2. We already know that h_j = 0 for all j ∈ Λ.

  3. If (15.17) is satisfied, then there exists j ∈ supp(x̃) with

    |x̃_j| > (2δ_K/(1 − δ_K)) ‖x̃‖_2.
  4. For this particular j, applying the triangle inequality to (15.16),

    (δ_K/(1 − δ_K)) ‖x̃‖_2 ≥ |h_j − x̃_j| ≥ |x̃_j| − |h_j|.
  5. Thus

    |h_j| ≥ |x̃_j| − (δ_K/(1 − δ_K)) ‖x̃‖_2 > (2δ_K/(1 − δ_K)) ‖x̃‖_2 − (δ_K/(1 − δ_K)) ‖x̃‖_2 = (δ_K/(1 − δ_K)) ‖x̃‖_2.
  6. We have established that there exists some j ∈ supp(x̃) for which

    |h_j| > (δ_K/(1 − δ_K)) ‖x̃‖_2

    and for every j ∉ supp(x̃)

    |h_j| ≤ (δ_K/(1 − δ_K)) ‖x̃‖_2.
  7. Together, they establish that OMP will indeed choose an index from the correct set.

All we need to do now is to make sure that (15.17) is satisfied by choosing δ_K small enough. The following result from [17] guarantees that.

Theorem 15.3

Suppose that Φ satisfies the RIP of order K + 1 with isometry constant δ < 1/(2√K + 1). Then for any x ∈ R^N with ‖x‖_0 ≤ K, OMP will recover x exactly from y = Φx in K iterations.

The upper bound on δ can be simplified as δ < 1/(3√K).

Proof. The proof works by induction. We show that under the stated conditions, λ_1 ∈ supp(x). Then we show that whenever the first k indices λ_1, …, λ_k lie in supp(x), λ_{k+1} also lies in supp(x).

  1. For the first iteration, we have

    h^1 = Φ^T Φ x = Φ^T y.
  2. Note that Φ = Ψ_∅.

  3. It is given that ‖x‖_0 ≤ K.

  4. Thus, due to Theorem 12.16:

    ‖x‖_∞ ≥ ‖x‖_2/√K.
  5. Now δ < 1/(3√K) or δ < 1/(2√K + 1) implies that

    (15.18) 2δ/(1 − δ) < 1/√K.
  6. This can be seen as follows. Assuming K ≥ 1, we have:

    3√K ≥ 2√K + 1 ⟹ 1/(3√K) ≤ 1/(2√K + 1) ⟹ δ < 1/(2√K + 1) ⟹ 2δ√K + δ < 1 ⟹ 2δ√K < 1 − δ ⟹ 2δ/(1 − δ) < 1/√K.
  7. Therefore

    ‖x‖_∞ > (2δ/(1 − δ)) ‖x‖_2,

    so (15.17) is satisfied and λ_1 will indeed be chosen from supp(x) due to Corollary 15.1.

  8. We now assume that OMP has correctly discovered the indices λ_1, …, λ_k, i.e.

    Λ^k ⊂ supp(x).
  9. We have to show that it will also correctly discover λ_{k+1}.

  10. From the definition of x̃^k in (15.10), we know that supp(x̃^k) ∩ Λ^k = ∅.

  11. Thus

    ‖x̃^k‖_0 ≤ K − k.
  12. We also know that |Λ^k| = k. By assumption, Φ satisfies the RIP of order K + 1 = (K − k) + k + 1.

  13. Thus

    K + 1 ≥ ‖x̃^k‖_0 + |Λ^k| + 1.
  14. Also, due to Theorem 12.16:

    ‖x̃^k‖_∞ ≥ ‖x̃^k‖_2/√(K − k) ≥ ‖x̃^k‖_2/√K.
  15. Using (15.18), we get

    ‖x̃^k‖_∞ > (2δ/(1 − δ)) ‖x̃^k‖_2.
  16. This is the sufficient condition of Corollary 15.1 in (15.17), giving us

    λ_{k+1} = argmax_{j ∉ Λ^k} |h^{k+1}_j| ∈ supp(x̃^k).
  17. Hence Λ^{k+1} ⊆ supp(x).
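
Finally, a small numerical illustration of the stepwise behavior established in this proof (our own toy sizes; we do not verify the RIP constant itself, which is combinatorial to compute, and simply rely on a random Gaussian matrix with generous M): every index that OMP picks lies in supp(x), and recovery completes in exactly K iterations.

    import numpy as np

    rng = np.random.default_rng(7)
    M, N, K = 120, 400, 5
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)

    x = np.zeros(N)
    support = set(rng.choice(N, size=K, replace=False).tolist())
    x[list(support)] = rng.standard_normal(K)
    y = Phi @ x

    Lam = []
    for k in range(K):
        if Lam:
            P = Phi[:, Lam] @ np.linalg.pinv(Phi[:, Lam])
        else:
            P = np.zeros((M, M))
        h = Phi.T @ ((np.eye(M) - P) @ y)      # h^{k+1} = (Psi_{Lambda^k})^T y
        if Lam:
            h[Lam] = 0.0                       # these entries are (numerically) zero anyway
        j = int(np.argmax(np.abs(h)))
        print(f"iteration {k + 1}: picked index {j}, in supp(x)? {j in support}")
        Lam.append(j)

    x_hat = np.zeros(N)
    x_hat[Lam] = np.linalg.pinv(Phi[:, Lam]) @ y
    print("exact recovery in K iterations:", np.allclose(x_hat, x))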