9.2. Projection on Convex Sets

We will assume V to be a real n-dimensional vector space with an inner product ⟨⋅,⋅⟩, the induced norm ‖⋅‖, and the corresponding dual norm ‖⋅‖∗ for the dual space V∗.

We are interested in mapping a point x to the nearest point in a set C. In general, this problem may have zero, one, or multiple solutions. However, in the special case where C is a nonempty, closed and convex set, there is exactly one point in C which is nearest to a given point x. This nearest point is called the projection of x onto C.

Main references for this section are [5, 7].

9.2.1. Projection Theorem

Theorem 9.6 (Projection theorem)

Let C be a nonempty, closed and convex subset of V. For every x∈V, there exists a unique vector that minimizes ‖z−x‖ over all z∈C. This vector is called the projection of x on C.

Proof. We fix some x∈V and choose an element w∈C.

  1. Consider the function

    $$g(z) = \frac{1}{2} \| z - x \|^2.$$

  2. Minimizing ‖z−x‖ over all z∈C is equivalent to minimizing g(z) over the set

    $$D = \{ z \in C \;|\; \| z - x \| \leq \| w - x \| \}.$$

  3. We note that D is a compact set (it is closed and bounded in a finite dimensional space) and g is an l.s.c., closed and coercive function.

  4. By the Weierstrass theorem (Corollary 7.2), the set of minimizers of g is nonempty and compact.

We now show that the minimizer is unique.

  1. g is a strictly convex function because its Hessian is the identity matrix, which is positive definite.

  2. Hence, the minimizer is unique due to Theorem 9.2.
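The role of convexity in the uniqueness claim can be illustrated numerically. Below is a minimal sketch (the sets, points, and helper names are our own choices, not from the text): every point of the unit circle, which is not convex, is nearest to the origin, while projection onto the closed unit disk, which is convex, is unique.

```python
import numpy as np

# Nearest points on the unit circle (non-convex) from the origin:
# every point of the circle is at distance 1, so the minimizer is not unique.
thetas = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
circle_pts = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
print(np.linalg.norm(circle_pts, axis=1))  # all 1.0 -- many minimizers

# Projection onto the closed unit disk (convex): unique for every x.
def proj_disk(x):
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

for x in [np.array([3.0, 4.0]), np.array([0.2, -0.1])]:
    print(x, "->", proj_disk(x))
```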

9.2.2. Orthogonal Projection

Definition 9.6 (Orthogonal Projection Mapping)

Let C be a nonempty, closed and convex subset of V. The orthogonal projection mapping PC:V→V is defined by:

$$P_C(x) \triangleq \underset{y \in C}{\operatorname{argmin}} \| y - x \| \quad \forall x \in V.$$

This mapping is well defined since the projection is unique for a nonempty, closed and convex set C due to Theorem 9.6.

The vector PC(x) is called the projection of x on the set C.
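For simple convex sets, PC admits a closed form. The following sketch illustrates two standard cases, a box and a closed Euclidean ball; the function names are ours, and the formulas are the well known componentwise clipping and radial scaling rules.

```python
import numpy as np

def proj_box(x, l, u):
    """Project x onto the box {z : l <= z <= u} by componentwise clipping."""
    return np.clip(x, l, u)

def proj_ball(x, c, r):
    """Project x onto the closed ball B[c, r]: pull x - c back to radius r."""
    d = np.linalg.norm(x - c)
    return x if d <= r else c + r * (x - c) / d

x = np.array([2.0, -3.0, 0.5])
print(proj_box(x, -1.0, 1.0))            # [ 1.  -1.   0.5]
print(proj_ball(x, np.zeros(3), 1.0))    # x rescaled to unit norm
```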

9.2.3. Characterization

Theorem 9.7 (Orthogonal projection characterization)

Let C be a nonempty, closed and convex subset of V. For every vector x∈V, a vector z∈C is its projection if and only if

$$\langle y - z, x - z \rangle \leq 0 \quad \forall y \in C.$$

This result is also known as the second projection theorem [5].

Proof. Assume that some z∈C satisfies ⟨y−z,x−z⟩≤0 for every y∈C.

  1. For any y∈C,

    $$\begin{aligned}
    \| y - x \|^2 &= \| (y - z) - (x - z) \|^2\\
    &= \| y - z \|^2 + \| z - x \|^2 - 2 \langle y - z, x - z \rangle\\
    &\geq \| z - x \|^2 - 2 \langle y - z, x - z \rangle.
    \end{aligned}$$

  2. Since ⟨y−z,x−z⟩≤0, it follows that

    $$\| y - x \|^2 \geq \| z - x \|^2 \quad \forall y \in C.$$
  3. Thus, z is indeed the projection of x on C.

Conversely, assume that z is the projection of x on C.

  1. Let y∈C be arbitrary.

  2. For any t∈[0,1], define yt=ty+(1−t)z. Since C is convex, yt∈C.

  3. Then, we have

    $$\begin{aligned}
    \| x - y_t \|^2 &= \| t x + (1 - t) x - t y - (1 - t) z \|^2\\
    &= \| t (x - y) + (1 - t)(x - z) \|^2\\
    &= t^2 \| x - y \|^2 + (1 - t)^2 \| x - z \|^2 + 2 t (1 - t) \langle x - y, x - z \rangle.
    \end{aligned}$$

  4. Viewing $\| x - y_t \|^2$ as a function of t and differentiating w.r.t. t, we have

    $$\begin{aligned}
    \left . \frac{d}{d t} \| x - y_t \|^2 \right |_{t = 0}
    &= -2 \| x - z \|^2 + 2 \langle x - y, x - z \rangle\\
    &= -2 \langle x - z, x - z \rangle + 2 \langle x - y, x - z \rangle\\
    &= -2 \langle y - z, x - z \rangle.
    \end{aligned}$$

  5. Since t=0 minimizes $\| x - y_t \|^2$ over t∈[0,1], we must have

    $$\left . \frac{d}{d t} \| x - y_t \|^2 \right |_{t = 0} \geq 0.$$

  6. Thus, we require that

    $$\langle y - z, x - z \rangle \leq 0$$

    must hold true for every y∈C.

The following is an alternative proof based on results from Constrained Optimization I. This proof is specific to the case where V=Rn.

Proof. Define a function f:Rn→R as

$$f(y) = \| y - x \|^2.$$

Then, the projection problem can be cast as an optimization problem

$$\begin{aligned}
& \text{minimize }  f(y)\\
& \text{subject to } y \in C.
\end{aligned}$$

Note that the gradient of f is given by

$$\nabla f(y) = \nabla \langle y - x, y - x \rangle = \nabla (\langle y, y \rangle - 2 \langle y, x \rangle + \langle x, x \rangle) = 2 (y - x).$$

By Theorem 9.47, z is an optimal solution if and only if

$$\nabla f(z)^T (y - z) \geq 0 \quad \forall y \in C.$$

In other words,

$$2 (z - x)^T (y - z) \geq 0 \quad \forall y \in C.$$

We can simplify this as

$$\langle x - z, y - z \rangle \leq 0 \quad \forall y \in C.$$
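The characterization can be verified numerically. A small sketch, assuming C is the closed unit ball so that PC has the simple radial closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

def proj_unit_ball(v):
    n = np.linalg.norm(v)
    return v if n <= 1.0 else v / n

x = rng.normal(size=4)
z = proj_unit_ball(x)

# Sample many points y in C and confirm <y - z, x - z> <= 0.
ys = rng.normal(size=(1000, 4))
ys = ys / np.maximum(np.linalg.norm(ys, axis=1, keepdims=True), 1.0)
vals = (ys - z) @ (x - z)
print(vals.max() <= 1e-12)  # True
```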

Theorem 9.8 (Orthogonal projection on an affine subspace)

Let C be an affine subspace of V. Let S be the linear subspace parallel to C. For every vector x∈V, a vector z∈C is its projection if and only if

$$x - z \in S^{\perp}.$$

Proof. Since C is an affine subspace of V, hence C is nonempty, convex and closed (as V is finite dimensional).

  1. By Theorem 9.7, z is the projection of x on C if and only if for every y∈C, we have

    $$\langle y - z, x - z \rangle \leq 0.$$

  2. But y∈C if and only if y−z∈S.

  3. Hence the condition is equivalent to

    $$\langle w, x - z \rangle \leq 0 \quad \forall w \in S.$$

  4. But then, it must be an equality since w and −w both belong to S. Thus, we have

    $$\langle w, x - z \rangle = 0 \quad \forall w \in S.$$

  5. In other words, x−z∈S⊥.
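Theorem 9.8 can also be checked numerically for C=null(A), a linear (hence affine) subspace whose parallel subspace S is C itself and whose orthogonal complement S⊥ is the row space of A. A sketch under these assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 2, 5
A = rng.normal(size=(m, n))   # full row rank with probability 1
x = rng.normal(size=n)

# Projection onto C = null(A): z = (I - A^T (A A^T)^{-1} A) x.
z = x - A.T @ np.linalg.solve(A @ A.T, A @ x)
print(np.allclose(A @ z, 0))            # z lies in C

# x - z should lie in S^perp = row space of A, i.e. x - z = A^T w.
w = np.linalg.lstsq(A.T, x - z, rcond=None)[0]
print(np.allclose(A.T @ w, x - z))      # True
```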

9.2.4. Distance Function

Recall that the distance of a point x∈V from a set C is defined as

$$d_C(x) \triangleq \inf_{y \in C} \| x - y \|.$$

Theorem 9.9 (Distance function for nonempty, closed and convex set)

Let C be a nonempty, closed and convex subset of V. Then the function dC:V→R giving the distance of a point x∈V from the set C satisfies

$$d_C(x) = \| x - P_C(x) \|.$$

Proof. By Theorem 9.6, there exists a unique point PC(x) which minimizes the distance between x and C. Hence

$$d_C(x) = \| x - P_C(x) \|$$

must hold true.

Theorem 9.10 (Distance function for nonempty, closed and convex set is convex)

Let C be a nonempty, closed and convex subset of V. Let dC:V→R be the distance function to the set C as defined in Theorem 9.9. Then, dC is convex.

Proof. Assume, for contradiction, that dC is not convex.

  1. Then, there exist x,y∈V and t∈(0,1) such that

    $$d_C(t x + (1 - t) y) > t\, d_C(x) + (1 - t) d_C(y).$$

  2. Let u=PC(x) and v=PC(y). By definition, u,v∈C.

  3. Then,

    $$t\, d_C(x) + (1 - t) d_C(y) = t \| u - x \| + (1 - t) \| v - y \|.$$

  4. Since C is convex, hence tu+(1−t)v∈C.

  5. Since dC(tx+(1−t)y) is the minimum distance of the point tx+(1−t)y from points of C, hence

    $$\| t u + (1 - t) v - t x - (1 - t) y \| \geq d_C(t x + (1 - t) y).$$

  6. Rewriting,

    $$d_C(t x + (1 - t) y) \leq \| t (u - x) + (1 - t)(v - y) \| \leq t \| u - x \| + (1 - t) \| v - y \|$$

    due to the triangle inequality.

  7. But, this leads to the contradiction

    $$t \| u - x \| + (1 - t) \| v - y \| < d_C(t x + (1 - t) y) \leq t \| u - x \| + (1 - t) \| v - y \|.$$
  8. Hence, dC must be convex.

9.2.5. Nonexpansiveness

Definition 9.7 (Nonexpansiveness property)

Let V be a normed linear space. An operator T:V→V is called nonexpansive if

$$\| T(x) - T(y) \| \leq \| x - y \| \quad \forall x, y \in V.$$

In other words, the distance between mapped points in V is always less than or equal to the distance between the original points in V.

Theorem 9.11 (Nonexpansive operators are Lipschitz continuous)

Let V be a normed linear space. A nonexpansive operator T:V→V is Lipschitz continuous. Hence, it is uniformly continuous.

Proof. Recall from Definition 3.45 that if T is a Lipschitz map, then there exists L>0 such that

$$\| T(x) - T(y) \| \leq L \| x - y \|$$

for every x,y∈V.

For a nonexpansive operator, L=1 works. Thus, T is indeed Lipschitz continuous. By Theorem 3.57, every Lipschitz continuous function is uniformly continuous.

Definition 9.8 (Firm nonexpansiveness property)

Let V be a real inner product space. An operator T:V→V is called firmly nonexpansive if

$$\langle T(x) - T(y), x - y \rangle \geq \| T(x) - T(y) \|^2$$

holds true for every x,y∈V.

Theorem 9.12 (A firmly nonexpansive operator is nonexpansive)

Let V be a real inner product space. Let T:V→V be a firmly nonexpansive operator. Then, T is nonexpansive.

Proof. For every x,y∈V, we have

$$\| T(x) - T(y) \|^2 \leq \langle T(x) - T(y), x - y \rangle.$$

Applying the Cauchy-Schwarz inequality on the R.H.S., we get

$$\| T(x) - T(y) \|^2 \leq \| T(x) - T(y) \| \| x - y \|.$$

If T(x)=T(y), the nonexpansive property holds trivially. Otherwise, canceling the common factor ‖T(x)−T(y)‖, we get

$$\| T(x) - T(y) \| \leq \| x - y \|$$

which is the nonexpansive property.

Theorem 9.13 (Orthogonal projection is nonexpansive)

Let C be a nonempty, closed and convex subset of V. Then the orthogonal projection operator PC:V→V as defined in Definition 9.6 is nonexpansive (and therefore continuous).

In other words,

$$\| P_C(x) - P_C(y) \| \leq \| x - y \| \quad \forall x, y \in V.$$

Proof. Let x,y∈V.

  1. By Theorem 9.7,

    $$\langle w - P_C(x), x - P_C(x) \rangle \leq 0 \quad \forall w \in C.$$

  2. In particular PC(y)∈C. Hence,

    $$\langle P_C(y) - P_C(x), x - P_C(x) \rangle \leq 0.$$

  3. Similarly, starting with PC(x), we obtain

    $$\langle P_C(x) - P_C(y), y - P_C(y) \rangle \leq 0.$$

  4. Adding these two inequalities, we obtain

    $$\langle P_C(y) - P_C(x), x - P_C(x) - y + P_C(y) \rangle \leq 0.$$

  5. By rearranging the terms, we get

    $$\langle P_C(y) - P_C(x), P_C(y) - P_C(x) \rangle \leq \langle P_C(y) - P_C(x), y - x \rangle.$$

  6. Applying the Cauchy-Schwarz inequality on the R.H.S., we obtain

    $$\| P_C(y) - P_C(x) \|^2 \leq \| P_C(y) - P_C(x) \| \| y - x \|.$$

  7. Canceling the common factor (the inequality is trivial if PC(x)=PC(y)), we get ‖PC(y)−PC(x)‖≤‖y−x‖. Thus, PC is nonexpansive.

  8. Since PC is nonexpansive, PC is also continuous.

Theorem 9.14 (Orthogonal projection is firmly nonexpansive)

Let C be a nonempty, closed and convex subset of V. Let PC:V→V be the orthogonal projection operator as defined in Definition 9.6. Then PC is a firmly nonexpansive operator.

In other words,

$$\langle P_C(x) - P_C(y), x - y \rangle \geq \| P_C(x) - P_C(y) \|^2$$

holds true for every x,y∈V.

Proof. Recall from Theorem 9.7 that for any u∈V and v∈C,

$$\langle v - P_C(u), u - P_C(u) \rangle \leq 0.$$
  1. Substituting u=x and v=PC(y), we obtain

    $$\langle P_C(y) - P_C(x), x - P_C(x) \rangle \leq 0.$$

  2. Substituting u=y and v=PC(x), we obtain

    $$\langle P_C(x) - P_C(y), y - P_C(y) \rangle \leq 0.$$

  3. Adding the two inequalities gives us

    $$\begin{aligned}
    & \langle P_C(x) - P_C(y), y - P_C(y) - x + P_C(x) \rangle \leq 0\\
    \iff & \langle P_C(x) - P_C(y), (y - x) + (P_C(x) - P_C(y)) \rangle \leq 0\\
    \iff & \| P_C(x) - P_C(y) \|^2 \leq \langle P_C(x) - P_C(y), x - y \rangle
    \end{aligned}$$

    as desired.
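Both nonexpansiveness and firm nonexpansiveness are easy to test numerically. A sketch with C taken to be the closed unit ball (our running example, not prescribed by the text):

```python
import numpy as np

rng = np.random.default_rng(2)

def proj_unit_ball(v):
    n = np.linalg.norm(v)
    return v if n <= 1.0 else v / n

for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    d = proj_unit_ball(x) - proj_unit_ball(y)
    # Nonexpansiveness: ||P(x) - P(y)|| <= ||x - y||.
    assert np.linalg.norm(d) <= np.linalg.norm(x - y) + 1e-12
    # Firm nonexpansiveness: <P(x) - P(y), x - y> >= ||P(x) - P(y)||^2.
    assert d @ (x - y) >= d @ d - 1e-12
print("both properties hold on all samples")
```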

9.2.6. Squared Distance Function

Definition 9.9 (Squared distance function to a nonempty set)

Let C be a nonempty subset of V. The squared distance function to the set C, denoted φC:V→R, is defined as:

$$\varphi_C(x) \triangleq \frac{1}{2} d_C^2(x).$$

We also define ψC:V→R as:

$$\psi_C(x) \triangleq \frac{1}{2} \left ( \| x \|^2 - d_C^2(x) \right ).$$

Theorem 9.15 (Expression for ψC)

Let C be a nonempty subset of V. Then, the function ψC as defined in Definition 9.9 is given by

$$\psi_C(x) = \sup_{y \in C} \left [ \langle y, x \rangle - \frac{1}{2} \| y \|^2 \right ].$$

Proof. We proceed as follows.

  1. Expanding on the definition of $d_C^2$,

    $$\begin{aligned}
    d_C^2(x) &= \inf_{y \in C} \| x - y \|^2 = \inf_{y \in C} \langle x - y, x - y \rangle\\
    &= \inf_{y \in C} \left ( \| x \|^2 - 2 \langle x, y \rangle + \| y \|^2 \right )\\
    &= \inf_{y \in C} \left ( \| x \|^2 - \left ( 2 \langle x, y \rangle - \| y \|^2 \right ) \right )\\
    &= \| x \|^2 - \sup_{y \in C} \left ( 2 \langle x, y \rangle - \| y \|^2 \right ).
    \end{aligned}$$

  2. Thus,

    $$\| x \|^2 - d_C^2(x) = \sup_{y \in C} \left ( 2 \langle x, y \rangle - \| y \|^2 \right ).$$

  3. Thus,

    $$\psi_C(x) = \frac{1}{2} \left ( \| x \|^2 - d_C^2(x) \right ) = \sup_{y \in C} \left [ \langle x, y \rangle - \frac{1}{2} \| y \|^2 \right ].$$
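Theorem 9.15 requires only that C be nonempty, so it can be verified exactly on a finite set, where both the infimum in dC and the supremum reduce to a minimum/maximum over the points of C. A sketch with an arbitrary finite set of our choosing:

```python
import numpy as np

rng = np.random.default_rng(3)
C = rng.normal(size=(50, 3))   # a finite (nonconvex) set of 50 points
x = rng.normal(size=3)

d2 = np.min(np.sum((C - x) ** 2, axis=1))               # d_C^2(x)
psi_def = 0.5 * (x @ x - d2)                            # definition of psi_C
psi_sup = np.max(C @ x - 0.5 * np.sum(C ** 2, axis=1))  # sup form
print(np.isclose(psi_def, psi_sup))  # True
```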

Theorem 9.16 (ψC is convex)

Let C be a nonempty subset of V. Then, the function ψC as defined in Definition 9.9 is convex.

The beauty of this result is that ψC is convex irrespective of whether C is convex or not.

Proof. We proceed as follows.

  1. For every y∈C, the function $g_y : V \to \mathbb{R}$, given by

    $$g_y(x) = \langle y, x \rangle - \frac{1}{2} \| y \|^2,$$

    is an affine function.

  2. $g_y$ is convex for every y∈C due to Theorem 8.68.

  3. Now,

    $$\psi_C(x) = \sup_{y \in C} g_y(x).$$
  4. Thus, ψC is a pointwise supremum of convex functions.

  5. Thus, by Theorem 8.114, ψC is convex.

Theorem 9.17 (Squared distance function for nonempty, closed and convex sets)

Let C be a nonempty, closed and convex subset of V. Then, the squared distance function φC is given by

$$\varphi_C(x) = \frac{1}{2} \| x - P_C(x) \|^2.$$

This follows directly from Theorem 9.9.

9.2.7. Gradients and Subgradients

Theorem 9.18 (Gradient of the squared distance function)

Let C be a nonempty, closed and convex subset of V. The gradient of the squared distance function φC as defined in Definition 9.9 at x∈V is given by:

$$\nabla \varphi_C(x) = x - P_C(x) \quad \forall x \in V.$$

Proof. We proceed as follows.

  1. Let x∈V.

  2. Let $z_x = x - P_C(x)$.

  3. Consider the function

    $$g_x(d) = \varphi_C(x + d) - \varphi_C(x) - \langle d, z_x \rangle.$$

  4. If

    $$\lim_{d \to 0} \frac{g_x(d)}{\| d \|} = 0,$$

    then $z_x$ is indeed the gradient of φC at x.

  5. By definition of orthogonal projection, for any d∈V,

    $$\| x + d - P_C(x + d) \|^2 \leq \| x + d - P_C(x) \|^2,$$

    as PC(x+d) is the nearest point to x+d in C, while PC(x) is just another point in C.

  6. Thus, for any d∈V,

    $$\begin{aligned}
    g_x(d) &= \varphi_C(x + d) - \varphi_C(x) - \langle d, z_x \rangle\\
    &= \frac{1}{2} \| x + d - P_C(x + d) \|^2 - \frac{1}{2} \| x - P_C(x) \|^2 - \langle d, z_x \rangle\\
    &\leq \frac{1}{2} \| x + d - P_C(x) \|^2 - \frac{1}{2} \| x - P_C(x) \|^2 - \langle d, z_x \rangle.
    \end{aligned}$$

  7. Recall that for a norm induced by the inner product,

    $$\| a + b \|^2 = \langle a + b, a + b \rangle = \| a \|^2 + 2 \langle a, b \rangle + \| b \|^2.$$

  8. Thus,

    $$\| x + d - P_C(x) \|^2 = \| d + (x - P_C(x)) \|^2 = \| d \|^2 + \| x - P_C(x) \|^2 + 2 \langle d, x - P_C(x) \rangle.$$

  9. Putting it back and simplifying, we obtain

    $$g_x(d) \leq \frac{1}{2} \| d \|^2 + \langle d, x - P_C(x) \rangle - \langle d, z_x \rangle = \frac{1}{2} \| d \|^2.$$

  10. Proceeding similarly, we also have

    $$g_x(-d) \leq \frac{1}{2} \| d \|^2.$$

  11. Since φC is convex, hence $g_x$ is also convex.

  12. Thus,

    $$0 = g_x(0) = g_x \left ( \frac{1}{2} d + \frac{1}{2} (-d) \right ) \leq \frac{1}{2} g_x(d) + \frac{1}{2} g_x(-d).$$

  13. Thus,

    $$g_x(d) \geq - g_x(-d) \geq - \frac{1}{2} \| d \|^2.$$

  14. Combining, we have

    $$- \frac{1}{2} \| d \|^2 \leq g_x(d) \leq \frac{1}{2} \| d \|^2.$$

  15. Or, in terms of absolute values,

    $$| g_x(d) | \leq \frac{1}{2} \| d \|^2.$$

  16. Then,

    $$\frac{| g_x(d) |}{\| d \|} \leq \frac{1}{2} \| d \|.$$

  17. Thus,

    $$\lim_{d \to 0} \frac{g_x(d)}{\| d \|} = 0.$$

  18. Thus, $z_x = x - P_C(x)$ is indeed the gradient of φC at x.
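A quick finite-difference check of ∇φC(x)=x−PC(x), again assuming C is the closed unit ball:

```python
import numpy as np

def proj_unit_ball(v):
    n = np.linalg.norm(v)
    return v if n <= 1.0 else v / n

def phi(v):
    return 0.5 * np.linalg.norm(v - proj_unit_ball(v)) ** 2

x = np.array([1.5, -2.0, 0.3])
grad = x - proj_unit_ball(x)    # claimed gradient

eps = 1e-6
num = np.array([(phi(x + eps * e) - phi(x - eps * e)) / (2 * eps)
                for e in np.eye(3)])
print(np.allclose(num, grad, atol=1e-5))  # True
```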

Theorem 9.19 (Gradient of ψC)

Let C be a nonempty, closed and convex subset of V. The gradient of the function ψC as defined in Definition 9.9 at x∈V is given by:

$$\nabla \psi_C(x) = P_C(x) \quad \forall x \in V.$$

Proof. We have

$$\psi_C(x) = \frac{1}{2} \left ( \| x \|^2 - d_C^2(x) \right ) = \frac{1}{2} \| x \|^2 - \varphi_C(x).$$

Hence,

$$\nabla \psi_C(x) = x - \nabla \varphi_C(x) = x - (x - P_C(x)) = P_C(x).$$

Remark 9.5 (Distance function and squared distance function relation)

We note that φC=g∘dC where $g(t) = \frac{1}{2} [t]_+^2$.

g is a nondecreasing real-valued convex differentiable function. We also note that

$$g'(t) = [t]_+.$$

Theorem 9.20 (Subdifferential of the distance function)

Let C be a nonempty, closed and convex subset of V. The subdifferential of the distance function dC is given by

$$\partial d_C(x) = \begin{cases} \left \{ \frac{x - P_C(x)}{d_C(x)} \right \}, & x \notin C\\ N_C(x) \cap B[0, 1], & x \in C. \end{cases}$$

NC(x) denotes the normal cone of all vectors normal to the set C at a point x∈C.

Since ∂dC(x) is a singleton for x∉C, hence dC is differentiable at every x∉C.

Proof. We can get the subdifferentials for dC by applying the chain rule.

  1. Recall that φC=g∘dC where $g(t) = \frac{1}{2} [t]_+^2$.

  2. Thus, by the subdifferential chain rule (Theorem 8.229):

    $$\partial \varphi_C(x) = g'(d_C(x)) \partial d_C(x) = [d_C(x)]_+ \partial d_C(x) = d_C(x) \partial d_C(x).$$

    We used the fact that dC is nonnegative.

  3. Since φC is differentiable, hence $\partial \varphi_C(x) = \{ x - P_C(x) \}$.

  4. If x∉C, then dC(x)>0.

  5. Thus, for x∉C,

    $$\partial d_C(x) = \left \{ \frac{x - P_C(x)}{d_C(x)} \right \}.$$
  6. For x∈C, dC(x)=0.

  7. We need to show that $\partial d_C(x) = N_C(x) \cap B[0, 1]$ in this case.

  8. Consider any $d \in \partial d_C(x)$.

  9. Then, by the subgradient inequality,

    $$d_C(y) \geq d_C(x) + \langle y - x, d \rangle = \langle y - x, d \rangle \quad \forall y \in V,$$

    since dC(x)=0.

  10. Then, in particular, for any y∈C,

    $$d_C(y) = 0 \geq \langle y - x, d \rangle.$$

  11. Thus, $d \in N_C(x)$.

  12. Moreover, since x∈C, we have $d_C(y) \leq \| y - x \|$ for every y∈V. Combined with the subgradient inequality, this gives $\langle y - x, d \rangle \leq \| y - x \|$ for every y∈V. Picking y=x+d yields $\| d \|^2 \leq \| d \|$, i.e., $\| d \| \leq 1$. Thus, $d \in N_C(x) \cap B[0, 1]$.
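For x∉C the theorem asserts that dC is differentiable with gradient (x−PC(x))/dC(x), a unit vector. A finite-difference sketch with the unit ball:

```python
import numpy as np

def proj_unit_ball(v):
    n = np.linalg.norm(v)
    return v if n <= 1.0 else v / n

def dist(v):
    return np.linalg.norm(v - proj_unit_ball(v))

x = np.array([2.0, -1.0, 0.5])           # a point outside the unit ball
g = (x - proj_unit_ball(x)) / dist(x)    # claimed gradient, a unit vector

eps = 1e-6
num = np.array([(dist(x + eps * e) - dist(x - eps * e)) / (2 * eps)
                for e in np.eye(3)])
print(np.allclose(num, g, atol=1e-5), np.isclose(np.linalg.norm(g), 1.0))
```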

9.2.8. Conjugates

Theorem 9.21 (Conjugate of norm squared plus indicator)

Let C be a nonempty subset of V. Let f:V→(−∞,∞] be given by:

$$f(x) = \frac{1}{2} \| x \|^2 + \delta_C(x).$$

Then, its conjugate is given by:

$$f^*(y) = \frac{1}{2} \| y \|^2 - \frac{1}{2} d_C^2(y) = \psi_C(y).$$

Further, if C is nonempty, closed and convex, then $f^{**} = f$. In other words, $\psi_C^* = f$.

Proof. Recall from Definition 8.78 that

$$f^*(y) = \sup_{x \in V} \{ \langle x, y \rangle - f(x) \} \quad \forall y \in V^*.$$

Expanding, we obtain

$$\begin{aligned}
\sup_{x \in V} \{ \langle x, y \rangle - f(x) \}
&= \sup_{x \in V} \left \{ \langle x, y \rangle - \frac{1}{2} \| x \|^2 - \delta_C(x) \right \}\\
&= \sup_{x \in C} \left \{ \langle x, y \rangle - \frac{1}{2} \| x \|^2 \right \}\\
&= \psi_C(y).
\end{aligned}$$

The last result is due to Theorem 9.15.

If C is nonempty, closed and convex, then f is proper, closed and convex. Then, due to Theorem 8.242, the biconjugate of f is f itself; i.e., $\psi_C^* = f^{**} = f$.
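With a finite set C, the supremum defining the conjugate is a maximum over the points of C, so Theorem 9.21 can be verified directly. A sketch (the set and sizes are our choices):

```python
import numpy as np

rng = np.random.default_rng(4)
C = rng.normal(size=(40, 2))   # a finite nonempty set
y = rng.normal(size=2)

# f*(y) = sup_{x in C} { <x, y> - ||x||^2 / 2 }
f_star = np.max(C @ y - 0.5 * np.sum(C ** 2, axis=1))

# psi_C(y) = (||y||^2 - d_C^2(y)) / 2
d2 = np.min(np.sum((C - y) ** 2, axis=1))
psi = 0.5 * (y @ y - d2)
print(np.isclose(f_star, psi))  # True
```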

9.2.9. Smoothness

Recall from Definition 8.80 that a function f:V→R is L-smooth if

$$\| \nabla f(x) - \nabla f(y) \|_* \leq L \| x - y \| \quad \forall x, y \in V.$$

Since the norm in this section is induced by the inner product, hence it is self dual.

Thus, the smoothness criterion becomes:

$$\| \nabla f(x) - \nabla f(y) \| \leq L \| x - y \| \quad \forall x, y \in V.$$

Theorem 9.22 (Smoothness of the squared distance function)

The squared distance function φC is 1-smooth.

Proof. Recall the definition of φC from Definition 9.9.

  1. By Theorem 9.18,

    $$\nabla \varphi_C(x) = x - P_C(x).$$

  2. Hence,

    $$\begin{aligned}
    \| \nabla \varphi_C(x) - \nabla \varphi_C(y) \|^2
    &= \| x - P_C(x) - y + P_C(y) \|^2\\
    &= \| (x - y) - (P_C(x) - P_C(y)) \|^2\\
    &= \| x - y \|^2 - 2 \langle P_C(x) - P_C(y), x - y \rangle + \| P_C(x) - P_C(y) \|^2\\
    &\leq \| x - y \|^2 - 2 \| P_C(x) - P_C(y) \|^2 + \| P_C(x) - P_C(y) \|^2 \quad \text{(firm nonexpansiveness)}\\
    &= \| x - y \|^2 - \| P_C(x) - P_C(y) \|^2\\
    &\leq \| x - y \|^2.
    \end{aligned}$$

  3. Thus,

    $$\| \nabla \varphi_C(x) - \nabla \varphi_C(y) \| \leq \| x - y \|.$$

  4. Thus, φC is 1-smooth.

Theorem 9.23 (Smoothness of the ψC function)

The function ψC is 1-smooth.

Proof. We recall from Theorem 9.19 that $\nabla \psi_C(x) = P_C(x)$.

Hence,

$$\| \nabla \psi_C(x) - \nabla \psi_C(y) \| = \| P_C(x) - P_C(y) \| \leq \| x - y \|$$

due to the nonexpansiveness property (Theorem 9.13).

9.2.10. POCS Problems

In this section, we present some example optimization problems which can be converted into an equivalent projection on a convex set problem.

9.2.10.1. Equality Constrained Quadratic Programming

Quadratic programming problems are discussed extensively in Quadratic Programming. Here we discuss a specific form of minimizing a quadratic function subject to linear equality constraints.

Example 9.1 (Equality constrained quadratic programming)

We consider the quadratic programming problem

$$\begin{aligned}
& \text{minimize } \frac{1}{2} \| x \|^2 + c^T x\\
& \text{subject to } A x = 0
\end{aligned}$$

where

  1. x∈Rn is the optimization variable.

  2. c∈Rn is a given vector.

  3. $A \in \mathbb{R}^{m \times n}$ is an m×n matrix of rank m. Assume that m<n.

We proceed towards converting this problem into a projection on a convex set problem as follows.

  1. By adding a constant term $\frac{1}{2} \| c \|^2$ to the objective function, we obtain the equivalent problem

    $$\begin{aligned}
    & \text{minimize } \frac{1}{2} \| c + x \|^2\\
    & \text{subject to } A x = 0.
    \end{aligned}$$

  2. The set C={x|Ax=0} is the null space of the matrix A, which is a linear subspace, hence a nonempty, closed and convex set.

  3. Minimizing $\frac{1}{2} \| c + x \|^2$ is equivalent to minimizing $\| (-c) - x \|$ subject to x∈C.

  4. $\| (-c) - x \|$ is d(−c,x), the distance between −c and a point x.

  5. Thus, we are minimizing the distance between the point −c and points x∈C.

  6. This is nothing but the distance of −c from the set C.

  7. Since C is nonempty, closed and convex, there is a unique x∗ which minimizes the distance, due to the projection theorem.

  8. Thus, the solution is the projection of the vector −c on the subspace C.

  9. By Theorem 9.8, x∗ is the unique projection of −c on C if and only if $-c - x^* \in C^{\perp}$.

  10. In other words,

    $$\langle c + x^*, x \rangle = 0 \quad \forall x \in C.$$

  11. A closed form solution to this problem does exist (see the numerical sketch following this example), given by

    $$x^* = - \left ( I - A^T (A A^T)^{-1} A \right ) c.$$
  12. It is indeed the unique solution to this quadratic programming problem.
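A numerical sketch of this closed form solution, checking feasibility and the orthogonality condition of step 10 (the data is random and of our choosing):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 6
A = rng.normal(size=(m, n))   # rank m with probability 1
c = rng.normal(size=n)

# Closed form solution: x* = -(I - A^T (A A^T)^{-1} A) c.
x_star = -(c - A.T @ np.linalg.solve(A @ A.T, A @ c))
print(np.allclose(A @ x_star, 0))   # feasible: x* is in null(A)

# Optimality: <c + x*, x> = 0 for all x in C = null(A).
# Check against a null-space basis obtained from the SVD of A.
_, _, Vt = np.linalg.svd(A)
N = Vt[m:].T                        # columns span null(A)
print(np.allclose(N.T @ (c + x_star), 0))  # True
```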