9.5. Constrained Optimization I

In this section, we focus on objective functions of type f : R^n → R which are convex and differentiable. Our goal is to minimize f over a convex set C ⊆ dom f.

We also concern ourselves with functions of type f : V → (−∞, ∞] where V is a Euclidean space and f is Fréchet differentiable over some open set C ⊆ dom f. For results concerning Fréchet differentiable functions, we ask the reader to review the concepts of Gateaux and Fréchet differentiability in Differentiation in Banach Spaces.

Main references for this section are [5, 10].

9.5.1. Stationary Points

We first look at functions which are differentiable over a convex set. Here, we do not require that the function itself be convex. Thus, we cannot characterize the global optimality of a point. However, we can still characterize the local optimality of a point.

In Definition 7.6, we defined stationary points for a real valued function as points where the gradient vanishes; i.e., ∇f(x) = 0.

In this section, we wish to restrict the domain of feasible points to a convex set C and consider the problem

minimize f(x) subject to x ∈ C

where f is differentiable over the convex set C.

We now introduce the notion of stationary points for the given optimization problem of minimizing f over C.

Definition 9.21 (Stationary point for an optimization problem)

Let f : R^n → R be a real valued function which is differentiable over a convex set C.

Then, a ∈ C is called a stationary point of the problem of minimizing f over C if

∇f(a)^T (x − a) ≥ 0 ∀ x ∈ C.

If C = R^n, then this condition reduces to ∇f(a) = 0.

Theorem 9.45 (Local minimum points are stationary points)

Let f : R^n → R be a real valued function which is differentiable over a convex set C. Consider the optimization problem

minimize f(x) subject to x ∈ C.

If a∈C is a local minimum, then a must be a stationary point.

Proof. Let a be a local minimum and for contradiction assume that it is not a stationary point.

  1. Then, there exists x ∈ C such that ∇f(a)^T (x − a) < 0.

  2. Let t ∈ [0, 1] be a parameter.

  3. Let z_t = t x + (1 − t) a.

  4. Since C is convex, hence z_t ∈ C.

  5. Differentiating f(z_t) w.r.t. t at t = 0, we obtain

    d/dt f(z_t) |_{t=0} = ∇f(a)^T (x − a) < 0.
  6. Thus, for small enough t, f(z_t) < f(a).

  7. This contradicts the hypothesis that a is a local minimum.

  8. Thus, all local minimum points must be stationary points.

Example 9.6 (Unconstrained minimization)

Let f : R^n → R be a real valued function which is differentiable. Consider the unconstrained optimization problem

minimize f(x).

In this case, the feasible set is C = R^n.

  1. If a is a stationary point for this problem, then

    ∇f(a)^T (x − a) ≥ 0 ∀ x ∈ R^n.
  2. In particular, if we choose x = a − ∇f(a), we get

    ∇f(a)^T (x − a) = ∇f(a)^T (−∇f(a)) = −‖∇f(a)‖² ≥ 0.
  3. This is true only if ∇f(a) = 0.

  4. Thus, for unconstrained minimization, the gradient vanishes at stationary points.

Theorem 9.46 (Stationary point as an orthogonal projection)

Let f : R^n → R be a real valued function which is differentiable over a convex set C. Consider the optimization problem

minimize f(x) subject to x ∈ C.

Let s > 0. Then a ∈ C is a stationary point of the optimization problem if and only if

a = P_C(a − s∇f(a)).

Proof. Recall from Theorem 9.7 that z ∈ C is the projection of x if and only if

⟨y − z, x − z⟩ ≤ 0 ∀ y ∈ C.
  1. Substitute z = a and x = a − s∇f(a). We get

    ⟨y − a, a − s∇f(a) − a⟩ ≤ 0 ∀ y ∈ C
    ⟺ s ⟨y − a, ∇f(a)⟩ ≥ 0 ∀ y ∈ C
    ⟺ ∇f(a)^T (y − a) ≥ 0 ∀ y ∈ C.
  2. But this is the same condition as the definition for a stationary point.
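To make this projection characterization concrete, here is a minimal Python/NumPy sketch that tests the fixed point condition a = P_C(a − s∇f(a)) for a candidate point. The quadratic objective, the box constraint set, and the function names are assumptions chosen for illustration.

```python
import numpy as np

# Assumed objective: f(x) = 0.5 * ||x - c||^2 with gradient grad_f(x) = x - c.
c = np.array([2.0, -0.5])
grad_f = lambda x: x - c

# Assumed feasible set C: the box [0, 1]^2; its projection is coordinate-wise clipping.
project_C = lambda x: np.clip(x, 0.0, 1.0)

def is_stationary(a, s=1.0, tol=1e-10):
    """Check the fixed point condition a == P_C(a - s * grad_f(a))."""
    return np.linalg.norm(a - project_C(a - s * grad_f(a))) <= tol

# The minimizer of f over the box is the projection of c onto the box.
a_star = project_C(c)                        # [1.0, 0.0]
print(is_stationary(a_star))                 # True
print(is_stationary(np.array([0.5, 0.5])))   # False: nonzero gradient at an interior point
```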

9.5.2. First Order Optimality Criteria

We now turn our attention to the case where f is convex as well as differentiable. In this case, a point is a global optimal point if and only if it is a stationary point.

Theorem 9.47 (Optimality criterion for differentiable objective function)

Let f : R^n → R be a differentiable convex function. Let C ⊆ dom f be a convex set. Consider the minimization problem

minimize f(x) subject to x ∈ C.

Then, x ∈ C is an optimal point if and only if

(9.8) ∇f(x)^T (y − x) ≥ 0 ∀ y ∈ C.

In other words, x is optimal if and only if it is a stationary point.

Proof. By Theorem 8.98,

f(y) ≥ f(x) + ∇f(x)^T (y − x)

for every x, y ∈ dom f.

Assume that some x∈C satisfies the optimality criterion in (9.8).

  1. Let y∈C.

  2. Then, by the gradient inequality for convex functions,

    f(y) ≥ f(x) + ∇f(x)^T (y − x).
  3. By hypothesis, ∇f(x)^T (y − x) ≥ 0.

  4. Thus, f(y) ≥ f(x).

  5. Since this is true for every y∈C, hence x is an optimal point for the minimization problem.

Now for the converse, assume that x∈C is an optimal point.

  1. For contradiction, assume that (9.8) doesn't hold.

  2. Then, there exists y ∈ C such that

    ∇f(x)^T (y − x) < 0.
  3. Let t ∈ [0, 1] be a parameter.

  4. Let z(t) = t y + (1 − t) x.

  5. Since C is convex, hence z(t) ∈ C.

  6. Differentiating f(z(t)) w.r.t. t at t = 0, we obtain

    d/dt f(z(t)) |_{t=0} = ∇f(x)^T (y − x) < 0.
  7. Thus, for small enough t, f(z(t)) < f(x).

  8. Thus, x cannot be an optimal point for the minimization problem.

  9. This contradicts our hypothesis that x is an optimal point.

  10. Hence, (9.8) must hold for every y∈C.

Theorem 9.48 (Optimality criterion for unconstrained problem with differentiable objective function)

Let f : R^n → R be a differentiable convex function. Consider the unconstrained minimization problem

minimize f(x).

Then, x ∈ R^n is an optimal point if and only if ∇f(x) = 0.

Proof. In this case, the set of feasible points is C = dom f.

  1. Since f is differentiable, hence C is an open set.

  2. By Theorem 9.47, x is an optimal point if and only if

    ∇f(x)^T (y − x) ≥ 0 ∀ y ∈ C.
  3. If ∇f(x) = 0, then this inequality is satisfied. Hence, x must be an optimal point.

  4. Now, assume that x is an optimal point.

  5. Since x ∈ C and C is open, hence there exists a closed ball B[x, r] ⊆ C.

  6. Let y = x − t∇f(x).

  7. For sufficiently small t > 0, y ∈ B[x, r] ⊆ C.

  8. Then,

    ∇f(x)^T (y − x) = ∇f(x)^T (−t∇f(x)) = −t ‖∇f(x)‖₂² ≥ 0

    must hold true for such t > 0.

  9. This means that ‖∇f(x)‖₂² ≤ 0 must be true.

  10. Thus, ∇f(x) = 0 must be true.

9.5.2.1. Nondifferentiability at the Boundary

There are some specific results available for the unconstrained minimization of a convex function f which is not differentiable everywhere on its domain. If dom f is not open, then f may be differentiable over int dom f but not differentiable at the boundary points (dom f) ∖ (int dom f). The key questions are:

  1. Under what conditions is the minimizer of f a point of differentiability?

  2. Under what conditions may the minimizer of f be at a point of nondifferentiability?

Theorem 9.49 (Zero gradient implies minimizer)

Let f : V → (−∞, ∞] with S = dom f be a proper convex function which is differentiable over some open set U ⊆ S. If ∇f(x) = 0 at some x ∈ U, then x is one of the minimizers of f.

Proof. Recall from Theorem 8.220 that f is differentiable at x if and only if

∂f(x) = {∇f(x)}.
  1. Assume that f is differentiable at x ∈ U and ∇f(x) = 0.

  2. Then ∇f(x) is the one and only subgradient of f at x.

  3. Due to the subgradient inequality,

    f(y) ≥ f(x) + ⟨y − x, ∇f(x)⟩ ∀ y ∈ V.
  4. Since ∇f(x) = 0, hence

    f(y) ≥ f(x) ∀ y ∈ V.
  5. Thus f attains a minimum at x and x is one of its minimizers.

Next we consider the special case of the minimization of a convex function f which is differentiable in the interior of its domain but the gradient never vanishes. In this case, if a minimizer for f exists, it must be at the boundary of its domain.

Theorem 9.50 (Minimizers at points of nondifferentiability)

Let f : V → (−∞, ∞] with S = dom f be a proper convex function which is differentiable over some open set U ⊆ S and not differentiable over S ∖ U. Assume that f attains its minimum value at some a ∈ S; i.e., a minimizer of f exists. If ∇f(x) ≠ 0 at every x ∈ U, then the minimizer must be a point of nondifferentiability. In other words, a ∈ S ∖ U.

Proof. We are given that a minimizer a of f exists and that ∇f(x) ≠ 0 for every x ∈ U.

  1. Pick any x∈U.

  2. We need to show that x cannot be a minimizer.

  3. For contradiction, assume that x is a minimizer.

  4. Since U is open, there exists r > 0 such that B[x, r] ⊆ U ⊆ S.

  5. Since x minimizes f over the convex set B[x, r], it is a stationary point of that problem (Theorem 9.45). Hence

    ⟨y − x, ∇f(x)⟩ ≥ 0 ∀ y ∈ B[x, r].
  6. Let y = x − t∇f(x) with t > 0 small enough that y ∈ B[x, r].

  7. Then

    ⟨y − x, ∇f(x)⟩ = ⟨−t∇f(x), ∇f(x)⟩ = −t ‖∇f(x)‖² < 0.
  8. This contradicts our assumption that x is a minimizer.

  9. Hence if a is a minimizer of f then a ∈ S ∖ U.
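As a simple illustration of this result (an example added here for concreteness), take V = R and define

f(x) = x² + x for x ≥ 0, and f(x) = ∞ otherwise.

Then S = [0, ∞) and f is differentiable over U = (0, ∞) with f′(x) = 2x + 1 > 0, so the gradient never vanishes on U. The (unique) minimizer of f is a = 0, which indeed lies in S ∖ U, the set of points of nondifferentiability.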

We next show how the condition in (9.8) simplifies for specific optimization problem structures.

9.5.2.2. Equality Constraints

Theorem 9.51 (Differentiable objective minimization with equality constraints)

Let f : R^n → R be a differentiable convex function with dom f = R^n. Consider the minimization problem:

minimize f(x) subject to Ax = b

where A ∈ R^{p×n} and b ∈ R^p represent p linear equality constraints. Assume that the problem is feasible.

Then, x is an optimal point for the minimization problem if and only if there exists a vector v ∈ R^p such that

∇f(x) + A^T v = 0.

Proof. The feasible set is given by C = {x | Ax = b}. We recall from Theorem 9.47 that x is an optimal point if and only if ∇f(x)^T (y − x) ≥ 0 ∀ y ∈ C.

  1. Thus, x is optimal if and only if ∇f(x)^T (y − x) ≥ 0 for every y satisfying Ay = b.

  2. Since both x and y are feasible, hence A(y − x) = 0.

  3. Thus, z = y − x ∈ N(A).

  4. In fact, y ∈ C if and only if y = x + z for some z ∈ N(A).

  5. Thus, the optimality criterion reduces to ∇f(x)^T z ≥ 0 for every z ∈ N(A).

  6. Note that ∇f(x)^T z is a linear function of z as ∇f(x) is a fixed vector.

  7. If a linear function is nonnegative on a subspace, then it must be identically zero on the subspace (if ∇f(x)^T z > 0 for some z ∈ N(A), then ∇f(x)^T (−z) < 0 with −z ∈ N(A)).

  8. Thus, ∇f(x)^T z = 0 for every z ∈ N(A).

  9. In other words, x is optimal if and only if ∇f(x) ⊥ N(A).

  10. Recall that N(A)^⊥ = C(A^T); i.e., the null space of A is the orthogonal complement of the column space (range) of A^T.

  11. Thus, x is optimal if and only if ∇f(x) ∈ C(A^T).

  12. In other words, x is optimal if and only if there exists a vector v ∈ R^p such that

    ∇f(x) + A^T v = 0.

This result is a Lagrange multiplier optimality condition to be discussed in more detail in later sections.
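As a concrete illustration, the following Python/NumPy sketch finds the optimal point of an equality constrained problem by solving the linear system formed by the two conditions ∇f(x) + A^T v = 0 and Ax = b. The quadratic objective f(x) = 0.5 ‖x‖² and the data A, b are assumptions chosen for illustration.

```python
import numpy as np

# Assumed objective: f(x) = 0.5 * ||x||^2, with gradient grad f(x) = x.
# Optimality (Theorem 9.51): x + A^T v = 0 and A x = b for some v in R^p.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])   # A in R^{2x3}
b = np.array([1.0, 2.0])
p, n = A.shape

# Assemble and solve the combined linear system:
# [ I    A^T ] [x]   [0]
# [ A    0   ] [v] = [b]
K = np.block([[np.eye(n), A.T],
              [A, np.zeros((p, p))]])
rhs = np.concatenate([np.zeros(n), b])
sol = np.linalg.solve(K, rhs)
x, v = sol[:n], sol[n:]

print(np.allclose(A @ x, b))            # feasibility: A x = b
print(np.allclose(x + A.T @ v, 0.0))    # optimality: grad f(x) + A^T v = 0
```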

9.5.2.3. Nonnegative Orthant Constraints

Example 9.7 (Differentiable objective minimization over nonnegative orthant)

Let f : R^n → R be a differentiable convex function with dom f = R^n. Consider the minimization problem:

minimize f(x) subject to x ⪰ 0.
  1. The feasible set is the nonnegative orthant R^n_+.

  2. x is optimal if and only if x ⪰ 0 and ∇f(x)^T (y − x) ≥ 0 for every y ⪰ 0.

  3. The term ∇f(x)^T y is unbounded below over y ∈ R^n_+ unless ∇f(x) ∈ R^n_+.

  4. Thus, ∇f(x) must be nonnegative.

  5. Then, the minimum value of ∇f(x)^T y over y ⪰ 0 is 0.

  6. Consequently, the optimality condition reduces to −∇f(x)^T x ≥ 0, or ∇f(x)^T x ≤ 0.

  7. But x ⪰ 0 and ∇f(x) ⪰ 0.

  8. Thus, we must have ∇f(x)^T x = 0.

  9. We note that

    ∇f(x)^T x = ∑_{i=1}^n (∇f(x))_i x_i.
  10. Thus, it is a sum of products of nonnegative numbers.

  11. So each term in the sum must be 0.

  12. Thus, (∇f(x))_i x_i = 0 must hold true for every i = 1, …, n.

  13. Thus, the optimality condition can be rephrased as

    x ⪰ 0 and ∇f(x) ⪰ 0 and (∇f(x))_i x_i = 0 ∀ i = 1, …, n.

The condition (∇f(x))_i x_i = 0 for every i is known as complementarity. It means that for every i, either x_i or (∇f(x))_i or both must be 0. In other words, x_i and (∇f(x))_i cannot both be nonzero at the same time.

Thus, the sparsity patterns of x and ∇f(x) are complementary. In other words,

supp(x) ∩ supp(∇f(x)) = ∅

where supp(x) denotes the index set of nonzero entries of x.
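The following Python/NumPy sketch verifies the three conditions numerically for a simple assumed objective whose minimizer over the nonnegative orthant is known in closed form.

```python
import numpy as np

# Assumed example: f(x) = 0.5 * ||x - c||^2 with gradient grad f(x) = x - c.
# The minimizer over the nonnegative orthant is x* = max(c, 0) (component-wise).
c = np.array([1.5, -2.0, 0.0, 3.0])
x_star = np.maximum(c, 0.0)
g = x_star - c                       # gradient at x*

# Check the optimality conditions derived above.
print(np.all(x_star >= 0))           # primal feasibility: x >= 0
print(np.all(g >= 0))                # nonnegative gradient: grad f(x) >= 0
print(np.allclose(g * x_star, 0.0))  # complementarity: (grad f(x))_i * x_i = 0 for all i
```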

9.5.2.4. Unit Sum Set Constraint

Example 9.8 (Minimization over unit sum set)

Let f : R^n → R be a differentiable convex function with dom f = R^n. Consider the minimization problem:

minimize f(x) subject to 1^T x = 1.
  1. The feasible set is given by

    C = {x ∈ R^n | 1^T x = 1} = {x ∈ R^n | ∑_{i=1}^n x_i = 1}.
  2. This set is also known as the unit sum set.

  3. A point a ∈ C is optimal if and only if

    ∇f(a)^T (x − a) ≥ 0 ∀ x ∈ C.

    Let us call this condition (I).

  4. This is equivalent to the condition:

    ∂f/∂x_1(a) = ∂f/∂x_2(a) = ⋯ = ∂f/∂x_n(a).

    Let us call this condition (II).

  5. We shall show that (I) and (II) are equivalent.

We first show that (II) implies (I).

  1. Assume that some a ∈ C satisfies (II) with

    α = ∂f/∂x_1(a) = ∂f/∂x_2(a) = ⋯ = ∂f/∂x_n(a).
  2. Then, for any x ∈ C,

    ∇f(a)^T (x − a) = ∑_{i=1}^n ∂f/∂x_i(a) (x_i − a_i) = α ∑_{i=1}^n (x_i − a_i) = α (∑_{i=1}^n x_i − ∑_{i=1}^n a_i) = α (1 − 1) = 0.
  3. Thus, we see that ∇f(a)^T (x − a) ≥ 0 indeed and (I) is satisfied.

Now, assume that (I) is satisfied for some a∈C.

  1. For contradiction, assume that a doesn't satisfy (II).

  2. Then, there exist i, j ∈ {1, …, n} such that

    ∂f/∂x_i(a) > ∂f/∂x_j(a).
  3. Pick a vector x ∈ C with the following definition:

    x_k = a_k for k ∉ {i, j}, x_i = a_i − 1, x_j = a_j + 1.
  4. Note that

    ∑_{k=1}^n x_k = ∑_{k=1}^n a_k = 1.

    Thus, x ∈ C holds.

  5. Now,

    ∇f(a)^T (x − a) = ∑_{k=1}^n ∂f/∂x_k(a) (x_k − a_k) = ∂f/∂x_i(a) (x_i − a_i) + ∂f/∂x_j(a) (x_j − a_j) = −∂f/∂x_i(a) + ∂f/∂x_j(a) < 0.
  6. This violates the hypothesis that (I) holds true.

  7. Thus, (II) must be true.

  8. Thus, (I) implies (II).
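A small Python/NumPy sketch illustrating condition (II): for an assumed separable quadratic f(x) = 0.5 ∑_i w_i x_i², equating the partial derivatives together with the unit sum constraint yields the minimizer in closed form.

```python
import numpy as np

# Assumed example: f(x) = 0.5 * sum_i w_i * x_i^2 with w_i > 0,
# minimized subject to sum_i x_i = 1.
# Equal partial derivatives: w_i * x_i = alpha for all i; with 1^T x = 1,
# this gives x_i = (1 / w_i) / sum_j (1 / w_j).
w = np.array([1.0, 2.0, 4.0])
x_star = (1.0 / w) / np.sum(1.0 / w)

grad = w * x_star                       # partial derivatives of f at x_star
print(np.isclose(np.sum(x_star), 1.0))  # feasibility: entries sum to 1
print(np.allclose(grad, grad[0]))       # condition (II): all partials are equal
```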

9.5.2.5. Unit Ball Constraint

Example 9.9 (Minimization over unit ball)

Let f : R^n → R be a convex function which is differentiable over B[0, 1]. Consider the minimization problem:

minimize f(x) subject to ‖x‖ ≤ 1.
  1. The feasible set is given by

    C = B[0, 1] = {x | ‖x‖ ≤ 1}.
  2. A point a ∈ B[0, 1] is an optimal point if and only if

    ∇f(a)^T (x − a) ≥ 0 ∀ x ∈ B[0, 1].
  3. This is equivalent to saying that

    inf_{‖x‖≤1} (∇f(a)^T x − ∇f(a)^T a) ≥ 0.
  4. Recall from Theorem 7.11 that for any v ∈ R^n, the optimal value of the problem

    inf {v^T x | ‖x‖ ≤ 1}

    is −‖v‖.

  5. Thus,

    inf_{‖x‖≤1} ∇f(a)^T x = −‖∇f(a)‖.
  6. Thus, the inequality simplifies to

    −∇f(a)^T a ≥ ‖∇f(a)‖.
  7. At the same time, by the Cauchy-Schwarz inequality,

    −∇f(a)^T a ≤ ‖∇f(a)‖ ‖a‖ ≤ ‖∇f(a)‖

    since a ∈ B[0, 1].

  8. Thus, the inequality must be an equality. In other words, a is an optimal point if and only if

    −∇f(a)^T a = ‖∇f(a)‖.

We now have the following possibilities for this condition.

  1. If ∇f(a) = 0, then the condition holds and a is indeed an optimal point.

  2. Otherwise, if ∇f(a) ≠ 0, then ‖a‖ = 1 must be true.

    1. For contradiction, assume that ‖a‖ < 1.

    2. Then, by the Cauchy-Schwarz inequality,

      −∇f(a)^T a ≤ ‖∇f(a)‖ ‖a‖ < ‖∇f(a)‖,

      a contradiction.

  3. Thus, if ∇f(a) ≠ 0, then a is an optimal point if and only if ‖a‖ = 1 and

    −∇f(a)^T a = ‖∇f(a)‖ = ‖∇f(a)‖ ‖a‖.
  4. But this is possible only when there exists t ≤ 0 such that

    ∇f(a) = t a.
  5. Thus, if ∇f(a) ≠ 0, then a is an optimal point if and only if ‖a‖ = 1 and there exists t ≤ 0 such that ∇f(a) = t a.
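As a quick sanity check, consider an assumed linear objective f(x) = c^T x with c ≠ 0: the minimizer over the unit ball is a = −c/‖c‖, which satisfies both characterizations derived above. A minimal Python/NumPy sketch:

```python
import numpy as np

# Assumed example: f(x) = c^T x with gradient grad f(x) = c.
# Over the unit ball, the minimizer is a = -c / ||c||, which lies on the boundary.
c = np.array([3.0, -4.0])
a = -c / np.linalg.norm(c)

print(np.isclose(np.linalg.norm(a), 1.0))     # boundary: ||a|| = 1
t = -np.linalg.norm(c)                        # t <= 0
print(np.allclose(c, t * a))                  # grad f(a) = t * a with t <= 0
print(np.isclose(-c @ a, np.linalg.norm(c)))  # -grad f(a)^T a = ||grad f(a)||
```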

9.5.3. Descent Directions Method

We first consider the problem of unconstrained minimization of a continuously differentiable function f.

Typical iterative algorithms which aim to find the solution x for the minimization problem start with an initial guess x_0 and perform a step of the form

x_{k+1} = x_k + t_k d_k

where x_k is the current guess (starting from x_0), d_k is a direction in which we move to make the next guess, and t_k is a step size in that direction. x_{k+1} is the next guess obtained from the current guess. We say that an algorithm has made progress if f(x_{k+1}) < f(x_k).

This brings us to the notion of a descent direction.

Definition 9.22 (Descent direction)

Let f : V → R be a continuously differentiable function over V. A nonzero vector d is called a descent direction of f at x if the directional derivative f′(x; d) is negative.

In other words,

f′(x; d) = ⟨d, ∇f(x)⟩ < 0.

If the directional derivative is negative, then it is clear that for a small enough step in this direction, the value of f will decrease.

Lemma 9.4 (Descent property of descent direction)

Let f : V → R be a continuously differentiable function over V. Let x ∈ V. Assume that d is a descent direction for f at x. Then, there exists ϵ > 0 such that

f(x + t d) < f(x)

for any t ∈ (0, ϵ].

Proof. This follows from the negativity of the directional derivative.

  1. Recall from Definition 8.70 that

    f′(x; d) = lim_{t→0+} [f(x + t d) − f(x)] / t.
  2. Since d is a descent direction, hence f′(x; d) < 0.

  3. Thus,

    lim_{t→0+} [f(x + t d) − f(x)] / t < 0.
  4. Thus, there exists ϵ > 0 such that

    [f(x + t d) − f(x)] / t < 0

    for every t ∈ (0, ϵ].

9.5.3.1. Descent Directions Method

Algorithm 9.1 (The descent directions method)

Initialization

  1. Pick x_0 ∈ V arbitrarily as the initial solution.

General iteration: for k = 0, 1, 2, …, execute the following steps

  1. Pick a descent direction d_k.

  2. Pick a step size t_k satisfying f(x_k + t_k d_k) < f(x_k).

  3. Update: x_{k+1} ← x_k + t_k d_k.

  4. Check for convergence: If converged, then STOP.

Return x_{k+1} as the output.

This method is not really an actual algorithm. It can be considered as a template for an actual algorithm. Several aspects of the method need to be carefully chosen to come up with a viable algorithm.

  1. How to select the initial point x0?

  2. How to choose the descent direction?

  3. How to select the step size?

  4. How to decide when to stop the iterations (stopping criterion)?
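Before addressing these questions, here is a minimal Python/NumPy sketch of the template in Algorithm 9.1 in which the design choices above are supplied as parameters. The function names and the gradient-norm stopping test are assumptions for illustration, not part of the algorithm statement.

```python
import numpy as np

def descent_method(grad_f, x0, direction, step_size, tol=1e-8, max_iter=1000):
    """A sketch of the descent directions template (Algorithm 9.1).

    direction(x, g) must return a descent direction at x given the gradient g.
    step_size(x, d) must return a step size t with f(x + t*d) < f(x).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:   # a simple stopping criterion
            break
        d = direction(x, g)            # choose a descent direction
        t = step_size(x, d)            # choose a step size along it
        x = x + t * d                  # update the iterate
    return x
```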

The key result that we establish here is to show that if the step size t_k is sufficiently small in the descent direction d_k, then there is sufficient decrease in the value of f going from x_k to x_{k+1}.

Theorem 9.52 (Sufficient decrease condition for descent direction method)

Let f be a continuously differentiable function over R^n. Let x ∈ R^n. Assume that a nonzero d is a descent direction for f at x. Let α ∈ (0, 1). Then, there exists ϵ > 0 such that the inequality

f(x) − f(x + t d) ≥ −α t ⟨d, ∇f(x)⟩

holds for every t ∈ [0, ϵ].

Proof. We proceed as follows.

  1. Since f is continuously differentiable, hence due to Theorem 5.1,

    f(x + t d) = f(x) + t ∇f(x)^T d + o(t ‖d‖).
  2. Rearranging the terms and introducing an α ∈ (0, 1), we obtain

    f(x) − f(x + t d) = −α t ∇f(x)^T d − (1 − α) t ∇f(x)^T d − o(t ‖d‖).
  3. Since d is a descent direction of f, hence f′(x; d) = ∇f(x)^T d < 0.

  4. Thus,

    lim_{t→0+} [(1 − α) t ∇f(x)^T d + o(t ‖d‖)] / t = (1 − α) ∇f(x)^T d < 0.
  5. Hence, there exists ϵ > 0 such that for every t ∈ [0, ϵ],

    (1 − α) t ∇f(x)^T d + o(t ‖d‖) ≤ 0.
  6. Thus, from the previous equation:

    f(x) − f(x + t d) ≥ −α t ∇f(x)^T d

    for every t ∈ [0, ϵ].

9.5.3.2. Step Size Selection

Following are common methods for step size selection. Each method has advantages as well as drawbacks.

  1. Constant step size

  2. Exact line search

  3. Backtracking

Constant step size uses the same value t_k = t̄ for every iteration.

  • A large step size might cause the algorithm to take steps that do not decrease the objective.

  • The algorithm may never converge with a large step size.

  • A small constant step size may lead to very slow convergence.

Exact line search solves the minimization problem

 minimize f along the ray x_k + t d_k, t ≥ 0.

The optimization variable is the step size parameter t ∈ R_+. The solution is the value t_k ≥ 0 that minimizes f(x_k + t d_k) over t ≥ 0.

  • Minimizing f along the ray may not be straightforward.

  • A closed form or algorithmic solution for the exact line search may not be available.

Backtracking is a compromise between these two approaches.

  1. Input parameters: s > 0, α ∈ (0, 1), β ∈ (0, 1).

  2. Initialize t_k = s.

  3. While

    f(x_k) − f(x_k + t_k d_k) < −α t_k ∇f(x_k)^T d_k
    1. Set t_k ← β t_k.

    2. Continue.

  4. Return step size t_k.

Essentially, we are reducing the step size by a factor β iteratively till f shows sufficient decrease as stipulated by Theorem 9.52.

  • There is no exact line search.

  • It finds a good enough step size satisfying the sufficient decrease condition.

  • Involves multiple evaluations of f.

  • s should be chosen carefully.
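Here is a minimal Python/NumPy sketch of the backtracking procedure above (the function name and default parameter values are assumptions); by Theorem 9.52 the loop terminates whenever d is a genuine descent direction.

```python
import numpy as np

def backtracking(f, grad_f, x, d, s=1.0, alpha=0.5, beta=0.5):
    """Shrink the step by a factor beta until the sufficient decrease
    condition f(x) - f(x + t*d) >= -alpha * t * grad_f(x)^T d holds."""
    g_dot_d = grad_f(x) @ d   # directional derivative; negative for a descent direction
    t = s
    while f(x) - f(x + t * d) < -alpha * t * g_dot_d:
        t *= beta             # reduce the step size and try again
    return t
```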

9.5.4. Gradient Method

The gradient method is a descent direction method in which the descent direction is always chosen to be the negative of the gradient at the current point (solution).

d_k = −∇f(x_k).

Lemma 9.5 (Negative of gradient is a descent direction)

Let f be a continuously differentiable function over an open set S. At any point x ∈ S, the negative of the gradient d = −∇f(x) is a descent direction whenever ∇f(x) ≠ 0.

Proof. We just need to compute the directional derivative.

f′(x; d) = ⟨d, ∇f(x)⟩ = ⟨−∇f(x), ∇f(x)⟩ = −‖∇f(x)‖² < 0.

The last strict inequality is valid since ∇f(x) ≠ 0.
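Using the descent template and backtracking sketches given earlier (both illustrative, assumed helpers rather than part of the text), the gradient method is obtained by fixing the direction to the negative gradient; the quadratic test objective below is also an assumption.

```python
import numpy as np

# Assumed test objective: f(x) = 0.5 * ||x||^2 - b^T x, minimized at x = b.
b = np.array([1.0, 2.0])
f = lambda x: 0.5 * x @ x - b @ x
grad_f = lambda x: x - b

# Gradient method: descent template with d_k = -grad f(x_k) and backtracking steps.
x_star = descent_method(
    grad_f, x0=np.zeros(2),
    direction=lambda x, g: -g,
    step_size=lambda x, d: backtracking(f, grad_f, x, d),
)
print(x_star)   # approaches [1.0, 2.0]
```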

9.5.5. Gradient Projection Method

In this subsection, we present a method to solve the optimization problem

minimize f(x) subject to x ∈ C

where C is a convex set and f is differentiable over C.

Recall from Theorem 9.46 that a is a stationary point for the problem of minimizing f over a convex set C if and only if

a = P_C(a − s∇f(a)).

This stationarity condition is the basis for the gradient projection method presented below.

Algorithm 9.2 (The gradient projection method)

Inputs

  1. ϵ > 0 - tolerance parameter

Initialization

  1. Pick x_0 ∈ C arbitrarily.

General iteration: for k = 0, 1, 2, …, execute the following steps

  1. Pick a step size t_k by a line search procedure.

  2. Update: x_{k+1} ← P_C(x_k − t_k ∇f(x_k)).

  3. Check for convergence: If ‖x_{k+1} − x_k‖ ≤ ϵ, then STOP.

Return x_{k+1} as the output.

In the case of unconstrained minimization:

  1. C = R^n.

  2. P_C(x_k − t_k ∇f(x_k)) = x_k − t_k ∇f(x_k).

  3. x_{k+1} = x_k − t_k ∇f(x_k).

  4. We see that gradient projection reduces to gradient descent.

Another way to look at the algorithm is:

  1. y_{k+1} = x_k − t_k ∇f(x_k) computes the next candidate solution assuming no constraints.

  2. x_{k+1} = P_C(y_{k+1}) projects the next candidate solution back to the feasible set C.

  3. Thus, we have a gradient step followed by a projection step.
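Here is a minimal Python/NumPy sketch of Algorithm 9.2 with a constant step size in place of a line search; the function names, the step size, and the example problem (a quadratic minimized over the unit ball) are assumptions for illustration.

```python
import numpy as np

def gradient_projection(grad_f, project_C, x0, t=0.1, eps=1e-8, max_iter=10000):
    """A sketch of the gradient projection method with a constant step size."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_next = project_C(x - t * grad_f(x))   # gradient step, then projection
        if np.linalg.norm(x_next - x) <= eps:   # stopping criterion
            return x_next
        x = x_next
    return x

# Assumed example: minimize 0.5 * ||x - c||^2 over the unit ball.
c = np.array([3.0, 4.0])
grad_f = lambda x: x - c
project_ball = lambda x: x / max(1.0, np.linalg.norm(x))
print(gradient_projection(grad_f, project_ball, x0=np.zeros(2)))  # ~ c / ||c|| = [0.6, 0.8]
```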

9.5.5.1. Convergence

If we can establish conditions under which each iteration of the gradient projection algorithm leads to sufficient decrease in the value of the objective function, then we can guarantee that the algorithm will converge in a finite number of steps.

Lemma 9.6 (Sufficient decrease lemma for constrained problems)

Consider the optimization problem:

minimize f(x) subject to x ∈ C.

Assume that C is a nonempty closed convex set, f is continuously differentiable over C, and ∇f is Lipschitz continuous with constant L over C. Then, for any x ∈ C and t ∈ (0, 2/L), the following inequality holds:

f(x) − f(y) ≥ t (1 − Lt/2) ‖(1/t)(x − y)‖²

where y = P_C(x − t∇f(x)).

Proof.