8.16. SubgradientsΒΆ

Primary references for this section are [6, 7].

Throughout this section, we assume that V,W are finite dimensional real vector spaces. Wherever necessary, they are equipped with a norm ‖⋅‖ or a real inner product ⟨⋅,⋅⟩. They are also equipped with a metric d(x,y)=‖x−y‖ as needed.

8.16.1. SubgradientsΒΆ

Definition 8.72 (Subgradient)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper function. Let x∈domf. A vector g∈Vβˆ— is called a subgradient of f at x if

(8.6)ΒΆf(y)β‰₯f(x)+⟨yβˆ’x,gβŸ©βˆ€y∈V.

This inequality is known as the subgradient inequality.

Here, we work with the extended-value extension of f whenever required; i.e., f(x)=∞ for all x∉domf.

As discussed in Theorem 4.108, the vector spaces V and Vβˆ— are isomorphic. Therefore, we follow the convention that both V and Vβˆ— have exactly the same elements. The primary difference between V and Vβˆ— comes from the computation of norm. If V is endowed with a norm β€–β‹…β€– then Vβˆ— is endowed with a dual norm β€–β‹…β€–βˆ—.

In the arguments below B[a,r] or Bβ€–β‹…β€–[a,r] denotes the closed ball of radius r in the normed space (V,β€–β‹…β€–). The closed ball of radius r in the dual space (V,β€–β‹…β€–βˆ—) shall be denoted by Bβˆ—[a,r] or Bβ€–β‹…β€–βˆ—[a,r]. Open balls shall be denoted similarly.

Observation 8.9 (Global affine underestimator)

If g is a subgradient of f at some x∈domf, then the affine function a:Vβ†’R given by:

a(y)=f(x)+⟨yβˆ’x,g⟩=⟨y,g⟩+f(x)βˆ’βŸ¨x,gβŸ©βˆ€y∈V

is a global affine underestimator for f. Note that the term f(x)βˆ’βŸ¨x,g⟩ is a constant since x∈V and g∈Vβˆ— are fixed. This comes from the subgradient inequality:

f(y)β‰₯a(y)βˆ€y∈V.
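The affine underestimator property is easy to check numerically. The sketch below (illustrative code, not part of the formal development; all names are ours) verifies it for f(y)=|y| on R with the subgradient g=0.5 at x=0, and shows that a slope outside [−1,1] fails to underestimate:

```python
# f(y) = |y| with base point x = 0; any g with |g| <= 1 is a subgradient.
f = abs
x, g = 0.0, 0.5

def a(y):
    # affine function from the subgradient inequality: a(y) = f(x) + <y - x, g>
    return f(x) + g * (y - x)

ys = [i / 10.0 for i in range(-50, 51)]
is_underestimator = all(f(y) >= a(y) for y in ys)

# a slope outside [-1, 1] violates the inequality somewhere (here at y = 1)
g_bad = 1.5
violates = any(f(y) < f(x) + g_bad * (y - x) for y in ys)
```

A grid check of this kind is of course only a spot check of the inequality, not a proof.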

Observation 8.10 (Subgradient inequality alternative form)

For yβˆ‰domf, f(y)=∞. Thus, the subgradient inequality is trivially satisfied for all yβˆ‰domf. An alternate form of the inequality is:

(8.7)ΒΆf(y)β‰₯f(x)+⟨yβˆ’x,gβŸ©βˆ€y∈domf.

8.16.1.1. Geometric InterpretationΒΆ

Observation 8.11 (Subgradient and supporting hyperplane)

Let f:V→(−∞,∞] be a proper function. Then g is a subgradient of f at x if and only if epif has a supporting hyperplane at (x,f(x)) with normal (−g,1).

Let H be a supporting hyperplane of epif at (x,f(x)) with the normal (βˆ’g,1).

  1. Then

    H={(y,t)|⟨y,βˆ’g⟩+t=⟨x,βˆ’g⟩+f(x)}.
  2. For any (y,f(y))∈epif, we must have

    ⟨y,βˆ’g⟩+f(y)β‰₯⟨x,βˆ’g⟩+f(x)⟺f(y)β‰₯f(x)+⟨yβˆ’x,g⟩.
  3. Then g is a subgradient of f at x.

Now let g be a subgradient of f at x.

  1. Let (y,t)∈epif.

  2. Then we have

    tβ‰₯f(y)β‰₯f(x)+⟨yβˆ’x,g⟩.
  3. Rearranging the terms, we have

    ⟨y,βˆ’g⟩+tβ‰₯⟨x,βˆ’g⟩+f(x)

    for every (y,t)∈epif.

  4. Then the hyperplane

    H={(y,t)|⟨y,βˆ’g⟩+t=⟨x,βˆ’g⟩+f(x)}

    is indeed a supporting hyperplane for epif.

  5. The normal vector for this hyperplane is (βˆ’g,1) and it passes through the point (x,f(x)).

8.16.2. SubdifferentialΒΆ

At a point x∈domf, there may be more than one subgradient. It is thus natural to introduce the notion of the set of all subgradients of f at a specific point x∈domf.

Definition 8.73 (Subdifferential set)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper function. The set of all subgradients of f at a point x∈domf is called the subdifferential of f at x and is denoted by βˆ‚f(x).

βˆ‚f(x)β‰œ{g∈Vβˆ—|f(y)β‰₯f(x)+⟨yβˆ’x,gβŸ©βˆ€y∈V}.

For all xβˆ‰domf, we define βˆ‚f(x)=βˆ….

Theorem 8.210 (Subdifferential of norm at origin)

Let f:V→R by given by:

f(x)=β€–xβ€–

where β€–β‹…β€– is the norm endowed on V. Then, the subdifferential of f at x=0 is given by the dual norm unit ball:

βˆ‚f(0)=Bβ€–β‹…β€–βˆ—[0,1]={g∈Vβˆ—|β€–gβ€–βˆ—β‰€1}.

Proof. gβˆˆβˆ‚f(0) if and only if

f(y)β‰₯f(0)+⟨yβˆ’0,gβŸ©βˆ€y∈V.

This reduces to:

β€–yβ€–β‰₯⟨y,gβŸ©βˆ€y∈V.

Maximizing both sides of this inequality over the set {y|β€–y‖≀1}, we obtain:

β€–gβ€–βˆ—=supβ€–y‖≀1{⟨y,g⟩}≀supβ€–y‖≀1β€–yβ€–=1.

Thus, β€–gβ€–βˆ—β‰€1 is a necessary condition.

We now show that ‖g‖∗≤1 is also sufficient. By the generalized Cauchy-Schwarz inequality:

⟨y,gβŸ©β‰€β€–yβ€–β€–gβ€–βˆ—β‰€β€–yβ€–βˆ€y∈V.

Thus, if β€–gβ€–βˆ—β‰€1, then g is a subgradient.

Thus, the vectors that satisfy the subgradient inequality are exactly the same as those in Bβ€–β‹…β€–βˆ—[0,1].
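Theorem 8.210 can be illustrated numerically. For the ℓ1 norm on R², the dual norm is the ℓ∞ norm, so the subdifferential at the origin should contain exactly the vectors g with ‖g‖∞≤1. The following sketch (illustrative and sampling-based, hence only a spot check) tests the subgradient inequality at 0:

```python
def norm1(y):
    # the l1 norm on R^2
    return abs(y[0]) + abs(y[1])

def is_subgradient_at_zero(g, samples):
    # subgradient inequality at x = 0: ||y||_1 >= <y, g> for all sampled y
    return all(norm1(y) >= y[0] * g[0] + y[1] * g[1] for y in samples)

grid = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0),
        (1.0, 0.0), (0.0, 1.0), (-0.5, 0.25)]

ok_inside = is_subgradient_at_zero((0.7, -0.9), grid)   # ||g||_inf <= 1
ok_outside = is_subgradient_at_zero((1.2, 0.0), grid)   # ||g||_inf > 1
```

The vector outside the ℓ∞ unit ball fails already at the sample point y=(1,0), where ‖y‖1=1 but ⟨y,g⟩=1.2.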

The subdifferential of a function f may be empty at specific points x∈V.

Definition 8.74 (Subdifferentiability)

A proper function f:Vβ†’(βˆ’βˆž,∞] is called subdifferentiable at some x∈domf if βˆ‚f(x)β‰ βˆ….

Definition 8.75 (Domain of subdifferentiability)

The set of points at which a proper function f:Vβ†’(βˆ’βˆž,∞] is subdifferentiable, denoted by dom(βˆ‚f), is defined as:

dom(βˆ‚f)β‰œ{x∈V|βˆ‚f(x)β‰ βˆ…}.

8.16.2.1. Closedness and ConvexityΒΆ

Theorem 8.211 (Closedness and convexity of the subdifferential set)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper function. Then the set βˆ‚f(x) is closed and convex for any x∈V.

Proof. Let x∈V be fixed. For any y∈V, define the set

Hy={g∈Vβˆ—|⟨yβˆ’x,gβŸ©β‰€f(y)βˆ’f(x)}.

Note that Hy is a closed half space in Vβˆ—.

It is easy to see that

βˆ‚f(x)=β‹‚y∈VHy.
g∈∂f(x)
⟺f(y)≥f(x)+⟨y−x,g⟩ ∀y∈V
⟺f(y)−f(x)≥⟨y−x,g⟩ ∀y∈V
⟺⟨y−x,g⟩≤f(y)−f(x) ∀y∈V
⟺g∈Hy ∀y∈V
⟺g∈⋂y∈VHy.

Thus, βˆ‚f(x) is an infinite intersection of closed and convex sets. Hence βˆ‚f(x) is closed and convex.
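The convexity of ∂f(x) can be spot-checked numerically: for f(y)=|y| at x=0, both g=−1 and g=+1 are subgradients, and every convex combination of them should again satisfy the subgradient inequality. A small illustrative sketch:

```python
f = abs
x = 0.0
ys = [i / 7.0 for i in range(-21, 22)]

def is_subgradient(g):
    # subgradient inequality sampled over ys
    return all(f(y) >= f(x) + g * (y - x) for y in ys)

g1, g2 = -1.0, 1.0
combos_ok = all(is_subgradient((1 - t) * g1 + t * g2)
                for t in [0.0, 0.25, 0.5, 0.75, 1.0])
```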

8.16.2.2. Subdifferentiability and Convex DomainΒΆ

Theorem 8.212 (Subdifferentiability + Convex domain ⟹ Convexity)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper function. Assume that domf is convex. If f is subdifferentiable at every x∈domf, then f is convex.

In other words:

βˆ€x∈domf,βˆ‚f(x)β‰ βˆ…βŸΉf is convex.

Proof. Let x,y∈domf. Let t∈[0,1]. Let z=(1βˆ’t)x+ty.

  1. Since domf is convex, hence z∈domf.

  2. By hypothesis, f is subdifferentiable at z.

  3. Thus, there exists gβˆˆβˆ‚f(z).

  4. By subgradient inequality (8.7)

    f(y)≥f(z)+⟨y−z,g⟩=f(z)+(1−t)⟨y−x,g⟩,
    f(x)≥f(z)+⟨x−z,g⟩=f(z)−t⟨y−x,g⟩.
  5. Multiplying the first inequality by t, second by (1βˆ’t) and adding, we get:

    tf(y)+(1βˆ’t)f(x)β‰₯f(z).
  6. Thus,

    f((1βˆ’t)x+ty)=f(z)≀tf(y)+(1βˆ’t)f(x)

    holds true for any x,y∈domf and any t∈[0,1].

  7. Thus, f is convex.

A convex function need not be subdifferentiable at every point in its domain. The problem usually occurs at the boundary points of the domain if the domain is not open.

8.16.2.3. Positive ScalingΒΆ

Theorem 8.213 (Multiplication by a positive scalar)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper function. Let x∈domf. For any Ξ±>0,

βˆ‚(Ξ±f)(x)=Ξ±βˆ‚f(x).

Proof. Let gβˆˆβˆ‚f(x).

  1. By subgradient inequality (8.7)

    f(y)β‰₯f(x)+⟨yβˆ’x,gβŸ©βˆ€y∈domf.
  2. Multiplying by Ξ±, we get:

    (Ξ±f)(y)β‰₯(Ξ±f)(x)+⟨yβˆ’x,Ξ±gβŸ©βˆ€y∈dom(Ξ±f).
  3. Thus, Ξ±gβˆˆβˆ‚(Ξ±f)(x).

  4. Thus, Ξ±βˆ‚f(x)βŠ†βˆ‚(Ξ±f)(x).

  5. Applying the same argument to the function αf with the positive scalar 1/α gives the reverse inclusion, hence

    βˆ‚(Ξ±f)(x)=Ξ±βˆ‚f(x).
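The scaling rule is easy to verify numerically for f(y)=|y| at x=0, where ∂f(0)=[−1,1] and hence ∂(αf)(0)=[−α,α]. An illustrative sketch with α=3:

```python
alpha = 3.0
ys = [i / 10.0 for i in range(-30, 31)]

def is_subgrad(func, g, x=0.0):
    # subgradient inequality for func at x, sampled over ys
    return all(func(y) >= func(x) + g * (y - x) for y in ys)

g = 0.8                                                 # subgradient of |.| at 0
scaled_ok = is_subgrad(lambda y: alpha * abs(y), alpha * g)
too_big = is_subgrad(lambda y: alpha * abs(y), alpha * 1.1)   # 3.3 > alpha
```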

8.16.3. Proper Convex FunctionsΒΆ

In this subsection, we discuss the properties of the subdifferential sets for proper convex functions.

A proper convex function may not be subdifferentiable at every point in its domain. However, it is indeed subdifferentiable at the interior points and relative interior points of its domain.

8.16.3.1. Nonemptiness and Boundedness at Interior PointsΒΆ

Theorem 8.214 (Nonemptiness and boundedness of the subdifferential at interior points)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper convex function with S=domf. Let a∈intS. Then, βˆ‚f(a) is nonempty and bounded.

In other words, for a proper convex function, the subdifferential at the interior points of its domain is nonempty and bounded. We have

intdomfβŠ†dom(βˆ‚f).

Proof. Outline of the proof

  1. Identify a supporting hyperplane for the epigraph of f at (a,f(a)).

  2. Make use of the local Lipschitz continuity of the convex function at its interior points.

  3. Show that the normal to the supporting hyperplane leads to a subgradient at a.

  4. Show that the subgradients are bounded by using the local Lipschitz continuity inequality and the subgradient inequality.

Consider the direct sum vector space VβŠ•R.

  1. epifβŠ†VβŠ•R.

  2. Since f is convex, hence epif is convex.

  3. For some a∈intS, consider the point (a,f(a))∈VβŠ•R.

  4. Since f is convex, hence (a,f(a))∈bdepif.

  5. By supporting hyperplane theorem, there exists a vector (p,βˆ’Ξ±)∈Vβˆ—βŠ•R such that

    ⟨x,pβŸ©βˆ’tΞ±β‰€βŸ¨a,pβŸ©βˆ’f(a)Ξ±βˆ€(x,t)∈epif.
  6. We shall show that α>0 must hold true and that g=p/α is indeed a subgradient at a.

  7. We note that, (a,f(a)+1)∈epif. Putting it in,

    ⟨a,pβŸ©βˆ’(f(a)+1)Ξ±β‰€βŸ¨a,pβŸ©βˆ’Ξ±f(a)βŸΊβˆ’Ξ±β‰€0⟺αβ‰₯0.

    Thus, Ξ±β‰₯0.

  8. Recall from Theorem 8.174 that f is locally Lipschitz continuous at a∈intdomf.

  9. Thus, there exists r>0 and L>0 such that B[a,r]βŠ†S and

    |f(x)βˆ’f(a)|≀Lβ€–xβˆ’aβ€–βˆ€x∈B[a,r].
  10. Since B[a,r]βŠ†S, hence (x,f(x))∈epif for every x∈B[a,r].

  11. Plugging t=f(x) in the supporting hyperplane inequality, we get

    ⟨x,pβŸ©βˆ’f(x)Ξ±β‰€βŸ¨a,pβŸ©βˆ’f(a)Ξ±βˆ€x∈B[a,r].
  12. Rearranging the terms,

    ⟨xβˆ’a,pβŸ©β‰€Ξ±(f(x)βˆ’f(a))βˆ€x∈B[a,r].
  13. Using the local Lipschitz property,

    ⟨xβˆ’a,pβŸ©β‰€Ξ±Lβ€–xβˆ’aβ€–βˆ€x∈B[a,r].
  14. Recall that the dual norm for p∈Vβˆ— is given by

    ‖p‖∗=sup{|⟨x,p⟩| : x∈V, ‖x‖≤1}.
  15. Let pβ€ βˆˆV with β€–p†‖=1 be the vector at which the supremum is attained.

  16. Then, β€–pβ€–βˆ—=⟨p†,p⟩ (since V is real).

  17. Since p† is a unit vector, hence a+rpβ€ βˆˆB[a,r].

  18. Plugging x=a+rp† in the inequality above, we get

    r⟨p†,p⟩≤αL‖rp†‖.
  19. Simplifying

    r‖p‖∗≤αLr.
  20. This means that Ξ±>0 must be true.

    1. If α=0, then this inequality would force ‖p‖∗≤0, i.e., p=0.

    2. But (p,βˆ’Ξ±) is a nonzero vector describing the supporting hyperplane.

  21. Going back to the supporting hyperplane inequality and putting t=f(x), we have

    ⟨x,pβŸ©βˆ’f(x)Ξ±β‰€βŸ¨a,pβŸ©βˆ’f(a)Ξ±βˆ€x∈S.
  22. Rearranging the terms, we get

    Ξ±(f(x)βˆ’f(a))β‰₯⟨xβˆ’a,pβŸ©βˆ€x∈S.
  23. Letting g=(1/α)p and dividing both sides by α (which is positive), we obtain

    f(x)βˆ’f(a)β‰₯⟨xβˆ’a,gβŸ©βˆ€x∈S.
  24. Rearranging again

    f(x)β‰₯f(a)+⟨xβˆ’a,gβŸ©βˆ€x∈S

    which is the subgradient inequality.

  25. Thus, gβˆˆβˆ‚f(a).

  26. Thus, βˆ‚f(a) is nonempty.

We next show the boundedness of βˆ‚f(a).

  1. Let gβˆˆβˆ‚f(a).

  2. Let gβ€ βˆˆV such that β€–g†‖=1 and

    β€–gβ€–βˆ—=⟨g†,g⟩.
  3. Let x=a+rg†.

  4. Applying the subgradient inequality on x, we get:

    f(x)β‰₯f(a)+⟨rg†,g⟩=f(a)+rβ€–gβ€–βˆ—.
  5. Thus,

    rβ€–gβ€–βˆ—β‰€f(x)βˆ’f(a)≀Lβ€–xβˆ’aβ€–=Lβ€–rg†‖=Lr.
  6. Thus, β€–gβ€–βˆ—β‰€L for every gβˆˆβˆ‚f(a).

  7. Thus, βˆ‚f(a) is bounded.

If f is a proper convex function, then the only points at which f may fail to be subdifferentiable (i.e., where the subdifferential set is empty) are the frontier points of domf (i.e., domf∖intdomf). f may still be subdifferentiable at some of these frontier points.
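A standard example of failure at the boundary is f(x)=−√x with domf=[0,∞): f is proper convex, but at x=0 the subgradient inequality −√y ≥ g·y would force g≤−1/√y for every y>0, which no finite g satisfies. The sketch below (illustrative) exhibits a violating point y=1/(4g²) for each candidate slope:

```python
import math

def f(x):
    # f(x) = -sqrt(x), proper convex on [0, inf), not subdifferentiable at 0
    return -math.sqrt(x)

def violating_point(g):
    # a y > 0 at which the subgradient inequality at x = 0 fails for slope g:
    # at y = 1/(4 g^2), f(y) = -1/(2|g|) < -1/(4|g|) = g*y
    return 1.0 / (4.0 * g * g) if g != 0 else 1.0

candidates = [0.0] + [-10.0 ** k for k in range(0, 6)]
all_fail = all(f(violating_point(g)) < g * violating_point(g)
               for g in candidates)
```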

Corollary 8.19 (Subdifferentiability of real valued convex functions)

Let f:V→R be a convex function. Then, f is subdifferentiable over V.

dom(βˆ‚f)=V.

Proof. We have domf=V.

  1. Let x∈V.

  2. V is open in (V,β€–β‹…β€–).

  3. Thus, x∈intV=domf.

  4. By Theorem 8.214, βˆ‚f(x) is nonempty and bounded as x∈intdomf.

  5. Hence, f is subdifferentiable at x.

  6. Since this is valid for every x∈V, hence dom(βˆ‚f)=V.

8.16.3.2. Nonempty, Convex and Compact SubdifferentialsΒΆ

Theorem 8.215 (Nonempty, convex and compact subdifferentials for proper convex functions)

Let f:V→(−∞,∞] be a proper and convex function with S=domf. Let x∈intS. Then ∂f(x) is nonempty, convex and compact.

Proof. Let x∈intS.

  1. By Theorem 8.211, βˆ‚f(x) is closed and convex.

  2. By Theorem 8.214, βˆ‚f(x) is nonempty and bounded.

  3. Since βˆ‚f(x) is closed and bounded, hence it must be compact since V is a finite dimensional normed linear space.

We present an alternative proof based on the min common/max crossing framework developed in Min Common/Max Crossing Duality. This proof can be skipped in first reading.

Proof. This proof is based on the second min common/max crossing theorem Theorem 9.34.

We fix some x∈intS. We construct the set M as

M={(u,t)|u∈V,f(x+u)≀t}.

We first consider the min common problem.

  1. Note that (0,f(x))∈M since f(x+0)≀f(x).

  2. Further see that the min common value

    p∗=inf{t | (0,t)∈M}=f(x).
  3. Hence pβˆ— is finite.

  4. Note that M―=M where

    M―={(u,t)∈V⊕R | there exists t¯∈R with t¯≤t and (u,t¯)∈M}.
  5. M is convex.

    1. Let (u1,t1),(u2,t2)∈M and let r∈(0,1).

    2. Let (u,t)=r(u1,t1)+(1βˆ’r)(u2,t2).

    3. We have f(x+u1)≀t1 and f(x+u2)≀t2.

    4. Now

      f(x+u)=f(x+ru1+(1βˆ’r)u2)=f(r(x+u1)+(1βˆ’r)(x+u2))≀rf(x+u1)+(1βˆ’r)f(x+u2)≀rt1+(1βˆ’r)t2=t.
    5. Hence (u,t)∈M.

    6. Hence M is convex.

Next consider the max crossing problem and strong duality.

  1. We have

    qβˆ—=supa∈Vq(a)

    where

    q(a)=inf(u,t)∈M{⟨u,a⟩+t}.
  2. The set of optimal solutions of the max crossing problem is given by

    Qβˆ—={a∈V|q(a)=qβˆ—}
  3. For some a∈Qβˆ—, we can attain strong duality with

    pβˆ—=qβˆ—=q(a)

    if and only if

    f(x)=inf(u,t)∈M{⟨u,a⟩+t}.
  4. Equivalently

    f(x)≀f(x+u)+⟨u,aβŸ©βˆ€u∈V.
  5. Equivalently

    f(x+u)β‰₯f(x)+⟨u,βˆ’aβŸ©βˆ€u∈V.
  6. But this is nothing but the subgradient inequality with βˆ’a as the subgradient at x.

  7. In other words, strong duality is attained at a as a solution of the max crossing problem if and only if βˆ’a is a subgradient of f.

  8. Hence Qβˆ— with strong duality is given by βˆ’βˆ‚f(x).

We next establish the conditions for the second min common/max crossing theorem Theorem 9.34

  1. Consider the set

    D={u∈V| there exists t∈R with (u,t)∈M}.
  2. It is easy to see that D=Sβˆ’x.

    1. Consider the set T=Sβˆ’x.

    2. Let u∈T.

    3. Then x+u∈S.

    4. Hence f(x+u)≀f(x+u).

    5. Hence (u,f(x+u))∈M.

    6. Hence u∈D.

    7. Hence T=Sβˆ’xβŠ†D.

    8. Let uβˆ‰T.

    9. Then u+xβˆ‰S.

    10. Hence f(u+x)=∞.

    11. Hence for every t∈R, f(u+x)>t.

    12. Hence (u,t)βˆ‰M for every t∈R.

    13. Hence uβˆ‰D.

    14. Hence D=T=Sβˆ’x.

  3. Since x∈intS, hence 0∈intD.

  4. We see that all the conditions of the second min common/max crossing theorem are satisfied.

    1. pβˆ— is finite.

    2. M―=M is convex.

    3. The set D contains 0 in its interior.

  5. Hence βˆ’βˆ‚f(x) is nonempty, convex and compact.

  6. Hence βˆ‚f(x) is also nonempty, convex and compact.

8.16.3.3. Subgradients over a Compact SetΒΆ

Theorem 8.216 (Subgradients over a compact set are nonempty and bounded)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper convex function with S=domf. Let AβŠ†intS be a nonempty and compact subset of the interior of the domain of f. Then, the set of subgradients over A given by

Y=⋃x∈Aβˆ‚f(x)

is nonempty and bounded.

Proof. We are given that A is nonempty and compact subset of interior of domain of f.

  1. For any x∈A, by Theorem 8.214, βˆ‚f(x) is nonempty and bounded.

  2. Thus, Y is nonempty.

We next prove that Y must be bounded also.

  1. Let T=Vβˆ–(intS).

  2. T is closed and A is closed. AβŠ†intS. Hence, A∩T=βˆ….

  3. Since A is compact and T is closed and A∩T=βˆ…, hence distance between A and T is nonzero due to Theorem 3.128.

    r=d(A,T)>0.
  4. Thus,

    β€–xβˆ’yβ€–β‰₯rβˆ€x∈A,yβˆ‰intS.
  5. Let s=r/2.

  6. Let D=B[0,s]. D is a closed and bounded set. Hence, it is compact due to Theorem 4.68.

  7. Let E=A+D. Then EβŠ†intS.

    1. Let y∈E.

    2. Then, there is x∈A and v∈D such that y=x+v.

    3. Thus, yβˆ’x=v.

    4. Hence ‖y−x‖≤s<r.

    5. Since ‖x−z‖≥r for every z∉intS and ‖y−x‖<r, it follows that y∈intS.

  8. Since both A and D are compact, hence E is compact due to Theorem 4.69.

  9. By Theorem 8.174, f is local Lipschitz continuous at every x∈E since x∈EβŠ†intS.

  10. Then, by Theorem 3.79, f is Lipschitz continuous on E.

  11. Thus, there exists L>0 such that

    |f(x)βˆ’f(y)|≀Lβ€–xβˆ’yβ€–βˆ€x,y∈E.
  12. Let g∈Y. Then, there is x∈A such that gβˆˆβˆ‚f(x).

  13. We can choose g⊥∈V such that

    β€–gβ€–βˆ—=⟨gβŠ₯,g⟩ and β€–gβŠ₯β€–=1.
  14. Now, let y=x+sgβŠ₯. Then, y∈E.

    1. β€–yβˆ’xβ€–=β€–sgβŠ₯β€–=s.

    2. Thus, sgβŠ₯∈D.

    3. Thus, y∈E since x∈A.

  15. Also, x∈E since x=x+0 and 0∈D.

  16. Consequently, by Lipschitz continuity

    |f(y)βˆ’f(x)|≀Lβ€–yβˆ’xβ€–=Ls.
  17. By subgradient inequality at x

    f(y)βˆ’f(x)β‰₯⟨yβˆ’x,g⟩=s⟨gβŠ₯,g⟩=sβ€–gβ€–βˆ—.
  18. Using the Lipschitz bound above, we get

    sβ€–gβ€–βˆ—β‰€Ls.
  19. Thus, β€–gβ€–βˆ—β‰€L.

  20. Since g was chosen arbitrarily, hence Y is bounded.

We recall from Definition 8.56 that the relative interior of a convex set is given by

riC={x∈C|βˆƒr>0,B(x,r)∩affCβŠ†C}.

It is the interior of the convex set w.r.t. the subspace topology of its affine hull.

8.16.3.4. Nonempty Subdifferential at Relative Interior PointsΒΆ

Theorem 8.217 (Nonemptiness of the subdifferential at relative interior points)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper convex function with S=domf. Let x∈riS. Then, βˆ‚f(x) is nonempty and has the form

βˆ‚f(x)=LβŠ₯+G

where L is the subspace that is parallel to affS and G is a nonempty and compact set.

In other words, for a proper convex function, the subdifferential at the relative interior points of its domain is nonempty. We have

ridomfβŠ†dom(βˆ‚f).

The proof is based on the min common/max crossing framework developed in Min Common/Max Crossing Duality. It can be skipped in first reading. It follows the proof of Theorem 8.215.

Proof. This proof is based on the second min common/max crossing theorem Theorem 9.34.

We fix some x∈riS. We construct the set M as

M={(u,t)|u∈V,f(x+u)≀t}.

We have already established the following in the proof of Theorem 8.215:

  1. M is convex.

  2. M―=M.

  3. pβˆ—=f(x).

  4. pβˆ— is finite.

  5. qβˆ—=supa∈Vq(a) where

    q(a)=inf(u,t)∈M{⟨u,a⟩+t}.
  6. Qβˆ—={a∈V|q(a)=qβˆ—}.

  7. When the strong duality holds, then

    Qβˆ—=βˆ’βˆ‚f(x).
  8. The set

    D={u∈V| there exists t∈R with (u,t)∈M}=Sβˆ’x.

Continuing further

  1. Since x∈riS, hence 0∈riD.

  2. Hence affD=affSβˆ’x=L.

  3. Hence by the second min common/max crossing theorem

    βˆ’βˆ‚f(x)=Qβˆ—=LβŠ₯+Q~

    where Q~ is a nonempty, convex and compact set.

  4. Negating on both sides, we obtain

    βˆ‚f(x)=LβŠ₯+G

    where G is a nonempty, convex and compact set.

Corollary 8.20 (Existence of points with nonempty subdifferential)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper convex function with S=domf. Then, there exists x∈S such that βˆ‚f(x) is nonempty.

Proof. The effective domain of a proper convex function is convex and nonempty.

  1. By Theorem 8.142, the relative interior of S=domf is nonempty.

  2. Thus, there exists x∈riS⊆S.

  3. By Theorem 8.217, ∂f(x) is nonempty.

8.16.3.5. Unbounded SubdifferentialΒΆ

Theorem 8.218 (Unboundedness condition for subdifferential)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper convex function with S=domf.

Assume that dimS<dimV. Let x∈S. If βˆ‚f(x)β‰ βˆ…, then βˆ‚f(x) is unbounded.

Proof. We proceed as follows.

  1. Let n=dimV.

  2. Let A=affS. A is an affine set.

  3. We have x∈SβŠ†A.

  4. Then, W=Aβˆ’x is the subspace parallel to A.

  5. Accordingly, m=dimW<dimV=n.

  6. Then, the orthogonal complement of W is a nontrivial subspace with dimension nβˆ’m.

  7. Let v∈WβŠ₯ be a nonzero vector.

  8. Then, ⟨w,v⟩=0 for every w∈W.

  9. Now let gβˆˆβˆ‚f(x) be an arbitrary subgradient at x.

  10. By subgradient inequality

    f(y)β‰₯f(x)+⟨yβˆ’x,gβŸ©βˆ€y∈S.
  11. Note that both x∈S and y∈S.

  12. Hence, yβˆ’x∈W.

  13. Thus, ⟨yβˆ’x,v⟩=0.

  14. But then, for any α∈R,

    ⟨yβˆ’x,(g+Ξ±v)⟩=⟨yβˆ’x,g⟩+α⟨yβˆ’x,v⟩=⟨yβˆ’x,g⟩.
  15. Thus, if gβˆˆβˆ‚f(x), then g+Ξ±vβˆˆβˆ‚f(x) for every α∈R.

  16. Thus, βˆ‚f(x) is unbounded.
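The construction in the proof can be seen concretely in R². Take f(y1,y2)=|y1| with domf equal to the x-axis, so that dimdomf=1<2. Then (g1,g2) with |g1|≤1 is a subgradient at the origin for every choice of g2, since the second coordinate of y−x vanishes on the domain. An illustrative sketch:

```python
# points of dom f: the x-axis in R^2 (f = |y1| there, infinity elsewhere)
ys = [(i / 10.0, 0.0) for i in range(-20, 21)]

def is_subgrad(g1, g2):
    # subgradient inequality at the origin, sampled over dom f
    return all(abs(y1) >= g1 * y1 + g2 * y2 for (y1, y2) in ys)

# the second component can be arbitrarily large: the set is unbounded
unbounded_dir_ok = all(is_subgrad(0.5, alpha)
                       for alpha in [0.0, 1.0, -100.0, 1e6])
```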

8.16.4. Directional DerivativesΒΆ

The directional derivative of a proper convex function is closely linked with its subdifferential. To see this, let x∈intdomf, let d∈V be a nonzero direction and t>0. Let gβˆˆβˆ‚f(x) and consider the subgradient inequality

f(x+td)β‰₯f(x)+⟨td,g⟩.

Hence

(f(x+td)−f(x))/t ≥ ⟨d,g⟩.

We saw in Observation 8.8 that (f(x+td)−f(x))/t is nondecreasing in t and

f′(x;d)=inf_{t>0} (f(x+td)−f(x))/t.

This establishes the basic relation

(8.8)ΒΆfβ€²(x;d)β‰₯⟨d,g⟩

for every gβˆˆβˆ‚f(x). In fact a stronger result is available in the form of max formula.

8.16.4.1. Max FormulaΒΆ

The max formula is one of the key results in this section. It connects subgradients with directional derivatives.

Theorem 8.219 (Max formula)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper convex function with S=domf. Then for any x∈intS and d∈V,

fβ€²(x;d)=sup{⟨d,g⟩|gβˆˆβˆ‚f(x)}.

In words, the directional derivative is the supremum of the inner product of the subgradients with the direction.

Proof. Let x∈intS and d∈V.

  1. Let t>0. Then, by subgradient inequality

    f(x+td)βˆ’f(x)β‰₯⟨td,gβŸ©βˆ€gβˆˆβˆ‚f(x).
  2. Thus,

    f(x+td)βˆ’f(x)tβ‰₯⟨d,gβŸ©βˆ€gβˆˆβˆ‚f(x).
  3. Taking the limit

    f′(x;d)=lim_{t→0+} (f(x+td)−f(x))/t ≥ lim_{t→0+} ⟨d,g⟩=⟨d,g⟩

    for every gβˆˆβˆ‚f(x).

  4. Taking the supremum over βˆ‚f(x) on the R.H.S., we obtain

    fβ€²(x;d)β‰₯sup{⟨d,g⟩|gβˆˆβˆ‚f(x)}.

We now show that the inequality is indeed an equality.

  1. Let h:V→R be given by

    h(v)=fβ€²(x;v).
  2. By Theorem 8.206, h is a real valued convex function and nonnegative homogeneous.

  3. By Corollary 8.19, h is subdifferentiable everywhere in V.

  4. In particular, h is subdifferentiable at d.

  5. Let gβˆˆβˆ‚h(d).

  6. For any v∈V and tβ‰₯0,

    tfβ€²(x;v)=th(v)=h(tv)

    since h is nonnegative homogeneous.

  7. By subdifferential inequality

    tfβ€²(x;v)=h(tv)β‰₯h(d)+⟨tvβˆ’d,g⟩=fβ€²(x;d)+⟨tvβˆ’d,g⟩.
  8. Rearranging the terms,

    t(fβ€²(x;v)βˆ’βŸ¨v,g⟩)β‰₯fβ€²(x;d)βˆ’βŸ¨d,g⟩.
  9. Since this inequality is valid for every t≥0, the term f′(x;v)−⟨v,g⟩ must be nonnegative; otherwise, the inequality would be violated for large enough t. Thus,

    fβ€²(x;v)β‰₯⟨v,g⟩.
  10. By Theorem 8.207, for any y∈S,

    f(y)β‰₯f(x)+fβ€²(x;yβˆ’x).
  11. From previous inequality,

    fβ€²(x;yβˆ’x)β‰₯⟨yβˆ’x,g⟩.
  12. Thus, for any y∈S,

    f(y)β‰₯f(x)+⟨yβˆ’x,g⟩.
  13. But this is a subgradient inequality. Hence, gβˆˆβˆ‚f(x).

  14. Taking t=0 in the subgradient inequality for h,

    0β‰₯fβ€²(x;d)βˆ’βŸ¨d,g⟩.
  15. Thus, there exists gβˆˆβˆ‚f(x) such that

    fβ€²(x;d)β‰€βŸ¨d,g⟩.
  16. Consequently,

    fβ€²(x;d)≀sup{⟨d,g⟩|gβˆˆβˆ‚f(x)}.

Combining the two inequalities, we obtain the max formula:

fβ€²(x;d)=sup{⟨d,g⟩|gβˆˆβˆ‚f(x)}.

Recall from Definition 8.51 that support function for a set C is given by

ΟƒC(x)=sup{⟨x,y⟩|y∈C}.

Corollary 8.21 (Max formula as a support function)

The max formula can be written as

fβ€²(x;d)=Οƒβˆ‚f(x)(d).
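The max formula can be spot-checked numerically for f(x)=|x| at x=0, where ∂f(0)=[−1,1] and f′(0;d)=|d|. The sketch below (illustrative tolerances and sampling) compares a one-sided finite-difference directional derivative with the supremum over sampled subgradients:

```python
f = abs

def dir_deriv(x, d, t=1e-8):
    # one-sided difference quotient approximating f'(x; d)
    return (f(x + t * d) - f(x)) / t

gs = [i / 100.0 for i in range(-100, 101)]    # sample of the interval [-1, 1]
max_formula_holds = all(
    abs(dir_deriv(0.0, d) - max(g * d for g in gs)) < 1e-6
    for d in [1.0, -1.0, 2.5, -0.3]
)
```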

8.16.5. DifferentiabilityΒΆ

8.16.5.1. Subdifferential and gradientΒΆ

Theorem 8.220 (Subdifferential at points of differentiability)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper convex function with S=domf. Let x∈intS.

Then f is differentiable at x if and only if

βˆ‚f(x)={βˆ‡f(x)}.

In other words, if f is differentiable at x then its subdifferential is a singleton set consisting of the gradient and if the subdifferential at x is a singleton, then f is differentiable at x.

Proof. Assume that f is differentiable at x.

  1. Let d∈V be some direction.

  2. By Theorem 8.203,

    fβ€²(x;d)=⟨d,βˆ‡f(x)⟩.
  3. By Theorem 8.214, f is subdifferentiable at x since f is convex and x∈intS.

  4. Let gβˆˆβˆ‚f(x).

  5. By the max formula (Theorem 8.219):

    fβ€²(x;d)β‰₯⟨d,g⟩

    as the directional derivative is the supremum of the inner product of the subgradients with the direction.

  6. Thus,

    ⟨d,βˆ‡f(x)⟩β‰₯⟨d,g⟩.
  7. In turn,

    ⟨d,gβˆ’βˆ‡f(x)βŸ©β‰€0.

    This holds for every d∈V.

  8. By the definition of dual norm

    β€–gβˆ’βˆ‡f(x)β€–βˆ—=supβ€–d‖≀1{⟨d,gβˆ’βˆ‡f(x)⟩}.
  9. Using the previous inequality

    β€–gβˆ’βˆ‡f(x)β€–βˆ—β‰€0.
  10. Since, dual norm is a norm, hence it cannot be negative. Thus,

    β€–gβˆ’βˆ‡f(x)β€–βˆ—=0
  11. Moreover, due to positive definiteness of a norm

    gβˆ’βˆ‡f(x)=0

    must hold true.

  12. Thus, g=βˆ‡f(x).

  13. In other words, if g is a subgradient to f at x, then it must equal βˆ‡f(x).

  14. Thus, the only subgradient for f at x is βˆ‡f(x).

  15. Thus,

    βˆ‚f(x)={βˆ‡f(x)}.

For the converse, assume that f is subdifferentiable at x with βˆ‚f(x)={g}.

  1. By the subgradient inequality

    f(x+u)β‰₯f(x)+⟨u,gβŸ©βˆ€u∈V.
  2. Thus,

    f(x+u)βˆ’f(x)βˆ’βŸ¨u,g⟩β‰₯0βˆ€u∈V.
  3. Define a function h:Vβ†’(βˆ’βˆž,∞] as

    h(u)β‰œf(x+u)βˆ’f(x)βˆ’βŸ¨u,g⟩.
  4. We list some properties of h.

    1. By definition h(u)β‰₯0 for every u∈V.

    2. h is a convex function since f(x+u) is convex, ⟨u,g⟩ is linear and f(x) is a constant (w.r.t. the variable u).

    3. domh=domfβˆ’x=Sβˆ’x.

    4. Thus, since x∈intS, hence 0=xβˆ’x∈intdomh.

    5. h(0)=f(x)βˆ’f(x)βˆ’βŸ¨0,g⟩=0.

  5. If we are able to show that

    limu→0h(u)‖u‖=0

    then, by the definition of gradient (Definition 8.71),

    g=βˆ‡f(x).
  6. We can easily show that βˆ‚h(0)={0}.

    1. If g~ is a subgradient of h at 0, then by subgradient inequality

      h(u)β‰₯h(0)+⟨u,g~⟩=⟨u,g~βŸ©βˆ€u∈V.
    2. Then, g~=0 satisfies this inequality since h(u)β‰₯0 by definition.

    3. For contradiction, assume a nonzero g~ can satisfy this inequality.

    4. Then,

      h(u)β‰₯⟨u,g~⟩⟺f(x+u)βˆ’f(x)βˆ’βŸ¨u,g⟩β‰₯⟨u,g~⟩⟺f(x+u)β‰₯f(x)+⟨u,g~+g⟩⟺g~+gβˆˆβˆ‚f(x).
    5. This contradicts the hypothesis that the subgradient of f at x is {g}.

    6. Thus, βˆ‚h(0)={0}.

  7. Then, max formula (Theorem 8.219):

    hβ€²(0;d)=Οƒβˆ‚h(0)(d)=⟨d,0⟩=0.
  8. Thus, from the definition of directional derivatives

    0=h′(0;d)=lim_{α→0+} (h(αd)−h(0))/α=lim_{α→0+} h(αd)/α.
  9. Let us now introduce an orthonormal basis for V as {e1,…,en}.

  10. Assume that V has been equipped with various β„“p norms as described in Remark 8.1.

  11. Since 0∈intdomh, there exists r∈(0,1) such that

    B1[0,r]βŠ†domh.
  12. It is a cross polytope of radius r with 2n vertices given by {Β±rei}i=1n.

    B1[0,r]=conv{Β±rei}i=1n.
  13. Let us denote these 2n vectors as w1,…,w2n.

  14. By Remark 8.1

    B[0,s]=B2[0,s]βŠ†B1[0,r]

    where s=r/√n.

  15. Let u∈B[0,s²] be a nonzero vector.

  16. Since r<1, hence s<1, hence s²<s.

  17. Let v=su/‖u‖.

  18. Then, ‖v‖=s, hence v∈B[0,s]⊆B1[0,r].

  19. Thus, v∈conv{w_i}_{i=1}^{2n}.

  20. Thus, there exists tβˆˆΞ”2n such that

    su/‖u‖=v=∑_{i=1}^{2n} t_i w_i.
  21. Then,

    h(u)/‖u‖ = h((‖u‖/s)(su/‖u‖))/‖u‖
    = h(∑_{i=1}^{2n} t_i (‖u‖/s) w_i)/‖u‖ (convex combination)
    ≤ ∑_{i=1}^{2n} t_i h((‖u‖/s) w_i)/‖u‖ (h is convex and t∈Δ_{2n})
    ≤ max_{i=1,…,2n} {h((‖u‖/s) w_i)/‖u‖} (since ∑ t_i=1).

    Note that (‖u‖/s)w_i∈B1[0,r]⊆domh, since ‖(‖u‖/s)w_i‖1=(‖u‖/s)r≤sr<r.

  22. Now,

    limu→0h(‖u‖wis)‖u‖=lim‖u‖→0h(‖u‖wis)‖u‖=limα→0+h(αwis)α=0.
  23. Thus,

    limu→0h(u)‖u‖=0.
  24. Thus, g=βˆ‡f(x) as desired.
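The singleton property can be observed numerically. For f(x)=x² at x=1 the gradient is 2, and sampling candidate slopes against the subgradient inequality on a fine grid leaves only g=2 standing (illustrative sketch; a grid check is of course not a proof):

```python
def f(y):
    return y * y

x0 = 1.0
ys = [i / 20.0 for i in range(-100, 101)]     # grid on [-5, 5], step 0.05

def is_subgrad(g):
    # subgradient inequality f(y) >= f(x0) + g*(y - x0), sampled over ys
    return all(f(y) >= f(x0) + g * (y - x0) for y in ys)

# candidate slopes on [-5, 5] with step 0.1; only the gradient 2.0 survives
survivors = [j / 10.0 for j in range(-50, 51) if is_subgrad(j / 10.0)]
```

For any other sampled slope g, the quadratic y²−(1+g(y−1)) is negative strictly between 1 and g−1, and the grid is fine enough to catch a point in that interval.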

8.16.6. Subdifferential CalculusΒΆ

8.16.6.1. Function SumsΒΆ

Theorem 8.221 (Subdifferential subadditivity with sum of functions)

Let f1,f2:Vβ†’(βˆ’βˆž,∞] be proper functions with S1=domf1 and S2=domf2. For any x∈S1∩S2

βˆ‚f1(x)+βˆ‚f2(x)βŠ†βˆ‚(f1+f2)(x).

Proof. Let f=f1+f2. We note that domf=dom(f1+f2)=domf1∩domf2=S1∩S2.

  1. Let x∈S1∩S2.

  2. Let gβˆˆβˆ‚f1(x)+βˆ‚f2(x).

  3. Then, there exist g1βˆˆβˆ‚f1(x) and g2βˆˆβˆ‚f2(x) such that g=g1+g2.

  4. Then, by subgradient inequality, for any y∈S1∩S2

    f1(y)β‰₯f1(x)+⟨yβˆ’x,g1⟩,f2(y)β‰₯f2(x)+⟨yβˆ’x,g2⟩.
  5. Summing the two inequalities, we get

    f1(y)+f2(y)β‰₯f1(x)+f2(x)+⟨yβˆ’x,g1+g2⟩.
  6. Rewriting, for every y∈domf

    (f1+f2)(y)β‰₯(f1+f2)(x)+⟨yβˆ’x,g⟩.
  7. Thus, g=g1+g2∈∂(f1+f2)(x)=∂f(x).

  8. Thus, βˆ‚f1(x)+βˆ‚f2(x)βŠ†βˆ‚f(x).

We can generalize this result for a finite sum of functions using simple mathematical induction.

Corollary 8.22 (Weak sum rule of subdifferential calculus)

Let f1,…,fm:Vβ†’(βˆ’βˆž,∞] be proper functions. For any x∈∩i=1mdomfi

βˆ‘i=1mβˆ‚fi(x)βŠ†βˆ‚(βˆ‘i=1mfi)(x).
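The weak sum rule can be illustrated numerically with f1(y)=|y| and f2(y)=max(y,0) at x=0: g1=−0.5∈∂f1(0) and g2=0.75∈∂f2(0), and their sum satisfies the subgradient inequality for f1+f2 (illustrative sketch):

```python
f1 = abs
f2 = lambda y: max(y, 0.0)
ys = [i / 10.0 for i in range(-40, 41)]

def is_subgrad(func, g, x=0.0):
    # subgradient inequality for func at x, sampled over ys
    return all(func(y) >= func(x) + g * (y - x) for y in ys)

g1, g2 = -0.5, 0.75
sum_rule_ok = (is_subgrad(f1, g1) and is_subgrad(f2, g2)
               and is_subgrad(lambda y: f1(y) + f2(y), g1 + g2))
```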

Theorem 8.222 (Subdifferential additivity with sum of convex functions)

Let f1,f2:Vβ†’(βˆ’βˆž,∞] be proper convex functions with S1=domf1 and S2=domf2. For any x∈intS1∩intS2

βˆ‚(f1+f2)(x)=βˆ‚f1(x)+βˆ‚f2(x).

Proof. With f=f1+f2, by Theorem 3.22,

intdomf=int(S1∩S2)=intS1∩intS2.
  1. Let x∈intS1∩intS2.

  2. Thus, x∈intdomf.

  3. By max formula, for any d∈V,

    fβ€²(x;d)=sup{⟨d,g⟩|gβˆˆβˆ‚f(x)}=Οƒβˆ‚f(x)(d).
  4. Since directional derivative is additive, hence

    fβ€²(x;d)=f1β€²(x;d)+f2β€²(x;d).
  5. Expanding on this

    σ∂f(x)(d)=f′(x;d)=f1′(x;d)+f2′(x;d)
    =sup{⟨d,g1⟩|g1∈∂f1(x)}+sup{⟨d,g2⟩|g2∈∂f2(x)}
    =sup{⟨d,g1+g2⟩|g1∈∂f1(x),g2∈∂f2(x)}
    =sup{⟨d,g⟩|g∈∂f1(x)+∂f2(x)}
    =σ∂f1(x)+∂f2(x)(d).
  6. In summary, for every d∈V,

    Οƒβˆ‚f(x)(d)=Οƒβˆ‚f1(x)+βˆ‚f2(x)(d).
  7. By Theorem 8.211, βˆ‚f(x), βˆ‚f1(x) and βˆ‚f2(x) are closed and convex.

  8. By Theorem 8.214, βˆ‚f(x), βˆ‚f1(x) and βˆ‚f2(x) are nonempty and bounded.

  9. Since βˆ‚f1(x) and βˆ‚f2(x) are closed and bounded, hence they are compact (V is finite dimensional).

  10. Thus, βˆ‚f1(x)+βˆ‚f2(x) is also closed, bounded, convex and nonempty.

  11. Thus, both βˆ‚f(x) and βˆ‚f1(x)+βˆ‚f2(x) are nonempty, convex and closed.

  12. Then, due to Theorem 8.90,

    βˆ‚f(x)=βˆ‚f1(x)+βˆ‚f2(x).

We can generalize this result for a finite sum of proper convex functions using simple mathematical induction.

Corollary 8.23 (Sum rule of subdifferential calculus for proper convex functions at interior points)

Let f1,…,fm:Vβ†’(βˆ’βˆž,∞] be proper convex functions. For any x∈∩i=1mintdomfi

βˆ‘i=1mβˆ‚fi(x)=βˆ‚(βˆ‘i=1mfi)(x).

For real valued convex functions, the domain is the whole space V, whose interior is V itself.

Corollary 8.24 (Sum rule of subdifferential calculus for real valued convex functions)

Let f1,…,fm:Vβ†’R be real valued convex functions. For any x∈V

βˆ‘i=1mβˆ‚fi(x)=βˆ‚(βˆ‘i=1mfi)(x).

A more powerful result with less restrictive assumptions than Corollary 8.23 is possible if the intersection of the relative interiors of the domains of the individual functions is nonempty.

Theorem 8.223 (Sum rule of subdifferential calculus for proper convex functions)

Let f1,…,fm:Vβ†’(βˆ’βˆž,∞] be proper convex functions. Assume that β‹‚i=1mridomfiβ‰ βˆ…. Then for any x∈V

βˆ‘i=1mβˆ‚fi(x)=βˆ‚(βˆ‘i=1mfi)(x).

8.16.6.2. Linear TransformationsΒΆ

Our interest here is in compositions of the form h=f∘A where A is a linear transformation. In other words h(x)=f(A(x)).

If A:Vβ†’W is a linear transformation then AT:Wβˆ—β†’Vβˆ— is a mapping from Wβˆ— to Vβˆ— and satisfies the relationship:

⟨A(x),y⟩=⟨x,AT(y)⟩.

From the definition of directional derivative, we have

h′(x;d)=lim_{t↓0} (h(x+td)−h(x))/t
=lim_{t↓0} (f(A(x+td))−f(A(x)))/t
=lim_{t↓0} (f(A(x)+tA(d))−f(A(x)))/t
=f′(A(x);A(d)).

Theorem 8.224 (Weak linear transformation rule of subdifferential calculus)

Let f:Wβ†’(βˆ’βˆž,∞] be a proper function. Let A:Vβ†’W be a linear transformation. Define h:Vβ†’(βˆ’βˆž,∞] as

h(x)=f(A(x)).

Assume that h is proper, i.e. domh is not empty:

domh={x∈V|A(x)∈domf}β‰ βˆ….

Then, for any x∈domh

AT(βˆ‚f(A(x)))βŠ†βˆ‚h(x).

Proof. We proceed as follows.

  1. Let x∈domh.

  2. Let gβˆˆβˆ‚f(A(x)).

  3. By Theorem 8.219,

    ⟨z,gβŸ©β‰€fβ€²(A(x);z)βˆ€z∈W.
  4. Choosing z=A(d), we have

    ⟨A(d),gβŸ©β‰€fβ€²(A(x);A(d))βˆ€d∈V.
  5. Equivalently

    ⟨d,AT(g)βŸ©β‰€hβ€²(x;d)βˆ€d∈V.
  6. Hence AT(g)βˆˆβˆ‚h(x) due to (8.8).

  7. Hence AT(βˆ‚f(A(x)))βŠ†βˆ‚h(x).

Theorem 8.225 (Strong linear transformation rule for subdifferential calculus)

Let f:Wβ†’(βˆ’βˆž,∞] be a proper convex function. Let A:Vβ†’W be a linear transformation. Define h:Vβ†’(βˆ’βˆž,∞] as

h(x)=f(A(x)).

Assume that h is proper, i.e. domh is not empty:

domh={x∈V|A(x)∈domf}β‰ βˆ….

Then, for any x∈intdomh such that A(x)∈intdomf, we have:

AT(βˆ‚f(A(x)))=βˆ‚h(x).

Proof. We showed AT(βˆ‚f(A(x)))βŠ†βˆ‚h(x) in Theorem 8.224. We show the reverse inclusion by contradiction.

  1. Let x∈intdomh such that A(x)∈intdomf.

  2. Assume that there exists dβˆˆβˆ‚h(x) such that dβˆ‰AT(βˆ‚f(A(x))).

  3. By Theorem 8.215, the set βˆ‚f(A(x)) is nonempty, convex and compact.

  4. Hence AT(βˆ‚f(A(x))) is also nonempty, convex and compact.

  5. By strict separation theorem (Theorem 8.169), there exists a vector p and a scalar c such that

    ⟨AT(g),p⟩<c<⟨d,pβŸ©βˆ€gβˆˆβˆ‚f(A(x)).
  6. Equivalently

    ⟨g,A(p)⟩<c<⟨d,pβŸ©βˆ€gβˆˆβˆ‚f(A(x)).
  7. Taking the supremum over βˆ‚f(A(x)) on the L.H.S., we obtain

    supgβˆˆβˆ‚f(A(x))⟨g,A(p)⟩<⟨d,p⟩.
  8. By the max formula

    fβ€²(A(x);A(p))<⟨d,p⟩.
  9. But this means that

    hβ€²(x;p)<⟨d,p⟩.
  10. This contradicts the assumption that dβˆˆβˆ‚h(x).

  11. Hence we must have

    AT(βˆ‚f(A(x)))=βˆ‚h(x).

8.16.6.3. Affine TransformationsΒΆ

Theorem 8.226 (Weak affine transformation rule of subdifferential calculus)

Let f:Wβ†’(βˆ’βˆž,∞] be a proper function. Let A:Vβ†’W be a linear transformation. Let b∈W. Define h:Vβ†’(βˆ’βˆž,∞] as

h(x)=f(A(x)+b).

Assume that h is proper, i.e. domh is not empty:

domh={x∈V|A(x)+b∈domf}β‰ βˆ….

Then, for any x∈domh

AT(βˆ‚f(A(x)+b))βŠ†βˆ‚h(x).

Proof. We proceed as follows.

  1. Let x∈domh.

  2. Then, xβ€²=A(x)+b∈domf such that h(x)=f(xβ€²).

  3. Let g∈AT(βˆ‚f(xβ€²)).

  4. Then, there is d∈Wβˆ— such that g=AT(d) with dβˆˆβˆ‚f(xβ€²).

  5. Let y∈domh.

  6. Then, yβ€²=A(y)+b∈domf such that h(y)=f(yβ€²).

  7. Applying subgradient inequality for f at xβ€² with the subgradient being d, we get

    f(yβ€²)β‰₯f(xβ€²)+⟨yβ€²βˆ’xβ€²,d⟩.
  8. We have h(y)=f(yβ€²), h(x)=f(xβ€²) and yβ€²βˆ’xβ€²=A(yβˆ’x).

  9. Thus, the subgradient inequality simplifies to

    h(y)β‰₯h(x)+⟨A(yβˆ’x),d⟩.
  10. We note that

    ⟨A(yβˆ’x),d⟩=⟨yβˆ’x,AT(d)⟩.
  11. Thus, for any y∈domh, we have

    h(y)β‰₯h(x)+⟨yβˆ’x,AT(d)⟩.
  12. Thus, g=AT(d)βˆˆβˆ‚h(x).

Since this is valid for any x∈domh and for every g∈AT(βˆ‚f(A(x)+b)), hence

AT(βˆ‚f(A(x)+b))βŠ†βˆ‚h(x).

Theorem 8.227 (Affine transformation rule of subdifferential calculus)

Let f:Wβ†’(βˆ’βˆž,∞] be a proper convex function. Let A:Vβ†’W be a linear transformation. Let b∈W. Define h:Vβ†’(βˆ’βˆž,∞] as

h(x)=f(A(x)+b).

Assume that h is proper, i.e. domh is not empty:

domh={x∈V|A(x)+b∈domf}β‰ βˆ….

Then, for any x∈intdomh such that A(x)+b∈intdomf, we have:

AT(βˆ‚f(A(x)+b))=βˆ‚h(x).

Proof. We note that h is a proper convex function since it is a composition of an affine transformation with a proper convex function.

  1. Let x∈intdomh such that xβ€²=A(x)+b∈intdomf.

  2. Then, for any direction d∈V, by the max formula,

    hβ€²(x;d)=Οƒβˆ‚h(x)(d).
  3. By the definition of the directional derivative, we have

    hβ€²(x;d)=limΞ±β†’0+[h(x+Ξ±d)βˆ’h(x)]/Ξ±=limΞ±β†’0+[f(A(x)+b+Ξ±A(d))βˆ’f(A(x)+b)]/Ξ±=fβ€²(A(x)+b;A(d)).
  4. Thus,

    Οƒβˆ‚h(x)(d)=fβ€²(A(x)+b;A(d)).
  5. Using the max formula on the R.H.S., we get

    Οƒβˆ‚h(x)(d)=fβ€²(A(x)+b;A(d))=supgβˆˆβˆ‚f(A(x)+b)⟨A(d),g⟩=supgβˆˆβˆ‚f(A(x)+b)⟨d,AT(g)⟩=supgβ€²βˆˆAT(βˆ‚f(A(x)+b))⟨d,gβ€²βŸ©=ΟƒAT(βˆ‚f(A(x)+b))(d).
  6. Since x∈intdomh, hence by Theorem 8.211 and Theorem 8.214 βˆ‚h(x) is nonempty, closed and convex.

  7. Since A(x)+b∈intdomf, hence by Theorem 8.211 and Theorem 8.214 βˆ‚f(A(x)+b) is nonempty, closed and convex.

  8. It follows that AT(βˆ‚f(A(x)+b)) is nonempty, closed and convex since AT is a linear operator and both V and W are finite dimensional.

  9. Then, due to Theorem 8.90,

    AT(βˆ‚f(A(x)+b))=βˆ‚h(x).

8.16.6.4. CompositionΒΆ

Chain rule is a key principle in computing derivatives of composition of functions. A chain rule is available for subgradient calculus also.

We first recall a result on the derivative of composition of real functions.

Theorem 8.228 (Chain rule for real functions)

Let f:Rβ†’R be a real function which is continuous on [a,b] with a<b. Assume that f+β€²(a) exists. Let g:Rβ†’R be another real function defined on an open interval I such that rangefβŠ†I. Assume g is differentiable at f(a). Then the composite real function h:Rβ†’R given by

h(t)β‰œg(f(t))(a≀t≀b)

is right differentiable at t=a. In particular,

h+β€²(a)=gβ€²(f(a))f+β€²(a).

Proof. We show this by working with the definition of right hand derivative as a limit

h+β€²(a)=limtβ†’a+[h(t)βˆ’h(a)]/(tβˆ’a)
=limtβ†’a+[g(f(t))βˆ’g(f(a))]/(tβˆ’a)
=limtβ†’a+[g(f(t))βˆ’g(f(a))]/[f(t)βˆ’f(a)]β‹…[f(t)βˆ’f(a)]/(tβˆ’a)
=limzβ†’f(a)[g(z)βˆ’g(f(a))]/[zβˆ’f(a)]β‹…limtβ†’a+[f(t)βˆ’f(a)]/(tβˆ’a)
=gβ€²(f(a))f+β€²(a).

We can now develop a chain rule for subdifferentials of multidimensional functions with the help of max formula.

Theorem 8.229 (Subdifferential chain rule)

Let f:Vβ†’R be convex and let g:Rβ†’R be a nondecreasing convex function. Let x∈V. Assume that g is differentiable at f(x). Let h=g∘f. Then

βˆ‚h(x)=gβ€²(f(x))βˆ‚f(x).

Proof. We are given x∈V at which g is differentiable and f is convex.

  1. Since f is convex and g is nondecreasing convex function, hence h is also convex.

  2. We now introduce two real functions parametrized on x and an arbitrary direction d∈V

    fx,d(t)=f(x+td),t∈R
    hx,d(t)=h(x+td),t∈R
  3. It is now easy to see that

    hx,d(t)=h(x+td)=g(f(x+td))=g(fx,d(t)).
  4. Thus, hx,d=g∘fx,d.

  5. Since fx,d and hx,d are restrictions of f and h along a line, they are also convex.

  6. Due to Theorem 8.204, the directional derivatives of f and h exist in every direction.

  7. By the definition of directional derivative Definition 8.70,

    (fx,d)+β€²(0)=fβ€²(x;d),(hx,d)+β€²(0)=hβ€²(x;d).
  8. Also note that fx,d(0)=f(x) and hx,d(0)=h(x).

  9. fx,d is right differentiable at t=0, and g is differentiable at f(x).

  10. Hence, by the chain rule in Theorem 8.228,

    hβ€²(x;d)=gβ€²(f(x))fβ€²(x;d).
  11. By the max formula in Corollary 8.21,

    hβ€²(x;d)=Οƒβˆ‚h(x)(d) and fβ€²(x;d)=Οƒβˆ‚f(x)(d).
  12. Thus,

    Οƒβˆ‚h(x)(d)=hβ€²(x;d)=gβ€²(f(x))fβ€²(x;d)=gβ€²(f(x))Οƒβˆ‚f(x)(d)=Οƒgβ€²(f(x))βˆ‚f(x)(d).

    The last step is due to Theorem 8.92. Since g is nondecreasing, hence gβ€²(f(x))β‰₯0.

  13. By Theorem 8.211 and Theorem 8.214, the sets βˆ‚f(x) and βˆ‚h(x) are nonempty, closed and convex.

  14. Then, the set gβ€²(f(x))βˆ‚f(x) is also nonempty, closed and convex.

  15. Thus, by Theorem 8.90,

    βˆ‚h(x)=gβ€²(f(x))βˆ‚f(x).

Applications of this rule are presented later in Example 8.70.

8.16.6.5. Max RuleΒΆ

Theorem 8.230 (Max rule of subdifferential calculus)

Let f1,f2,…,fm:Vβ†’(βˆ’βˆž,∞] be a set of proper convex functions. Let f:Vβ†’(βˆ’βˆž,∞] be given by:

f(x)=max{f1(x),f2(x),…,fm(x)}.

Let xβˆˆβ‹‚i=1mintdomfi be a point common to the interiors of domains of all the functions.

The subdifferential set of f at x can be obtained from the subdifferentials of fi as follows:

βˆ‚f(x)=conv(⋃i∈I(x)βˆ‚fi(x))

where I(x)={i|fi(x)=f(x)}.

Proof. Since fi are proper convex, hence their pointwise maximum f is proper convex.

  1. Let I(x)={i∈1,…,m|fi(x)=f(x)}.

  2. For any (nonzero) direction, d∈V, by Theorem 8.209:

    fβ€²(x;d)=maxi∈I(x)fiβ€²(x;d).
  3. Without loss of generality, let us assume that I(x)={1,…,k} for some k∈{1,…,m}. This can be achieved by reordering fi.

  4. By max formula (Theorem 8.219),

    fiβ€²(x;d)=sup{⟨d,g⟩|gβˆˆβˆ‚fi(x)}.
  5. Thus,

    fβ€²(x;d)=maxi∈1,…,ksupgiβˆˆβˆ‚fi(x)⟨d,gi⟩.
  6. Recall that for any a1,…,ak∈R, the identity below holds:

    max{a1,…,ak}=suptβˆˆΞ”kβˆ‘i=1ktiai.
  7. Thus, we can expand fβ€²(x;d) as:

    fβ€²(x;d)=maxi∈{1,…,k}supgiβˆˆβˆ‚fi(x)⟨d,gi⟩
    =suptβˆˆΞ”k{βˆ‘i=1ktisupgiβˆˆβˆ‚fi(x)⟨d,gi⟩}
    =suptβˆˆΞ”ksup{βˆ‘i=1kti⟨d,gi⟩|giβˆˆβˆ‚fi(x)}
    =sup{⟨d,βˆ‘i=1ktigi⟩|giβˆˆβˆ‚fi(x),tβˆˆΞ”k}
    =sup{⟨d,g⟩|g∈conv(⋃i=1kβˆ‚fi(x))}
    =ΟƒA(d)

    where A=conv(⋃i=1kβˆ‚fi(x)) and Οƒ denotes the support function.

  8. Since x∈intdomf, hence, by the max formula (Corollary 8.21)

    fβ€²(x;d)=Οƒβˆ‚f(x)(d).
  9. Thus, we have

    Οƒβˆ‚f(x)(d)=ΟƒA(d).
  10. By Theorem 8.211, βˆ‚f(x) is closed and convex.

  11. By Theorem 8.214, βˆ‚f(x) is nonempty and bounded.

  12. Thus, βˆ‚f(x) is nonempty, closed and convex.

  13. Similarly, βˆ‚fi(x) are nonempty, closed, convex and bounded.

  14. Thus, ⋃i=1kβˆ‚fi(x) is a finite union of nonempty, closed, convex and bounded sets.

  15. Thus, ⋃i=1kβˆ‚fi(x) is also nonempty and compact.

    1. A finite union of nonempty sets is nonempty.

    2. A finite union of bounded sets is bounded.

    3. A finite union of closed sets is closed.

    4. Thus, ⋃i=1kβˆ‚fi(x) is closed and bounded.

    5. Since V is finite dimensional, hence closed and bounded sets are compact.

  16. Since A is a convex hull of ⋃i=1kβˆ‚fi(x), hence A is nonempty, closed and convex.

    1. Recall from Theorem 8.129 that convex hull of a compact set is compact.

    2. Also, recall that compact sets are closed and bounded.

  17. Since Οƒβˆ‚f(x)(d)=ΟƒA(d) is true for any d∈V, the support functions for the underlying nonempty, closed and convex set are equal. Hence by Theorem 8.90,

    βˆ‚f(x)=A=conv(⋃i=1kβˆ‚fi(x)).

Some applications of this rule are presented later in Example 8.75, Example 8.76, Example 8.78, Example 8.79.

We now present a weaker version of the max rule which is applicable for pointwise supremum over an arbitrary set of functions.

Theorem 8.231 (Weak max rule of subdifferential calculus)

Let I be an arbitrary index set and suppose that for every i∈I, there exists a proper convex function fi:Vβ†’(βˆ’βˆž,∞]. Let f:Vβ†’(βˆ’βˆž,∞] be given by:

f(x)=supi∈I{fi(x)}.

Then for any x∈domf,

conv(⋃i∈I(x)βˆ‚fi(x))βŠ†βˆ‚f(x)

where I(x)={i∈I|fi(x)=f(x)}.

In words, if fi(x)=f(x), then a subgradient of fi at x is also a subgradient of f at x. Also, for all i∈I such that fi(x)=f(x), any convex combination of their subgradients at x is also a subgradient of f at x.

Proof. Pick some x∈domf.

  1. Let z∈domf be arbitrary.

  2. Let I(x)={i∈I|fi(x)=f(x)}.

  3. Let i∈I(x) be arbitrary.

  4. Let gβˆˆβˆ‚fi(x) be a subgradient of fi at x.

  5. Then, by definition of f and subgradient inequality:

    f(z)β‰₯fi(z)β‰₯fi(x)+⟨zβˆ’x,g⟩=f(x)+⟨zβˆ’x,g⟩.

    We used the fact that fi(x)=f(x) for i∈I(x).

  6. Thus, gβˆˆβˆ‚f(x). g is a subgradient of f at x.

  7. Since this is valid for every subgradient of fi at x, hence βˆ‚fi(x)βŠ†βˆ‚f(x).

  8. Since this is valid for every i∈I(x), hence

    ⋃i∈I(x)βˆ‚fi(x)βŠ†βˆ‚f(x).
  9. Recall from Theorem 8.211 that βˆ‚f(x) is convex.

  10. Thus, it contains the convex hull of any of its subsets. Hence,

    conv(⋃i∈I(x)βˆ‚fi(x))βŠ†βˆ‚f(x).

Next is an example application of the weak max rule.

Example 8.66 (Subgradient of Ξ»max(A0+βˆ‘i=1mxiAi))

Let A0,A1,…,Am∈Sn be m+1 given symmetric matrices. Define an affine transformation A:Rmβ†’Sn as

A(x)β‰œA0+βˆ‘i=1mxiAi.

For every vector x∈Rm, this mapping defines a symmetric matrix A(x). We can compute the largest eigenvalue of A(x). We introduce a function f:Rm→R as

f(x)β‰œΞ»max(A(x))=Ξ»max(A0+βˆ‘i=1mxiAi).

Our task is to find a subgradient of f at x.

  1. Recall from the definition of the largest eigenvalue,

    f(x)=supy∈Rn;β€–yβ€–2=1yTA(x)y.
  2. For every y∈Rn such that β€–yβ€–2=1, we can define a function:

    fy(x)β‰œyTA(x)y.
  3. Then,

    f(x)=supy∈Rn;β€–yβ€–2=1fy(x).
  4. The function fy(x) is affine (in x) for every y.

  5. Thus, fy is convex for every y.

  6. Thus, f is a pointwise supremum of a family of functions fy.

  7. Thus, f is also convex (see Theorem 8.114).

  8. Consequently, we can use the weak max rule Theorem 8.231 to identify a subgradient of f at x.

  9. Let y~ be a normalized eigenvector of A(x) corresponding to its largest eigenvalue. Then

    f(x)=y~TA(x)y~.
  10. This means that f(x)=fy~(x).

  11. By the weak max rule, a subgradient of fy~ at x is also a subgradient of f at x.

  12. Expanding fy~(x):

    fy~(x)=y~TA(x)y~=y~TA0y~+βˆ‘i=1my~TAiy~xi.
  13. Then, the gradient of fy~ at x (computed by taking partial derivatives w.r.t. xi) is

    βˆ‡fy~(x)=(y~TA1y~,…,y~TAmy~).
  14. Since fy is affine (thus convex), hence its gradient is also a subgradient.

  15. Thus,

    (y~TA1y~,…,y~TAmy~)βˆˆβˆ‚f(x).
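The recipe of Example 8.66 is easy to test numerically. The sketch below (assuming NumPy; the random symmetric matrices and the test point are illustrative) builds the subgradient (y~TA1y~,…,y~TAmy~) from a top eigenvector and verifies the subgradient inequality at random points.

```python
import numpy as np

# Sketch: subgradient of f(x) = lambda_max(A0 + sum_i x_i * A_i) built
# from a unit top eigenvector v of A(x); entries are v^T A_i v.
rng = np.random.default_rng(1)
def sym(M): return (M + M.T) / 2
As = [sym(rng.standard_normal((4, 4))) for _ in range(3)]   # A1..Am
A0 = sym(rng.standard_normal((4, 4)))
Amap = lambda x: A0 + sum(xi * Ai for xi, Ai in zip(x, As))
f = lambda x: np.linalg.eigvalsh(Amap(x))[-1]   # largest eigenvalue

x = rng.standard_normal(3)
w, V = np.linalg.eigh(Amap(x))
v = V[:, -1]                                   # unit top eigenvector
g = np.array([v @ Ai @ v for Ai in As])        # the claimed subgradient

# subgradient inequality f(y) >= f(x) + <y - x, g> at random y
for _ in range(200):
    y = rng.standard_normal(3)
    assert f(y) >= f(x) + g @ (y - x) - 1e-8
```

Since f is convex, the inequality must hold globally, not merely near x.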

8.16.7. Lipschitz ContinuityΒΆ

Theorem 8.232 (Lipschitz continuity and boundedness of the subdifferential sets)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper convex function. Suppose that XβŠ†intdomf. Consider the following two claims:

  1. |f(x)βˆ’f(y)|≀Lβ€–xβˆ’yβ€– for any x,y∈X.

  2. β€–gβ€–βˆ—β‰€L for any gβˆˆβˆ‚f(x) where x∈X.

Then,

  • (2) implies (1). In other words, if subgradients are bounded then, the function is Lipschitz continuous.

  • If X is open, then (1) holds if and only if (2) holds.

In other words, if the subgradients over a set X are bounded then f is Lipschitz continuous over X. If X is open then f is Lipschitz continuous over X if and only if the subgradients over X are bounded.

Proof. (a) We first show that (2)⟹(1).

  1. Assume that (2) is satisfied.

  2. Pick any x,y∈X.

  3. Since f is proper and convex and x,y∈intdomf, hence due to Theorem 8.214, βˆ‚f(x) and βˆ‚f(y) are nonempty.

  4. Let gxβˆˆβˆ‚f(x) and gyβˆˆβˆ‚f(y).

  5. By subgradient inequality

    f(y)β‰₯f(x)+⟨yβˆ’x,gx⟩;f(x)β‰₯f(y)+⟨xβˆ’y,gy⟩.
  6. We can rewrite this as

    f(x)βˆ’f(y)β‰€βŸ¨xβˆ’y,gx⟩;f(y)βˆ’f(x)β‰€βŸ¨yβˆ’x,gy⟩.
  7. By generalized Cauchy Schwartz inequality (Theorem 4.110),

    ⟨xβˆ’y,gxβŸ©β‰€β€–xβˆ’yβ€–β€–gxβ€–βˆ—β‰€Lβ€–xβˆ’yβ€–;⟨yβˆ’x,gyβŸ©β‰€β€–yβˆ’xβ€–β€–gyβ€–βˆ—β‰€Lβ€–xβˆ’yβ€–.
  8. Combining the two inequalities, we get

    |f(x)βˆ’f(y)|≀Lβ€–xβˆ’yβ€–.
  9. Thus, (2)⟹(1).

(b) If X is open, then we need to show that (1)⟺(2).

  1. We have already shown that (2)⟹(1).

  2. Assume that X is open and (1) holds.

  3. Let x∈X.

  4. Since x is an interior point of domf, hence the subdifferential is nonempty.

  5. Pick any gβˆˆβˆ‚f(x).

  6. Let gβ€ βˆˆV be a vector with β€–g†‖=1 and ⟨g†,g⟩=β€–gβ€–βˆ—. Such a vector exists by definition of the dual norm.

  7. Since X is open, we can choose Ο΅>0 small enough such that x+Ο΅gβ€ βˆˆX.

  8. By the subgradient inequality, we have:

    f(x+Ο΅g†)β‰₯f(x)+⟨ϡg†,g⟩.
  9. Thus,

    Ο΅β€–gβ€–βˆ—=⟨ϡg†,gβŸ©β‰€f(x+Ο΅g†)βˆ’f(x)≀Lβ€–(x+Ο΅gβ€ βˆ’xβ€– by hypothesis in (1)=LΟ΅β€–g†‖=LΟ΅.
  10. Canceling Ο΅, we get:

    β€–gβ€–βˆ—β‰€L

    holds true for every gβˆˆβˆ‚f(x) where x∈X as desired.
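For a concrete instance of (2)⟹(1) (a sketch assuming NumPy; the choice f=β„“1 norm is illustrative): every subgradient of β€–β‹…β€–1 has β„“βˆž norm at most 1, so the theorem predicts Lipschitz constant L=1 with respect to β€–β‹…β€–1.

```python
import numpy as np

# For f(x) = ||x||_1, every subgradient g satisfies ||g||_inf <= 1,
# so the theorem predicts |f(x) - f(y)| <= 1 * ||x - y||_1.
rng = np.random.default_rng(3)
f = lambda x: np.abs(x).sum()
for _ in range(1000):
    x, y = rng.standard_normal((2, 6))
    assert abs(f(x) - f(y)) <= np.abs(x - y).sum() + 1e-9
```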

Corollary 8.25 (Lipschitz continuity of convex functions over compact domains)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper and convex function. Suppose that XβŠ†intdomf is compact. Then, there exists L>0 such that

|f(x)βˆ’f(y)|≀Lβ€–xβˆ’yβ€–βˆ€x,y∈X.

Proof. Recall from Theorem 8.216 that the subgradients of a proper convex function over a compact set are nonempty and bounded.

  1. In other words, the set

    Y=⋃x∈Xβˆ‚f(x)

    is nonempty and bounded.

  2. Thus, there exists L>0 such that β€–gβ€–βˆ—β‰€L for every g∈Y.

  3. Then by Theorem 8.232,

    |f(x)βˆ’f(y)|≀Lβ€–xβˆ’yβ€–βˆ€x,y∈X.
  4. Thus, f is indeed Lipschitz continuous over X.

8.16.8. Ο΅-SubgradientsΒΆ

Definition 8.76 (Ο΅-Subgradient)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper function. Let x∈domf. A vector g∈Vβˆ— is called an Ο΅-subgradient of f at x if

(8.9)ΒΆf(y)β‰₯f(x)+⟨yβˆ’x,gβŸ©βˆ’Ο΅βˆ€y∈V.

8.16.8.1. Geometric InterpretationΒΆ

Observation 8.12 (Ο΅-subgradient and supporting hyperplane)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper function. Then g is an Ο΅-subgradient of f at x if and only if epif is contained in the positive halfspace of the hyperplane with a normal (βˆ’g,1) passing through (x,f(x)βˆ’Ο΅).

Proof. Let H denote the hyperplane

H={(y,t)|⟨y,βˆ’g⟩+t=⟨x,βˆ’g⟩+f(x)βˆ’Ο΅}.

The positive halfspace of H is given by

H+={(y,t)|⟨y,βˆ’g⟩+tβ‰₯⟨x,βˆ’g⟩+f(x)βˆ’Ο΅}.

Assume that g is an Ο΅-subgradient of f at x.

  1. For any (y,t)∈epif, we have

    tβ‰₯f(y)β‰₯f(x)+⟨yβˆ’x,gβŸ©βˆ’Ο΅.
  2. This is equivalent to

    ⟨y,βˆ’g⟩+tβ‰₯⟨x,βˆ’g⟩+f(x)βˆ’Ο΅

    for all (y,t)∈epif.

  3. Hence epifβŠ†H+.

Now assume that epifβŠ†H+.

  1. Let (y,f(y))∈epif.

  2. Then we have

    ⟨y,βˆ’g⟩+f(y)β‰₯⟨x,βˆ’g⟩+f(x)βˆ’Ο΅.
  3. Rearranging the terms, we have

    f(y)β‰₯f(x)+⟨yβˆ’x,gβŸ©βˆ’Ο΅βˆ€y∈V.
  4. But this means that g is an Ο΅-subgradient of f at x.

8.16.8.2. Ο΅-SubdifferentialΒΆ

Definition 8.77 (Ο΅-subdifferential)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper function. The set of all Ο΅-subgradients of f at a point x∈domf is called the Ο΅-subdifferential of f at x and is denoted by βˆ‚Ο΅f(x).

βˆ‚Ο΅f(x)β‰œ{g∈Vβˆ—|f(y)β‰₯f(x)+⟨yβˆ’x,gβŸ©βˆ’Ο΅βˆ€y∈V}.

For all xβˆ‰domf, we define βˆ‚Ο΅f(x)=βˆ….

It is easy to see that

βˆ‚f(x)βŠ†βˆ‚Ο΅f(x).

Also, if Ο΅2β‰₯Ο΅1>0, then

βˆ‚Ο΅1f(x)βŠ†βˆ‚Ο΅2f(x).
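A small worked instance of Definition 8.76 (a sketch assuming NumPy; the function and point are illustrative choices, not from the text): for f(y)=y2 and x=1, the vector g=2+2√ϡ is an Ο΅-subgradient, because f(y)βˆ’f(1)βˆ’g(yβˆ’1)+Ο΅=(yβˆ’1βˆ’βˆšΟ΅)2β‰₯0, yet it is not a plain subgradient.

```python
import numpy as np

# For f(y) = y^2 at x = 1, g = 2 + 2*sqrt(eps) is an eps-subgradient
# but not an ordinary subgradient (illustrative instance of Def. 8.76).
eps = 0.25
g = 2 + 2 * np.sqrt(eps)        # = 3.0
f = lambda y: y * y
ys = np.linspace(-10, 10, 2001)

# eps-subgradient inequality holds everywhere ...
assert np.all(f(ys) >= f(1.0) + g * (ys - 1.0) - eps - 1e-9)
# ... but the plain subgradient inequality fails somewhere
assert np.any(f(ys) < f(1.0) + g * (ys - 1.0))
```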

8.16.9. Optimality ConditionsΒΆ

A well known result for differentiable functions is that at the point of optimality βˆ‡f(x)=0 (see Theorem 7.1). Subdifferentials are useful in characterizing the minima of a function. The idea of vanishing gradients can be generalized for subgradients also.

Theorem 8.233 (Fermat’s optimality condition)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper convex function. Then

a∈argmin{f(x)|x∈V}

if and only if 0βˆˆβˆ‚f(a).

In other words, a is a minimizer of f if and only if 0 is a subgradient of f at a.

Proof. Assume that 0βˆˆβˆ‚f(a) where a∈domf.

  1. By subgradient inequality

    f(x)β‰₯f(a)+⟨xβˆ’a,0βŸ©βˆ€x∈V.
  2. This simplifies to

    f(x)β‰₯f(a)βˆ€x∈V.
  3. Thus, a∈argmin{f(x)|x∈V}.

For the converse, assume that a∈argmin{f(x)|x∈V}.

  1. Then,

    f(x)β‰₯f(a)βˆ€x∈V.
  2. But then

    f(x)β‰₯f(a)⟺f(x)β‰₯f(a)+0⟺f(x)β‰₯f(a)+⟨xβˆ’a,0⟩

    holds true for every x∈V.

  3. This implies that 0βˆˆβˆ‚f(a).
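Fermat's condition can be illustrated on a nonsmooth 1D function (a sketch assuming NumPy; the function is an illustrative choice): f(x)=|xβˆ’1|+|x+1| is minimized on all of [βˆ’1,1], and at any such point 0 is a subgradient, so f(x)β‰₯f(a) everywhere.

```python
import numpy as np

# Fermat's optimality condition for f(x) = |x - 1| + |x + 1| on R:
# every a in [-1, 1] is a minimizer, and 0 lies in df(a) there since
# df(a) = {sgn(a-1) + sgn(a+1)} type sums contain 0 on that interval.
f = lambda x: abs(x - 1) + abs(x + 1)
a = 0.3                                # a point in argmin; f(a) = 2

# subgradient inequality with g = 0 reduces to f(x) >= f(a) for all x
for x in np.linspace(-5, 5, 1001):
    assert f(x) >= f(a) - 1e-9
```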

8.16.10. Mean Value TheoremΒΆ

The following result is from [25].

Theorem 8.234 (A subgradients based mean value theorem for 1D functions)

Let f:Rβ†’(βˆ’βˆž,∞] be a proper closed convex function. Let [a,b]βŠ†domf with a<b. Then,

f(b)βˆ’f(a)=∫abh(t)dt

where h:(a,b)β†’R satisfies h(t)βˆˆβˆ‚f(t) for every t∈(a,b).

In the remainder of this section, we compute the subgradients and subdifferential sets for a variety of standard functions.

8.16.11. Norm FunctionsΒΆ

We recall from Theorem 8.210 that the subdifferential of a norm ‖⋅‖:V→R at x=0 is given by:

βˆ‚f(0)=Bβ€–β‹…β€–βˆ—[0,1]={g∈Vβˆ—|β€–gβ€–βˆ—β‰€1}.

8.16.11.1. β„“1-NormΒΆ

Example 8.67 (Subdifferential of β„“1 norm at origin)

Let f:Rnβ†’R be given by f(x)=β€–xβ€–1. We recall that the dual norm of β„“1 is β„“βˆž. The unit ball of β„“βˆž-norm at origin is given by

Bβ€–β‹…β€–βˆž[0,1]=[βˆ’1,1]n.

Following Theorem 8.210, the subdifferential of f at x=0 is given by:

βˆ‚f(0)=Bβ€–β‹…β€–βˆž[0,1]=[βˆ’1,1]n.

Example 8.68 (Subdifferential of absolute value function at origin)

Let g:R→R be the absolute value function given by

g(x)=|x|.

This is a special case of β„“1 norm for R1. Thus, following Example 8.67, the subdifferential of g at x=0 is given by:

βˆ‚g(0)=[βˆ’1,1].

For a complete specification of the subdifferential of g, see Example 8.73 below.

Example 8.69 (Subdifferential of β„“1 norm)

Let f:Rn→R be given by f(x)=‖x‖1. We can write f as a sum of n functions

f(x)=β€–xβ€–1=βˆ‘i=1n|xi|=βˆ‘i=1nfi(x)

where

fi(x)=|xi|.

Let g(x)=|x|. Then

fi(x)=|xi|=|eiTx|=g(eiTx).

Due to affine transformation rule (Theorem 8.227),

βˆ‚fi(x)=(βˆ‚g(eiTx))ei=(βˆ‚g(xi))ei.

The subdifferential of the absolute value function g is described in Example 8.73 below.

Thus, the subdifferential set of fi is given by:

βˆ‚fi(x)={{sgn(xi)ei}forxiβ‰ 0[βˆ’ei,ei]forxi=0.

Using the sum rule Corollary 8.24, we have:

βˆ‘i=1nβˆ‚fi(x)=βˆ‚(βˆ‘i=1nfi)(x).

We define the index set:

I0(x)={i|xi=0}.

Expanding the sum of subdifferentials,

βˆ‚f(x)=βˆ‘i∈I0(x)[βˆ’ei,ei]+βˆ‘iβˆ‰I0(x)sgn(xi)ei.

We can rewrite this as:

βˆ‚f(x)={z∈Rn|zi=sgn(xi) whenever xiβ‰ 0,|zi|≀1, otherwise }.

We also have a weak result from this:

sgn(x)βˆˆβˆ‚f(x)=βˆ‚β€–xβ€–1.

Example 8.70 (Subdifferential of β„“1 norm squared)

Let f:Rn→R be given by f(x)=‖x‖1.

Now let g(t)=([t]+)2 where [t]+=max{t,0}. The function g is nondecreasing, convex and differentiable. Consider the function h=g∘f given by

h(x)=β€–xβ€–12.

By subdifferential chain rule (Theorem 8.229):

βˆ‚h(x)=2β€–xβ€–1βˆ‚f(x)=2β€–xβ€–1{z∈Rn|zi=sgn(xi) whenever xiβ‰ 0,|zi|≀1, otherwise }.

We have used the subdifferential of f from Example 8.69.
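At a point with no zero entries, h(x)=β€–xβ€–12 is differentiable and the chain rule gives the gradient 2β€–xβ€–1sgn(x). A quick numerical sanity check (a sketch assuming NumPy; the test point is an arbitrary illustration) compares this against central finite differences.

```python
import numpy as np

# Chain rule check for h(x) = ||x||_1^2 at a point with nonzero entries:
# the gradient should be 2 * ||x||_1 * sign(x).
h = lambda x: np.abs(x).sum() ** 2
x = np.array([1.0, -2.0, 0.5])
grad = 2 * np.abs(x).sum() * np.sign(x)

# central finite-difference approximation of the gradient
eps = 1e-6
fd = np.array([(h(x + eps * e) - h(x - eps * e)) / (2 * eps)
               for e in np.eye(3)])
assert np.allclose(grad, fd, atol=1e-4)
```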

8.16.11.2. β„“2-NormΒΆ

Example 8.71 (Subdifferential of β„“2 norm at origin)

Let f:Rn→R be given by f(x)=‖x‖2. We recall that the dual norm of ℓ2 is again ℓ2 as this norm is self dual.

Following Theorem 8.210, the subdifferential of f at x=0 is given by:

βˆ‚f(0)=Bβ€–β‹…β€–2[0,1]={g∈Rn|β€–gβ€–2≀1}.

Example 8.72 (Subdifferential of β„“2 norm)

Let f:Rn→R be given by f(x)=‖x‖2. At x≠0, f is differentiable with the gradient (see Example 5.10):

βˆ‡f(x)=xβ€–xβ€–2.

Since f is convex and differentiable at x≠0, hence due to Theorem 8.220,

βˆ‚f(x)={βˆ‡f(x)}={xβ€–xβ€–2}.

Combining this with the subdifferential of f at origin from Example 8.71, we obtain:

βˆ‚f(x)={{xβ€–xβ€–2}forxβ‰ 0Bβ€–β‹…β€–2[0,1]forx=0.

Example 8.73 (Subdifferential of absolute value function)

Let g:R→R be the absolute value function given by

g(x)=|x|.

This is a special case of β„“2 norm for R1. Following Example 8.72,

βˆ‚g(x)={{sgn(x)}forxβ‰ 0[βˆ’1,1]forx=0.

c.f. Example 8.68.

8.16.11.3. β„“βˆž-NormΒΆ

Example 8.74 (Subdifferential of β„“βˆž norm at origin)

Let f:Rnβ†’R be given by f(x)=β€–xβ€–βˆž. We recall that the dual norm of β„“βˆž is β„“1. The unit ball of β„“1-norm at origin is given by

Bβ€–β‹…β€–1[0,1]={x∈Rn|β€–xβ€–1≀1}.

Following Theorem 8.210, the subdifferential of f at x=0 is given by:

βˆ‚f(0)=Bβ€–β‹…β€–1[0,1]={x∈Rn|β€–xβ€–1≀1}.

Example 8.75 (Subdifferential of β„“βˆž norm)

Let f:Rnβ†’R be given by f(x)=β€–xβ€–βˆž. Let us compute the subdifferential of f at xβ‰ 0.

We have:

f(x)=max{f1(x),f2(x),…,fn(x)}

where fi(x)=|xi|. We define:

I(x)={i∈[1,…,n]||xi|=f(x)=β€–xβ€–βˆž}.

Then, following Example 8.69

βˆ‚fi(x)={sgn(xi)ei}βˆ€i∈I(x).

This is valid since xβ‰ 0 implies that f(x)β‰ 0 which in turn implies that xiβ‰ 0 for every i∈I(x).

Then, using the max rule for proper convex functions (Theorem 8.230):

βˆ‚f(x)=conv(⋃i∈I(x){sgn(xi)ei}).

We can rewrite this as:

βˆ‚f(x)={βˆ‘i∈I(x)Ξ»isgn(xi)ei|βˆ‘i∈I(x)Ξ»i=1,Ξ»jβ‰₯0,j∈I(x)}.

Combining this with the subdifferential of f at origin from Example 8.74, we obtain:

βˆ‚f(x)={{βˆ‘i∈I(x)Ξ»isgn(xi)ei|βˆ‘i∈I(x)Ξ»i=1,Ξ»jβ‰₯0,j∈I(x)},xβ‰ 0Bβ€–β‹…β€–1[0,1],x=0.
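One consequence of the max rule above is easy to verify numerically (a sketch assuming NumPy; the test vector is an arbitrary illustration): for any active index i with |xi|=β€–xβ€–βˆž, the vector sgn(xi)ei is a subgradient of the β„“βˆž norm at x.

```python
import numpy as np

# For f(x) = ||x||_inf, the max rule gives sgn(x_i) * e_i as a
# subgradient at x for any active index i (|x_i| = ||x||_inf).
rng = np.random.default_rng(5)
f = lambda x: np.abs(x).max()
x = np.array([3.0, -1.0, 2.0])
i = np.abs(x).argmax()                 # active index
g = np.zeros(3); g[i] = np.sign(x[i])  # sgn(x_i) * e_i

# subgradient inequality at random points
for _ in range(1000):
    y = rng.standard_normal(3)
    assert f(y) >= f(x) + g @ (y - x) - 1e-9
```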

8.16.11.4. β„“1 Norm over Affine TransformationΒΆ

Let A∈RmΓ—n. Let b∈Rm. Let f:Rmβ†’R be given by f(y)=β€–yβ€–1.

Let h:Rn→R be the function h(x)=‖Ax+b‖1=f(Ax+b).

By affine transformation rule, we have:

βˆ‚h(x)=ATβˆ‚f(Ax+b)βˆ€x∈Rn.

Denoting the i-th row of A by aiT, we define the index set:

I0(x)={i|aiTx+bi=0}.

Then we have:

βˆ‚h(x)=βˆ‘i∈I0(x)[βˆ’ai,ai]+βˆ‘iβˆ‰I0(x)sgn(aiTx+bi)ai.

In particular, we have the weak result:

ATsgn(Ax+b)βˆˆβˆ‚h(x).
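The weak result ATsgn(Ax+b)βˆˆβˆ‚h(x) can be checked directly (a sketch assuming NumPy; A, b and the point are arbitrary illustrations) via the subgradient inequality for h.

```python
import numpy as np

# Check of the weak result A^T sgn(Ax + b) in dh(x) for
# h(x) = ||Ax + b||_1, via the subgradient inequality.
rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
h = lambda x: np.abs(A @ x + b).sum()

x = rng.standard_normal(3)
g = A.T @ np.sign(A @ x + b)           # claimed subgradient of h at x
for _ in range(1000):
    y = rng.standard_normal(3)
    assert h(y) >= h(x) + g @ (y - x) - 1e-9
```

Note that the check is valid even when some component of Ax+b is zero, since sgn(0)=0 lies in the interval [βˆ’1,1] appearing in the subdifferential.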

8.16.11.5. β„“2 Norm over Affine TransformationΒΆ

Let A∈RmΓ—n. Let b∈Rm. Let f:Rmβ†’R be given by f(y)=β€–yβ€–2.

Let h:Rn→R be the function h(x)=‖Ax+b‖2=f(Ax+b).

We have:

βˆ‚f(Ax+b)={{Ax+bβ€–Ax+bβ€–2}forAx+bβ‰ 0Bβ€–β‹…β€–2[0,1]forAx+b=0.

Applying the affine transformation rule, we get:

βˆ‚h(x)=ATβˆ‚f(Ax+b)={{AT(Ax+b)β€–Ax+bβ€–2}forAx+bβ‰ 0ATBβ€–β‹…β€–2[0,1]forAx+b=0.

For x such that Ax+b=0, we can write this as

βˆ‚h(x)=ATBβ€–β‹…β€–2[0,1]={ATy|β€–yβ€–2≀1}.

8.16.11.6. β„“βˆž Norm over Affine TransformationΒΆ

Example 8.76 (Subdifferential of β€–Ax+bβ€–βˆž)

Let A∈RmΓ—n. Let b∈Rm. Let f:Rmβ†’R be given by

f(y)=β€–yβ€–βˆž.

Let h:Rn→R be the function

h(x)=β€–Ax+bβ€–βˆž=f(Ax+b).

With y=Ax+b, we have yi=aiTx+bi where aiT is the i-th row vector of A.

Following Example 8.75

βˆ‚f(y)={{βˆ‘i∈I(y)Ξ»isgn(yi)ei|βˆ‘i∈I(y)Ξ»i=1,Ξ»jβ‰₯0,j∈I(y)},yβ‰ 0Bβ€–β‹…β€–1[0,1],y=0

where I(y)={i∈[1,…,m]||yi|=f(y)=β€–yβ€–βˆž}.

Due to affine transformation rule (Theorem 8.227),

βˆ‚h(x)=ATβˆ‚f(Ax+b).

We have the following cases.

(a) y=0.

  1. In terms of x, the condition y=0 is equivalent to Ax+b=0.

  2. Then,

    βˆ‚f(Ax+b)=βˆ‚f(0)=Bβ€–β‹…β€–1[0,1].
  3. Thus,

    βˆ‚h(x)=ATBβ€–β‹…β€–1[0,1].

(b) y≠0.

  1. In terms of x, the condition y≠0 is equivalent to Ax+b≠0.

  2. Then,

    βˆ‚f(Ax+b)={βˆ‘i∈I(y)Ξ»isgn(yi)ei|βˆ‘i∈I(y)Ξ»i=1,Ξ»jβ‰₯0,j∈I(y)}={βˆ‘i∈IxΞ»isgn(aiTx+bi)ei|βˆ‘i∈IxΞ»i=1,Ξ»jβ‰₯0,j∈Ix}

    where

    Ix=I(y)=I(Ax+b).
  3. Note that ATei=ai.

  4. Then,

    βˆ‚h(x)=ATβˆ‚f(Ax+b)={βˆ‘i∈IxΞ»isgn(aiTx+bi)ai|βˆ‘i∈IxΞ»i=1,Ξ»jβ‰₯0,j∈Ix}.

Combining the two cases, we get:

βˆ‚h(x)={{βˆ‘i∈IxΞ»isgn(aiTx+bi)ai|βˆ‘i∈IxΞ»i=1,Ξ»jβ‰₯0,j∈Ix},Ax+bβ‰ 0ATBβ€–β‹…β€–1[0,1],Ax+b=0

8.16.12. Indicator FunctionsΒΆ

Theorem 8.235 (Subdifferential of indicator function)

The subdifferential of indicator function for a nonempty set SβŠ‚V at any point x∈S is given by

βˆ‚IS(x)=NS(x).

where NS(x) is the normal cone of S at x.

Proof. Let x∈S and gβˆˆβˆ‚IS(x). The subgradient inequality (8.7) gives us:

IS(z)β‰₯IS(x)+⟨zβˆ’x,gβŸ©βˆ€z∈S⟺0β‰₯0+⟨zβˆ’x,gβŸ©βˆ€z∈S⟺⟨zβˆ’x,gβŸ©β‰€0βˆ€z∈S⟺g∈NS(x).

Example 8.77 (Subdifferential of the indicator function of the unit ball)

The unit ball at origin is given by:

S=B[0,1]={x∈V|β€–x‖≀1}.

From Theorem 8.64, the normal cone of S at x∈S is given by:

NS(x)={y∈Vβˆ—|β€–yβ€–βˆ—β‰€βŸ¨x,y⟩}.

For any xβˆ‰S, NS(x)=βˆ…. Combining:

βˆ‚IB[0,1](x)={{y∈Vβˆ—|β€–yβ€–βˆ—β‰€βŸ¨x,y⟩}forβ€–x‖≀1βˆ…forβ€–xβ€–>1.
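For the Euclidean unit ball this normal cone is easy to picture: at a boundary point x it consists of the nonnegative multiples of x. The sketch below (assuming NumPy; the boundary point and multiplier are arbitrary illustrations) checks the defining inequality ⟨zβˆ’x,gβŸ©β‰€0 over the ball.

```python
import numpy as np

# Normal cone of the Euclidean unit ball at a boundary point x:
# nonnegative multiples of x.  Check <z - x, g> <= 0 for z in the ball.
rng = np.random.default_rng(4)
x = np.array([0.6, 0.8])            # ||x||_2 = 1, a boundary point
g = 2.0 * x                         # in N_S(x); ||g||_2 = <x, g> = 2

for _ in range(1000):
    z = rng.standard_normal(2)
    z /= max(1.0, np.linalg.norm(z))    # scale z into the unit ball
    assert (z - x) @ g <= 1e-9
```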

8.16.13. Maximum Eigenvalue FunctionΒΆ

The maximum eigenvalue function for symmetric matrices, denoted as f:Sn→R, is given by:

f(X)β‰œΞ»max(X).

Theorem 8.236 (Subgradient for maximum eigenvalue function)

Let f:Sn→R be the maximum eigenvalue function. Then

vvTβˆˆβˆ‚f(X)

where v is a normalized eigenvector of X∈Sn associated with its maximum eigenvalue.

Proof. Let X∈Sn be given. Let v be a normalized eigenvector associated with the largest eigenvalue of X. Then, β€–vβ€–2=1.

For any Y∈Sn, we have:

f(Y)=Ξ»max(Y)=maxβ€–uβ€–2=1{uTYu}β‰₯vTYv=vTXv+vT(Yβˆ’X)v=Ξ»max(X)β€–vβ€–22+tr(vT(Yβˆ’X)v)=Ξ»max(X)+tr((Yβˆ’X)vvT)=f(X)+⟨Yβˆ’X,vvT⟩.

In this derivation, we have used the following results:

  • The maximum eigenvalue can be obtained by maximizing uTYu over the unit sphere.

  • For a scalar x∈R, x=tr(x).

  • tr(AB)=tr(BA) if both AB and BA are well defined.

  • ⟨A,B⟩=tr(AB) for the space of symmetric matrices.

Thus, vvTβˆˆβˆ‚f(X).

We note here that this result only identifies one of the subgradients of f at X. It doesn’t characterize the entire subdifferential of f at X. In this sense, this result is a weak result. In contrast, a strong result would characterize the entire subdifferential.

8.16.14. The Max FunctionΒΆ

Example 8.78 (Subdifferential of the max function)

Let f:Rn→R be given by:

f(x)=max{x1,x2,…,xn}.

Let fi(x)=xi=eiTx. Then

f(x)=max{f1(x),f2(x),…,fn(x)}.

We note that fi are differentiable and their gradient is given by (see Example 5.4):

βˆ‡fi(x)=ei.

Also, fi are linear, hence convex. Thus, due to Theorem 8.220:

βˆ‚fi(x)={ei}.

We denote the index set of functions which equal the value of f(x) at x by:

I(x)={i|f(x)=xi}.

Then, using the max rule for proper convex functions (Theorem 8.230):

βˆ‚f(x)=conv(⋃i∈I(x)βˆ‚fi(x))=conv(⋃i∈I(x){ei}).

As an example, consider the case where x=α1 for some α∈R.

  1. In other words, x=(Ξ±,…,Ξ±).

  2. Then, f(x)=Ξ±.

  3. fi(x)=Ξ±=f(x) for every i∈[1,…,n].

  4. I(x)={1,…,n}.

  5. βˆ‡fi(x)=ei.

  6. conv(⋃i∈I(x){ei})=conv{e1,…,en}.

  7. But conv{e1,…,en}=Ξ”n.

  8. Thus,

    βˆ‚f(Ξ±1)=Ξ”nβˆ€Ξ±βˆˆR.
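The conclusion βˆ‚f(Ξ±1)=Ξ”n can be probed numerically (a sketch assuming NumPy; the dimension and Ξ± are arbitrary illustrations): any point of the unit simplex should satisfy the subgradient inequality at x=Ξ±1.

```python
import numpy as np

# At x = alpha * 1 every coordinate is active, so any lam in the unit
# simplex Delta_n should be a subgradient of f(x) = max_i x_i there.
rng = np.random.default_rng(2)
f = lambda x: x.max()
x = np.full(4, 1.5)                      # alpha * 1 with alpha = 1.5
lam = rng.random(4); lam /= lam.sum()    # a random point of Delta_4

# subgradient inequality: max_i y_i >= f(x) + <lam, y - x>
for _ in range(500):
    y = rng.standard_normal(4)
    assert f(y) >= f(x) + lam @ (y - x) - 1e-9
```

The inequality reduces to maxiyiβ‰₯⟨λ,y⟩, which holds because a convex combination never exceeds the maximum.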

8.16.15. Space of MatricesΒΆ

Let V=RmΓ—n. Let the standard inner product for x,y∈V be ⟨x,y⟩=tr(xTy).

Let f:Vβ†’(βˆ’βˆž,∞] be a proper function. Let x∈intdomf.

The gradient at x, if it exists, is given by:

βˆ‡f(x)=Df(x)β‰œ(βˆ‚fβˆ‚xij(x))i,j.

Let H be a positive definite matrix and define an inner product for V as:

⟨x,y⟩Hβ‰œtr(xTHy).

Then

βˆ‡f(x)=Hβˆ’1Df(x).

8.16.16. Convex Piecewise Linear FunctionsΒΆ

Example 8.79 (Subdifferential of convex piecewise linear functions)

Let a convex piecewise linear function f:Rn→R be given by:

f(x)=max1≀i≀m{aiTx+bi}

where ai∈Rn,bi∈R for i=1,…,m.

We define a set of functions fi:Rnβ†’R for i=1,…,m as

fi(x)=aiTx+bi

We can see that f is a pointwise maximum of these functions.

f(x)=max1≀i≀m{fi(x)}.

Clearly,

βˆ‚fi(x)={βˆ‡fi(x)}={ai}.

We define:

I(x)={i∈[1,…,m]|f(x)=fi(x)=aiTx+bi}.

Then, using the max rule for proper convex functions (Theorem 8.230):

βˆ‚f(x)=conv(⋃i∈I(x)βˆ‚fi(x))={βˆ‘i∈I(x)Ξ»iai|βˆ‘i∈I(x)Ξ»i=1,Ξ»jβ‰₯0βˆ€j∈I(x)}.

By Fermat’s optimality condition (Theorem 8.233), xβˆ— is a minimizer of f if and only if 0βˆˆβˆ‚f(xβˆ—).

Thus, xβˆ— is a minimizer if and only if there exists Ξ»βˆˆΞ”m such that

0=βˆ‘i=1mΞ»iai,Ξ»j=0βˆ€jβˆ‰I(xβˆ—).

Note that at any x, for every jβˆ‰I(x), we have

ajTx+bjβˆ’f(x)<0.

Thus, the complementarity condition

Ξ»j(ajTx+bjβˆ’f(x))=0,j=1,…,m

denotes the fact that whenever ajTx+bjβˆ’f(x)<0, Ξ»j must be zero, and whenever ajTx+bjβˆ’f(x)=0, any Ξ»jβ‰₯0 is allowed (since Ξ»βˆˆΞ”m).

If we put together a matrix A∈RmΓ—n whose rows are a1T,…,amT, then the optimality condition can be succinctly stated as

βˆƒΞ»βˆˆΞ”m s.t. ATΞ»=0 and Ξ»j(ajTxβˆ—+bjβˆ’f(xβˆ—))=0,j=1,…,m.

8.16.17. Minimization ProblemsΒΆ

Consider the minimization problem

min{f(x)|g(x)≀0,x∈X}.

Its dual function is given by

q(Ξ»)=minx∈X{L(x,Ξ»)β‰œf(x)+Ξ»Tg(x)}.

Assume that for Ξ»=Ξ»0 the minimum on the R.H.S. is attained at x=x0. Then, a subgradient of the negative of the dual function βˆ’q at Ξ»0 is given by

βˆ’g(x0)βˆˆβˆ‚(βˆ’q)(Ξ»0).
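A tiny worked instance (a sketch assuming NumPy; the problem data f(x)=x2, g(x)=1βˆ’x, X=R are illustrative choices, not from the text): the Lagrangian minimizer is x0=Ξ»/2, the dual is q(Ξ»)=Ξ»βˆ’Ξ»2/4, and βˆ’g(x0) matches the derivative of βˆ’q.

```python
import numpy as np

# Dual subgradient sketch with f(x) = x^2, g(x) = 1 - x, X = R.
# q(lam) = min_x x^2 + lam*(1 - x) is attained at x0 = lam/2, and the
# claim -g(x0) in d(-q)(lam0) can be checked against the derivative
# of -q, since here -q happens to be differentiable.
lam0 = 0.8
x0 = lam0 / 2                       # minimizer of the Lagrangian
sub = -(1 - x0)                     # -g(x0), the claimed subgradient

q = lambda lam: lam - lam ** 2 / 4  # closed form of the dual function
eps = 1e-6
deriv = -(q(lam0 + eps) - q(lam0 - eps)) / (2 * eps)   # d(-q)/dlam
assert abs(sub - deriv) < 1e-8
```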