8.16. SubgradientsΒΆ

Primary references for this section are [6, 7].

Throughout this section, we assume that V,W are finite dimensional real vector spaces. Wherever necessary, they are equipped with a norm ‖⋅‖ or a real inner product ⟨⋅,⋅⟩. They are also equipped with a metric d(x,y)=‖x−y‖ as needed.

8.16.1. SubgradientsΒΆ

Definition 8.72 (Subgradient)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper function. Let x∈domf. A vector g∈Vβˆ— is called a subgradient of f at x if

(8.6)ΒΆf(y)β‰₯f(x)+⟨yβˆ’x,gβŸ©βˆ€y∈V.

This inequality is known as the subgradient inequality.

Here, we work with the extended-value extension of f whenever required; i.e., f(x)=∞ for all x∉domf.

As discussed in Theorem 4.108, the vector spaces V and Vβˆ— are isomorphic. Therefore, we follow the convention that both V and Vβˆ— have exactly the same elements. The primary difference between V and Vβˆ— comes from the computation of norm. If V is endowed with a norm β€–β‹…β€– then Vβˆ— is endowed with a dual norm β€–β‹…β€–βˆ—.

In the arguments below B[a,r] or Bβ€–β‹…β€–[a,r] denotes the closed ball of radius r in the normed space (V,β€–β‹…β€–). The closed ball of radius r in the dual space (V,β€–β‹…β€–βˆ—) shall be denoted by Bβˆ—[a,r] or Bβ€–β‹…β€–βˆ—[a,r]. Open balls shall be denoted similarly.

Observation 8.9 (Global affine underestimator)

If g is a subgradient of f at some x∈domf, then the affine function a:Vβ†’R given by:

a(y)=f(x)+⟨yβˆ’x,g⟩=⟨y,g⟩+f(x)βˆ’βŸ¨x,gβŸ©βˆ€y∈V

is a global affine underestimator for f. Note that the term f(x)βˆ’βŸ¨x,g⟩ is a constant since x∈V and g∈Vβˆ— are fixed. This comes from the subgradient inequality:

f(y)β‰₯a(y)βˆ€y∈V.
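The affine underestimator property is easy to check numerically. The sketch below (illustrative code, not part of the formal development; all names are ours) verifies it for f(y)=|y| on R with the subgradient g=0.5 at x=0, and shows that a slope outside [−1,1] fails to underestimate:

```python
# f(y) = |y| with base point x = 0; any g with |g| <= 1 is a subgradient.
f = abs
x, g = 0.0, 0.5

def a(y):
    # affine function from the subgradient inequality: a(y) = f(x) + <y - x, g>
    return f(x) + g * (y - x)

ys = [i / 10.0 for i in range(-50, 51)]
is_underestimator = all(f(y) >= a(y) for y in ys)

# a slope outside [-1, 1] violates the inequality somewhere (here at y = 1)
g_bad = 1.5
violates = any(f(y) < f(x) + g_bad * (y - x) for y in ys)
```

A grid check of this kind is of course only a spot check of the inequality, not a proof.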

Observation 8.10 (Subgradient inequality alternative form)

For yβˆ‰domf, f(y)=∞. Thus, the subgradient inequality is trivially satisfied for all yβˆ‰domf. An alternate form of the inequality is:

(8.7)ΒΆf(y)β‰₯f(x)+⟨yβˆ’x,gβŸ©βˆ€y∈domf.

8.16.1.1. Geometric InterpretationΒΆ

Observation 8.11 (Subgradient and supporting hyperplane)

Let f:V→(−∞,∞] be a proper function. Then g is a subgradient of f at x if and only if epif has a supporting hyperplane at (x,f(x)) with normal (−g,1).

Let H be a supporting hyperplane of epif at (x,f(x)) with the normal (βˆ’g,1).

  1. Then

    H={(y,t)|⟨y,βˆ’g⟩+t=⟨x,βˆ’g⟩+f(x)}.
  2. For any (y,f(y))∈epif, we must have

    ⟨y,βˆ’g⟩+f(y)β‰₯⟨x,βˆ’g⟩+f(x)⟺f(y)β‰₯f(x)+⟨yβˆ’x,g⟩.
  3. Then g is a subgradient of f at x.

Now let g be a subgradient of f at x.

  1. Let (y,t)∈epif.

  2. Then we have

    tβ‰₯f(y)β‰₯f(x)+⟨yβˆ’x,g⟩.
  3. Rearranging the terms, we have

    ⟨y,βˆ’g⟩+tβ‰₯⟨x,βˆ’g⟩+f(x)

    for every (y,t)∈epif.

  4. Then the hyperplane

    H={(y,t)|⟨y,βˆ’g⟩+t=⟨x,βˆ’g⟩+f(x)}

    is indeed a supporting hyperplane for epif.

  5. The normal vector for this hyperplane is (βˆ’g,1) and it passes through the point (x,f(x)).

8.16.2. SubdifferentialΒΆ

At a point x∈domf, there may be more than one subgradient. It is thus natural to introduce the notion of the set of all subgradients of f at a specific point x∈domf.

Definition 8.73 (Subdifferential set)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper function. The set of all subgradients of f at a point x∈domf is called the subdifferential of f at x and is denoted by βˆ‚f(x).

βˆ‚f(x)β‰œ{g∈Vβˆ—|f(y)β‰₯f(x)+⟨yβˆ’x,gβŸ©βˆ€y∈V}.

For all xβˆ‰domf, we define βˆ‚f(x)=βˆ….

Theorem 8.210 (Subdifferential of norm at origin)

Let f:V→R by given by:

f(x)=β€–xβ€–

where β€–β‹…β€– is the norm endowed on V. Then, the subdifferential of f at x=0 is given by the dual norm unit ball:

βˆ‚f(0)=Bβ€–β‹…β€–βˆ—[0,1]={g∈Vβˆ—|β€–gβ€–βˆ—β‰€1}.

Proof. gβˆˆβˆ‚f(0) if and only if

f(y)β‰₯f(0)+⟨yβˆ’0,gβŸ©βˆ€y∈V.

This reduces to:

β€–yβ€–β‰₯⟨y,gβŸ©βˆ€y∈V.

Maximizing both sides of this inequality over the set {y|β€–y‖≀1}, we obtain:

β€–gβ€–βˆ—=supβ€–y‖≀1{⟨y,g⟩}≀supβ€–y‖≀1β€–yβ€–=1.

Thus, β€–gβ€–βˆ—β‰€1 is a necessary condition.

We now show that ‖g‖∗≤1 is also sufficient. By the generalized Cauchy-Schwarz inequality:

⟨y,gβŸ©β‰€β€–yβ€–β€–gβ€–βˆ—β‰€β€–yβ€–βˆ€y∈V.

Thus, if β€–gβ€–βˆ—β‰€1, then g is a subgradient.

Thus, the vectors that satisfy the subgradient inequality are exactly the same as those in Bβ€–β‹…β€–βˆ—[0,1].
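Theorem 8.210 can be illustrated numerically. For the ℓ1 norm on R², the dual norm is the ℓ∞ norm, so the subdifferential at the origin should contain exactly the vectors g with ‖g‖∞≤1. The following sketch (illustrative and sampling-based, hence only a spot check) tests the subgradient inequality at 0:

```python
def norm1(y):
    # the l1 norm on R^2
    return abs(y[0]) + abs(y[1])

def is_subgradient_at_zero(g, samples):
    # subgradient inequality at x = 0: ||y||_1 >= <y, g> for all sampled y
    return all(norm1(y) >= y[0] * g[0] + y[1] * g[1] for y in samples)

grid = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0),
        (1.0, 0.0), (0.0, 1.0), (-0.5, 0.25)]

ok_inside = is_subgradient_at_zero((0.7, -0.9), grid)   # ||g||_inf <= 1
ok_outside = is_subgradient_at_zero((1.2, 0.0), grid)   # ||g||_inf > 1
```

The vector outside the ℓ∞ unit ball fails already at the sample point y=(1,0), where ‖y‖1=1 but ⟨y,g⟩=1.2.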

The subdifferential of a function f may be empty at specific points x∈V.

Definition 8.74 (Subdifferentiability)

A proper function f:Vβ†’(βˆ’βˆž,∞] is called subdifferentiable at some x∈domf if βˆ‚f(x)β‰ βˆ….

Definition 8.75 (Domain of subdifferentiability)

The set of points at which a proper function f:Vβ†’(βˆ’βˆž,∞] is subdifferentiable, denoted by dom(βˆ‚f), is defined as:

dom(βˆ‚f)β‰œ{x∈V|βˆ‚f(x)β‰ βˆ…}.

8.16.2.1. Closedness and ConvexityΒΆ

Theorem 8.211 (Closedness and convexity of the subdifferential set)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper function. Then the set βˆ‚f(x) is closed and convex for any x∈V.

Proof. Let x∈V be fixed. For any y∈V, define the set

Hy={g∈Vβˆ—|⟨yβˆ’x,gβŸ©β‰€f(y)βˆ’f(x)}.

Note that Hy is a closed half space in Vβˆ—.

It is easy to see that

βˆ‚f(x)=β‹‚y∈VHy.
g∈∂f(x)
⟺f(y)≥f(x)+⟨y−x,g⟩ ∀y∈V
⟺f(y)−f(x)≥⟨y−x,g⟩ ∀y∈V
⟺⟨y−x,g⟩≤f(y)−f(x) ∀y∈V
⟺g∈Hy ∀y∈V
⟺g∈⋂y∈VHy.

Thus, βˆ‚f(x) is an infinite intersection of closed and convex sets. Hence βˆ‚f(x) is closed and convex.
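The convexity of ∂f(x) can be spot-checked numerically: for f(y)=|y| at x=0, both g=−1 and g=+1 are subgradients, and every convex combination of them should again satisfy the subgradient inequality. A small illustrative sketch:

```python
f = abs
x = 0.0
ys = [i / 7.0 for i in range(-21, 22)]

def is_subgradient(g):
    # subgradient inequality sampled over ys
    return all(f(y) >= f(x) + g * (y - x) for y in ys)

g1, g2 = -1.0, 1.0
combos_ok = all(is_subgradient((1 - t) * g1 + t * g2)
                for t in [0.0, 0.25, 0.5, 0.75, 1.0])
```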

8.16.2.2. Subdifferentiability and Convex DomainΒΆ

Theorem 8.212 (Subdifferentiability + Convex domain ⟹ Convexity)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper function. Assume that domf is convex. If f is subdifferentiable at every x∈domf, then f is convex.

In other words:

βˆ€x∈domf,βˆ‚f(x)β‰ βˆ…βŸΉf is convex.

Proof. Let x,y∈domf. Let t∈[0,1]. Let z=(1βˆ’t)x+ty.

  1. Since domf is convex, hence z∈domf.

  2. By hypothesis, f is subdifferentiable at z.

  3. Thus, there exists gβˆˆβˆ‚f(z).

  4. By subgradient inequality (8.7)

    f(y)≥f(z)+⟨y−z,g⟩=f(z)+(1−t)⟨y−x,g⟩,
    f(x)≥f(z)+⟨x−z,g⟩=f(z)−t⟨y−x,g⟩.
  5. Multiplying the first inequality by t, second by (1βˆ’t) and adding, we get:

    tf(y)+(1βˆ’t)f(x)β‰₯f(z).
  6. Thus,

    f((1βˆ’t)x+ty)=f(z)≀tf(y)+(1βˆ’t)f(x)

    holds true for any x,y∈domf and any t∈[0,1].

  7. Thus, f is convex.

A convex function need not be subdifferentiable at every point in its domain. The problem usually occurs at the boundary points of the domain if the domain is not open.

8.16.2.3. Positive ScalingΒΆ

Theorem 8.213 (Multiplication by a positive scalar)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper function. Let x∈domf. For any Ξ±>0,

βˆ‚(Ξ±f)(x)=Ξ±βˆ‚f(x).

Proof. Let gβˆˆβˆ‚f(x).

  1. By subgradient inequality (8.7)

    f(y)β‰₯f(x)+⟨yβˆ’x,gβŸ©βˆ€y∈domf.
  2. Multiplying by Ξ±, we get:

    (Ξ±f)(y)β‰₯(Ξ±f)(x)+⟨yβˆ’x,Ξ±gβŸ©βˆ€y∈dom(Ξ±f).
  3. Thus, Ξ±gβˆˆβˆ‚(Ξ±f)(x).

  4. Thus, Ξ±βˆ‚f(x)βŠ†βˆ‚(Ξ±f)(x).

  5. Applying the same argument to the function αf with the positive scalar 1/α gives the reverse inclusion, hence

    βˆ‚(Ξ±f)(x)=Ξ±βˆ‚f(x).
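The scaling rule is easy to verify numerically for f(y)=|y| at x=0, where ∂f(0)=[−1,1] and hence ∂(αf)(0)=[−α,α]. An illustrative sketch with α=3:

```python
alpha = 3.0
ys = [i / 10.0 for i in range(-30, 31)]

def is_subgrad(func, g, x=0.0):
    # subgradient inequality for func at x, sampled over ys
    return all(func(y) >= func(x) + g * (y - x) for y in ys)

g = 0.8                                                 # subgradient of |.| at 0
scaled_ok = is_subgrad(lambda y: alpha * abs(y), alpha * g)
too_big = is_subgrad(lambda y: alpha * abs(y), alpha * 1.1)   # 3.3 > alpha
```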

8.16.3. Proper Convex FunctionsΒΆ

In this subsection, we discuss the properties of the subdifferential sets for proper convex functions.

A proper convex function may not be subdifferentiable at every point in its domain. However, it is indeed subdifferentiable at the interior points and relative interior points of its domain.

8.16.3.1. Nonemptiness and Boundedness at Interior PointsΒΆ

Theorem 8.214 (Nonemptiness and boundedness of the subdifferential at interior points)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper convex function with S=domf. Let a∈intS. Then, βˆ‚f(a) is nonempty and bounded.

In other words, for a proper convex function, the subdifferential at the interior points of its domain is nonempty and bounded. We have

intdomfβŠ†dom(βˆ‚f).

Proof. Outline of the proof

  1. Identify a supporting hyperplane for the epigraph of f at (a,f(a)).

  2. Make use of the local Lipschitz continuity of the convex function at its interior points.

  3. Show that the normal to the supporting hyperplane leads to a subgradient at a.

  4. Show that the subgradients are bounded by using the local Lipschitz continuity inequality and the subgradient inequality.

Consider the direct sum vector space VβŠ•R.

  1. epifβŠ†VβŠ•R.

  2. Since f is convex, hence epif is convex.

  3. For some a∈intS, consider the point (a,f(a))∈VβŠ•R.

  4. Since f is convex, hence (a,f(a))∈bdepif.

  5. By supporting hyperplane theorem, there exists a vector (p,βˆ’Ξ±)∈Vβˆ—βŠ•R such that

    ⟨x,pβŸ©βˆ’tΞ±β‰€βŸ¨a,pβŸ©βˆ’f(a)Ξ±βˆ€(x,t)∈epif.
  6. We shall show that α>0 must hold true and that g=p/α is indeed a subgradient at a.

  7. We note that, (a,f(a)+1)∈epif. Putting it in,

    ⟨a,pβŸ©βˆ’(f(a)+1)Ξ±β‰€βŸ¨a,pβŸ©βˆ’Ξ±f(a)βŸΊβˆ’Ξ±β‰€0⟺αβ‰₯0.

    Thus, Ξ±β‰₯0.

  8. Recall from Theorem 8.174 that f is locally Lipschitz continuous at a∈intdomf.

  9. Thus, there exists r>0 and L>0 such that B[a,r]βŠ†S and

    |f(x)βˆ’f(a)|≀Lβ€–xβˆ’aβ€–βˆ€x∈B[a,r].
  10. Since B[a,r]βŠ†S, hence (x,f(x))∈epif for every x∈B[a,r].

  11. Plugging t=f(x) in the supporting hyperplane inequality, we get

    ⟨x,pβŸ©βˆ’f(x)Ξ±β‰€βŸ¨a,pβŸ©βˆ’f(a)Ξ±βˆ€x∈B[a,r].
  12. Rearranging the terms,

    ⟨xβˆ’a,pβŸ©β‰€Ξ±(f(x)βˆ’f(a))βˆ€x∈B[a,r].
  13. Using the local Lipschitz property,

    ⟨xβˆ’a,pβŸ©β‰€Ξ±Lβ€–xβˆ’aβ€–βˆ€x∈B[a,r].
  14. Recall that the dual norm for p∈Vβˆ— is given by

    ‖p‖∗=sup{|⟨x,p⟩| : x∈V, ‖x‖≤1}.
  15. Let pβ€ βˆˆV with β€–p†‖=1 be the vector at which the supremum is attained.

  16. Then, β€–pβ€–βˆ—=⟨p†,p⟩ (since V is real).

  17. Since p† is a unit vector, hence a+rpβ€ βˆˆB[a,r].

  18. Plugging x=a+rp† in the inequality above, we get

    r⟨p†,p⟩≤αL‖rp†‖.
  19. Simplifying

    r‖p‖∗≤αLr.
  20. This means that Ξ±>0 must be true.

    1. If α=0, then this inequality would force ‖p‖∗≤0, i.e., p=0.

    2. But (p,βˆ’Ξ±) is a nonzero vector describing the supporting hyperplane.

  21. Going back to the supporting hyperplane inequality and putting t=f(x), we have

    ⟨x,pβŸ©βˆ’f(x)Ξ±β‰€βŸ¨a,pβŸ©βˆ’f(a)Ξ±βˆ€x∈S.
  22. Rearranging the terms, we get

    Ξ±(f(x)βˆ’f(a))β‰₯⟨xβˆ’a,pβŸ©βˆ€x∈S.
  23. Letting g=(1/α)p and dividing both sides by α (which is positive), we obtain

    f(x)βˆ’f(a)β‰₯⟨xβˆ’a,gβŸ©βˆ€x∈S.
  24. Rearranging again

    f(x)β‰₯f(a)+⟨xβˆ’a,gβŸ©βˆ€x∈S

    which is the subgradient inequality.

  25. Thus, gβˆˆβˆ‚f(a).

  26. Thus, βˆ‚f(a) is nonempty.

We next show the boundedness of βˆ‚f(a).

  1. Let gβˆˆβˆ‚f(a).

  2. Let gβ€ βˆˆV such that β€–g†‖=1 and

    β€–gβ€–βˆ—=⟨g†,g⟩.
  3. Let x=a+rg†.

  4. Applying the subgradient inequality on x, we get:

    f(x)β‰₯f(a)+⟨rg†,g⟩=f(a)+rβ€–gβ€–βˆ—.
  5. Thus,

    rβ€–gβ€–βˆ—β‰€f(x)βˆ’f(a)≀Lβ€–xβˆ’aβ€–=Lβ€–rg†‖=Lr.
  6. Thus, β€–gβ€–βˆ—β‰€L for every gβˆˆβˆ‚f(a).

  7. Thus, βˆ‚f(a) is bounded.

If f is a proper convex function, then the only points at which f may fail to be subdifferentiable (i.e., where the subdifferential set is empty) are the frontier points of domf (i.e., domf∖intdomf). f may still be subdifferentiable at some of these frontier points.
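A standard example of failure at the boundary is f(x)=−√x with domf=[0,∞): f is proper convex, but at x=0 the subgradient inequality −√y ≥ g·y would force g≤−1/√y for every y>0, which no finite g satisfies. The sketch below (illustrative) exhibits a violating point y=1/(4g²) for each candidate slope:

```python
import math

def f(x):
    # f(x) = -sqrt(x), proper convex on [0, inf), not subdifferentiable at 0
    return -math.sqrt(x)

def violating_point(g):
    # a y > 0 at which the subgradient inequality at x = 0 fails for slope g:
    # at y = 1/(4 g^2), f(y) = -1/(2|g|) < -1/(4|g|) = g*y
    return 1.0 / (4.0 * g * g) if g != 0 else 1.0

candidates = [0.0] + [-10.0 ** k for k in range(0, 6)]
all_fail = all(f(violating_point(g)) < g * violating_point(g)
               for g in candidates)
```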

Corollary 8.19 (Subdifferentiability of real valued convex functions)

Let f:V→R be a convex function. Then, f is subdifferentiable over V.

dom(βˆ‚f)=V.

Proof. We have domf=V.

  1. Let x∈V.

  2. V is open in (V,β€–β‹…β€–).

  3. Thus, x∈intV=domf.

  4. By Theorem 8.214, βˆ‚f(x) is nonempty and bounded as x∈intdomf.

  5. Hence, f is subdifferentiable at x.

  6. Since this is valid for every x∈V, hence dom(βˆ‚f)=V.

8.16.3.2. Nonempty, Convex and Compact SubdifferentialsΒΆ

Theorem 8.215 (Nonempty, convex and compact subdifferentials for proper convex functions)

Let f:V→(−∞,∞] be a proper and convex function with S=domf. Let x∈intS. Then ∂f(x) is nonempty, convex and compact.

Proof. Let x∈intS.

  1. By Theorem 8.211, βˆ‚f(x) is closed and convex.

  2. By Theorem 8.214, βˆ‚f(x) is nonempty and bounded.

  3. Since βˆ‚f(x) is closed and bounded, hence it must be compact since V is a finite dimensional normed linear space.

We present an alternative proof based on the min common/max crossing framework developed in Min Common/Max Crossing Duality. This proof can be skipped in first reading.

Proof. This proof is based on the second min common/max crossing theorem Theorem 9.34.

We fix some x∈intS. We construct the set M as

M={(u,t)|u∈V,f(x+u)≀t}.

We first consider the min common problem.

  1. Note that (0,f(x))∈M since f(x+0)≀f(x).

  2. Further see that the min common value

    p∗=inf{t | (0,t)∈M}=f(x).
  3. Hence pβˆ— is finite.

  4. Note that M―=M where

    M―={(u,t)∈V⊕R | there exists t¯∈R with t¯≤t and (u,t¯)∈M}.
  5. M is convex.

    1. Let (u1,t1),(u2,t2)∈M and let r∈(0,1).

    2. Let (u,t)=r(u1,t1)+(1βˆ’r)(u2,t2).

    3. We have f(x+u1)≀t1 and f(x+u2)≀t2.

    4. Now

      f(x+u)=f(x+ru1+(1βˆ’r)u2)=f(r(x+u1)+(1βˆ’r)(x+u2))≀rf(x+u1)+(1βˆ’r)f(x+u2)≀rt1+(1βˆ’r)t2=t.
    5. Hence (u,t)∈M.

    6. Hence M is convex.

Next consider the max crossing problem and strong duality.

  1. We have

    qβˆ—=supa∈Vq(a)

    where

    q(a)=inf(u,t)∈M{⟨u,a⟩+t}.
  2. The set of optimal solutions of the max crossing problem is given by

    Qβˆ—={a∈V|q(a)=qβˆ—}
  3. For some a∈Qβˆ—, we can attain strong duality with

    pβˆ—=qβˆ—=q(a)

    if and only if

    f(x)=inf(u,t)∈M{⟨u,a⟩+t}.
  4. Equivalently

    f(x)≀f(x+u)+⟨u,aβŸ©βˆ€u∈V.
  5. Equivalently

    f(x+u)β‰₯f(x)+⟨u,βˆ’aβŸ©βˆ€u∈V.
  6. But this is nothing but the subgradient inequality with βˆ’a as the subgradient at x.

  7. In other words, strong duality is attained at a as a solution of the max crossing problem if and only if βˆ’a is a subgradient of f.

  8. Hence Qβˆ— with strong duality is given by βˆ’βˆ‚f(x).

We next establish the conditions for the second min common/max crossing theorem Theorem 9.34

  1. Consider the set

    D={u∈V| there exists t∈R with (u,t)∈M}.
  2. It is easy to see that D=Sβˆ’x.

    1. Consider the set T=Sβˆ’x.

    2. Let u∈T.

    3. Then x+u∈S.

    4. Hence f(x+u)≀f(x+u).

    5. Hence (u,f(x+u))∈M.

    6. Hence u∈D.

    7. Hence T=Sβˆ’xβŠ†D.

    8. Let uβˆ‰T.

    9. Then u+xβˆ‰S.

    10. Hence f(u+x)=∞.

    11. Hence for every t∈R, f(u+x)>t.

    12. Hence (u,t)βˆ‰M for every t∈R.

    13. Hence uβˆ‰D.

    14. Hence D=T=Sβˆ’x.

  3. Since x∈intS, hence 0∈intD.

  4. We see that all the conditions of the second min common/max crossing theorem are satisfied.

    1. pβˆ— is finite.

    2. M―=M is convex.

    3. The set D contains 0 in its interior.

  5. Hence βˆ’βˆ‚f(x) is nonempty, convex and compact.

  6. Hence βˆ‚f(x) is also nonempty, convex and compact.

8.16.3.3. Subgradients over a Compact SetΒΆ

Theorem 8.216 (Subgradients over a compact set are nonempty and bounded)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper convex function with S=domf. Let AβŠ†intS be a nonempty and compact subset of the interior of the domain of f. Then, the set of subgradients over A given by

Y=⋃x∈Aβˆ‚f(x)

is nonempty and bounded.

Proof. We are given that A is nonempty and compact subset of interior of domain of f.

  1. For any x∈A, by Theorem 8.214, βˆ‚f(x) is nonempty and bounded.

  2. Thus, Y is nonempty.

We next prove that Y must be bounded also.

  1. Let T=Vβˆ–(intS).

  2. T is closed and A is closed. AβŠ†intS. Hence, A∩T=βˆ….

  3. Since A is compact and T is closed and A∩T=βˆ…, hence distance between A and T is nonzero due to Theorem 3.128.

    r=d(A,T)>0.
  4. Thus,

    β€–xβˆ’yβ€–β‰₯rβˆ€x∈A,yβˆ‰intS.
  5. Let s=r/2.

  6. Let D=B[0,s]. D is a closed and bounded set. Hence, it is compact due to Theorem 4.68.

  7. Let E=A+D. Then EβŠ†intS.

    1. Let y∈E.

    2. Then, there is x∈A and v∈D such that y=x+v.

    3. Thus, yβˆ’x=v.

    4. Hence ‖y−x‖≤s<r.

    5. Since ‖x−z‖≥r for every z∉intS and ‖y−x‖<r, it follows that y∈intS.

  8. Since both A and D are compact, hence E is compact due to Theorem 4.69.

  9. By Theorem 8.174, f is local Lipschitz continuous at every x∈E since x∈EβŠ†intS.

  10. Then, by Theorem 3.79, f is Lipschitz continuous on E.

  11. Thus, there exists L>0 such that

    |f(x)βˆ’f(y)|≀Lβ€–xβˆ’yβ€–βˆ€x,y∈E.
  12. Let g∈Y. Then, there is x∈A such that gβˆˆβˆ‚f(x).

  13. We can choose g⊥∈V such that

    β€–gβ€–βˆ—=⟨gβŠ₯,g⟩ and β€–gβŠ₯β€–=1.
  14. Now, let y=x+sgβŠ₯. Then, y∈E.

    1. β€–yβˆ’xβ€–=β€–sgβŠ₯β€–=s.

    2. Thus, sgβŠ₯∈D.

    3. Thus, y∈E since x∈A.

  15. Also, x∈E since x=x+0 and 0∈D.

  16. Consequently, by Lipschitz continuity

    |f(y)βˆ’f(x)|≀Lβ€–yβˆ’xβ€–=Ls.
  17. By subgradient inequality at x

    f(y)βˆ’f(x)β‰₯⟨yβˆ’x,g⟩=s⟨gβŠ₯,g⟩=sβ€–gβ€–βˆ—.
  18. Using the Lipschitz bound above, we get

    sβ€–gβ€–βˆ—β‰€Ls.
  19. Thus, β€–gβ€–βˆ—β‰€L.

  20. Since g was chosen arbitrarily, hence Y is bounded.

We recall from Definition 8.56 that the relative interior of a convex set is given by

riC={x∈C|βˆƒr>0,B(x,r)∩affCβŠ†C}.

It is the interior of the convex set w.r.t. the subspace topology of its affine hull.

8.16.3.4. Nonempty Subdifferential at Relative Interior PointsΒΆ

Theorem 8.217 (Nonemptiness of the subdifferential at relative interior points)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper convex function with S=domf. Let x∈riS. Then, βˆ‚f(x) is nonempty and has the form

βˆ‚f(x)=LβŠ₯+G

where L is the subspace that is parallel to affS and G is a nonempty and compact set.

In other words, for a proper convex function, the subdifferential at the relative interior points of its domain is nonempty. We have

ridomfβŠ†dom(βˆ‚f).

The proof is based on the min common/max crossing framework developed in Min Common/Max Crossing Duality. It can be skipped in first reading. It follows the proof of Theorem 8.215.

Proof. This proof is based on the second min common/max crossing theorem Theorem 9.34.

We fix some x∈riS. We construct the set M as

M={(u,t)|u∈V,f(x+u)≀t}.

We have already established the following in the proof of Theorem 8.215:

  1. M is convex.

  2. M―=M.

  3. pβˆ—=f(x).

  4. pβˆ— is finite.

  5. qβˆ—=supa∈Vq(a) where

    q(a)=inf(u,t)∈M{⟨u,a⟩+t}.
  6. Qβˆ—={a∈V|q(a)=qβˆ—}.

  7. When the strong duality holds, then

    Qβˆ—=βˆ’βˆ‚f(x).
  8. The set

    D={u∈V| there exists t∈R with (u,t)∈M}=Sβˆ’x.

Continuing further

  1. Since x∈riS, hence 0∈riD.

  2. Hence affD=affSβˆ’x=L.

  3. Hence by the second min common/max crossing theorem

    βˆ’βˆ‚f(x)=Qβˆ—=LβŠ₯+Q~

    where Q~ is a nonempty, convex and compact set.

  4. Negating on both sides, we obtain

    βˆ‚f(x)=LβŠ₯+G

    where G is a nonempty, convex and compact set.

Corollary 8.20 (Existence of points with nonempty subdifferential)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper convex function with S=domf. Then, there exists x∈S such that βˆ‚f(x) is nonempty.

Proof. The effective domain of a proper convex function is convex and nonempty.

  1. By Theorem 8.142, the relative interior of S=domf is nonempty.

  2. Thus, there exists x∈riS⊆S.

  3. By Theorem 8.217, ∂f(x) is nonempty.

8.16.3.5. Unbounded SubdifferentialΒΆ

Theorem 8.218 (Unboundedness condition for subdifferential)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper convex function with S=domf.

Assume that dimS<dimV. Let x∈S. If βˆ‚f(x)β‰ βˆ…, then βˆ‚f(x) is unbounded.

Proof. We proceed as follows.

  1. Let n=dimV.

  2. Let A=affS. A is an affine set.

  3. We have x∈SβŠ†A.

  4. Then, W=Aβˆ’x is the subspace parallel to A.

  5. Accordingly, m=dimW<dimV=n.

  6. Then, the orthogonal complement of W is a nontrivial subspace with dimension nβˆ’m.

  7. Let v∈WβŠ₯ be a nonzero vector.

  8. Then, ⟨w,v⟩=0 for every w∈W.

  9. Now let gβˆˆβˆ‚f(x) be an arbitrary subgradient at x.

  10. By subgradient inequality

    f(y)β‰₯f(x)+⟨yβˆ’x,gβŸ©βˆ€y∈S.
  11. Note that both x∈S and y∈S.

  12. Hence, yβˆ’x∈W.

  13. Thus, ⟨yβˆ’x,v⟩=0.

  14. But then, for any α∈R,

    ⟨yβˆ’x,(g+Ξ±v)⟩=⟨yβˆ’x,g⟩+α⟨yβˆ’x,v⟩=⟨yβˆ’x,g⟩.
  15. Thus, if gβˆˆβˆ‚f(x), then g+Ξ±vβˆˆβˆ‚f(x) for every α∈R.

  16. Thus, βˆ‚f(x) is unbounded.
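The construction in the proof can be seen concretely in R². Take f(y1,y2)=|y1| with domf equal to the x-axis, so that dimdomf=1<2. Then (g1,g2) with |g1|≤1 is a subgradient at the origin for every choice of g2, since the second coordinate of y−x vanishes on the domain. An illustrative sketch:

```python
# points of dom f: the x-axis in R^2 (f = |y1| there, infinity elsewhere)
ys = [(i / 10.0, 0.0) for i in range(-20, 21)]

def is_subgrad(g1, g2):
    # subgradient inequality at the origin, sampled over dom f
    return all(abs(y1) >= g1 * y1 + g2 * y2 for (y1, y2) in ys)

# the second component can be arbitrarily large: the set is unbounded
unbounded_dir_ok = all(is_subgrad(0.5, alpha)
                       for alpha in [0.0, 1.0, -100.0, 1e6])
```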

8.16.4. Directional DerivativesΒΆ

The directional derivative of a proper convex function is closely linked with its subdifferential. To see this, let x∈intdomf, let d∈V be a nonzero direction and t>0. Let gβˆˆβˆ‚f(x) and consider the subgradient inequality

f(x+td)β‰₯f(x)+⟨td,g⟩.

Hence

(f(x+td)−f(x))/t ≥ ⟨d,g⟩.

We saw in Observation 8.8 that (f(x+td)−f(x))/t is nondecreasing in t and

f′(x;d)=inf_{t>0} (f(x+td)−f(x))/t.

This establishes the basic relation

(8.8)ΒΆfβ€²(x;d)β‰₯⟨d,g⟩

for every gβˆˆβˆ‚f(x). In fact a stronger result is available in the form of max formula.

8.16.4.1. Max FormulaΒΆ

The max formula is one of the key results in this section. It connects subgradients with directional derivatives.

Theorem 8.219 (Max formula)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper convex function with S=domf. Then for any x∈intS and d∈V,

fβ€²(x;d)=sup{⟨d,g⟩|gβˆˆβˆ‚f(x)}.

In words, the directional derivative is the supremum of the inner product of the subgradients with the direction.

Proof. Let x∈intS and d∈V.

  1. Let t>0. Then, by subgradient inequality

    f(x+td)βˆ’f(x)β‰₯⟨td,gβŸ©βˆ€gβˆˆβˆ‚f(x).
  2. Thus,

    f(x+td)βˆ’f(x)tβ‰₯⟨d,gβŸ©βˆ€gβˆˆβˆ‚f(x).
  3. Taking the limit

    f′(x;d)=lim_{t→0+} (f(x+td)−f(x))/t ≥ lim_{t→0+} ⟨d,g⟩=⟨d,g⟩

    for every gβˆˆβˆ‚f(x).

  4. Taking the supremum over βˆ‚f(x) on the R.H.S., we obtain

    fβ€²(x;d)β‰₯sup{⟨d,g⟩|gβˆˆβˆ‚f(x)}.

We now show that the inequality is indeed an equality.

  1. Let h:V→R be given by

    h(v)=fβ€²(x;v).
  2. By Theorem 8.206, h is a real valued convex function and nonnegative homogeneous.

  3. By Corollary 8.19, h is subdifferentiable everywhere in V.

  4. In particular, h is subdifferentiable at d.

  5. Let gβˆˆβˆ‚h(d).

  6. For any v∈V and tβ‰₯0,

    tfβ€²(x;v)=th(v)=h(tv)

    since h is nonnegative homogeneous.

  7. By subdifferential inequality

    tfβ€²(x;v)=h(tv)β‰₯h(d)+⟨tvβˆ’d,g⟩=fβ€²(x;d)+⟨tvβˆ’d,g⟩.
  8. Rearranging the terms,

    t(fβ€²(x;v)βˆ’βŸ¨v,g⟩)β‰₯fβ€²(x;d)βˆ’βŸ¨d,g⟩.
  9. Since this inequality is valid for every t≥0, the term f′(x;v)−⟨v,g⟩ must be nonnegative; otherwise, the inequality would be violated for large enough t. Thus,

    fβ€²(x;v)β‰₯⟨v,g⟩.
  10. By Theorem 8.207, for any y∈S,

    f(y)β‰₯f(x)+fβ€²(x;yβˆ’x).
  11. From previous inequality,

    fβ€²(x;yβˆ’x)β‰₯⟨yβˆ’x,g⟩.
  12. Thus, for any y∈S,

    f(y)β‰₯f(x)+⟨yβˆ’x,g⟩.
  13. But this is a subgradient inequality. Hence, gβˆˆβˆ‚f(x).

  14. Taking t=0 in the subgradient inequality for h,

    0β‰₯fβ€²(x;d)βˆ’βŸ¨d,g⟩.
  15. Thus, there exists gβˆˆβˆ‚f(x) such that

    fβ€²(x;d)β‰€βŸ¨d,g⟩.
  16. Consequently,

    fβ€²(x;d)≀sup{⟨d,g⟩|gβˆˆβˆ‚f(x)}.

Combining the two inequalities, we obtain the max formula:

fβ€²(x;d)=sup{⟨d,g⟩|gβˆˆβˆ‚f(x)}.

Recall from Definition 8.51 that support function for a set C is given by

ΟƒC(x)=sup{⟨x,y⟩|y∈C}.

Corollary 8.21 (Max formula as a support function)

The max formula can be written as

fβ€²(x;d)=Οƒβˆ‚f(x)(d).
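The max formula can be spot-checked numerically for f(x)=|x| at x=0, where ∂f(0)=[−1,1] and f′(0;d)=|d|. The sketch below (illustrative tolerances and sampling) compares a one-sided finite-difference directional derivative with the supremum over sampled subgradients:

```python
f = abs

def dir_deriv(x, d, t=1e-8):
    # one-sided difference quotient approximating f'(x; d)
    return (f(x + t * d) - f(x)) / t

gs = [i / 100.0 for i in range(-100, 101)]    # sample of the interval [-1, 1]
max_formula_holds = all(
    abs(dir_deriv(0.0, d) - max(g * d for g in gs)) < 1e-6
    for d in [1.0, -1.0, 2.5, -0.3]
)
```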

8.16.5. DifferentiabilityΒΆ

8.16.5.1. Subdifferential and gradientΒΆ

Theorem 8.220 (Subdifferential at points of differentiability)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper convex function with S=domf. Let x∈intS.

Then f is differentiable at x if and only if

βˆ‚f(x)={βˆ‡f(x)}.

In other words, if f is differentiable at x then its subdifferential is a singleton set consisting of the gradient and if the subdifferential at x is a singleton, then f is differentiable at x.

Proof. Assume that f is differentiable at x.

  1. Let d∈V be some direction.

  2. By Theorem 8.203,

    fβ€²(x;d)=⟨d,βˆ‡f(x)⟩.
  3. By Theorem 8.214, f is subdifferentiable at x since f is convex and x∈intS.

  4. Let gβˆˆβˆ‚f(x).

  5. By the max formula (Theorem 8.219):

    fβ€²(x;d)β‰₯⟨d,g⟩

    as the directional derivative is the supremum of the inner product of the subgradients with the direction.

  6. Thus,

    ⟨d,βˆ‡f(x)⟩β‰₯⟨d,g⟩.
  7. In turn,

    ⟨d,gβˆ’βˆ‡f(x)βŸ©β‰€0.

    This holds for every d∈V.

  8. By the definition of dual norm

    β€–gβˆ’βˆ‡f(x)β€–βˆ—=supβ€–d‖≀1{⟨d,gβˆ’βˆ‡f(x)⟩}.
  9. Using the previous inequality

    β€–gβˆ’βˆ‡f(x)β€–βˆ—β‰€0.
  10. Since, dual norm is a norm, hence it cannot be negative. Thus,

    β€–gβˆ’βˆ‡f(x)β€–βˆ—=0
  11. Moreover, due to positive definiteness of a norm

    gβˆ’βˆ‡f(x)=0

    must hold true.

  12. Thus, g=βˆ‡f(x).

  13. In other words, if g is a subgradient to f at x, then it must equal βˆ‡f(x).

  14. Thus, the only subgradient for f at x is βˆ‡f(x).

  15. Thus,

    βˆ‚f(x)={βˆ‡f(x)}.

For the converse, assume that f is subdifferentiable at x with βˆ‚f(x)={g}.

  1. By the subgradient inequality

    f(x+u)β‰₯f(x)+⟨u,gβŸ©βˆ€u∈V.
  2. Thus,

    f(x+u)βˆ’f(x)βˆ’βŸ¨u,g⟩β‰₯0βˆ€u∈V.
  3. Define a function h:Vβ†’(βˆ’βˆž,∞] as

    h(u)β‰œf(x+u)βˆ’f(x)βˆ’βŸ¨u,g⟩.
  4. We list some properties of h.

    1. By definition h(u)β‰₯0 for every u∈V.

    2. h is a convex function since f(x+u) is convex, ⟨u,g⟩ is linear and f(x) is a constant (w.r.t. the variable u).

    3. domh=domfβˆ’x=Sβˆ’x.

    4. Thus, since x∈intS, hence 0=xβˆ’x∈intdomh.

    5. h(0)=f(x)βˆ’f(x)βˆ’βŸ¨0,g⟩=0.

  5. If we are able to show that

    limu→0h(u)‖u‖=0

    then, by the definition of gradient (Definition 8.71),

    g=βˆ‡f(x).
  6. We can easily show that βˆ‚h(0)={0}.

    1. If g~ is a subgradient of h at 0, then by subgradient inequality

      h(u)β‰₯h(0)+⟨u,g~⟩=⟨u,g~βŸ©βˆ€u∈V.
    2. Then, g~=0 satisfies this inequality since h(u)β‰₯0 by definition.

    3. For contradiction, assume a nonzero g~ can satisfy this inequality.

    4. Then,

      h(u)β‰₯⟨u,g~⟩⟺f(x+u)βˆ’f(x)βˆ’βŸ¨u,g⟩β‰₯⟨u,g~⟩⟺f(x+u)β‰₯f(x)+⟨u,g~+g⟩⟺g~+gβˆˆβˆ‚f(x).
    5. This contradicts the hypothesis that the subgradient of f at x is {g}.

    6. Thus, βˆ‚h(0)={0}.

  7. Then, max formula (Theorem 8.219):

    hβ€²(0;d)=Οƒβˆ‚h(0)(d)=⟨d,0⟩=0.
  8. Thus, from the definition of directional derivatives

    0=h′(0;d)=lim_{α→0+} (h(αd)−h(0))/α=lim_{α→0+} h(αd)/α.
  9. Let us now introduce an orthonormal basis for V as {e1,…,en}.

  10. Assume that V has been equipped with various β„“p norms as described in Remark 8.1.

  11. Since 0∈intdomh, there exists r∈(0,1) such that

    B1[0,r]βŠ†domh.
  12. It is a cross polytope of radius r with 2n vertices given by {Β±rei}i=1n.

    B1[0,r]=conv{Β±rei}i=1n.
  13. Let us denote these 2n vectors as w1,…,w2n.

  14. By Remark 8.1

    B[0,s]=B2[0,s]βŠ†B1[0,r]

    where s=r/√n.

  15. Let u∈B[0,s²] be a nonzero vector.

  16. Since r<1, hence s<1, hence s²<s.

  17. Let v=su/‖u‖.

  18. Then, ‖v‖=s, hence v∈B[0,s]⊆B1[0,r].

  19. Thus, v∈conv{w_i}_{i=1}^{2n}.

  20. Thus, there exists tβˆˆΞ”2n such that

    su/‖u‖=v=∑_{i=1}^{2n} t_i w_i.
  21. Then,

    h(u)/‖u‖ = h((‖u‖/s)(su/‖u‖))/‖u‖
    = h(∑_{i=1}^{2n} t_i (‖u‖/s) w_i)/‖u‖ (convex combination)
    ≤ ∑_{i=1}^{2n} t_i h((‖u‖/s) w_i)/‖u‖ (h is convex and t∈Δ_{2n})
    ≤ max_{i=1,…,2n} {h((‖u‖/s) w_i)/‖u‖} (since ∑ t_i=1).

    Note that (‖u‖/s)w_i∈B1[0,r]⊆domh, since ‖(‖u‖/s)w_i‖1=(‖u‖/s)r≤sr<r.

  22. Now,

    limu→0h(‖u‖wis)‖u‖=lim‖u‖→0h(‖u‖wis)‖u‖=limα→0+h(αwis)α=0.
  23. Thus,

    limu→0h(u)‖u‖=0.
  24. Thus, g=βˆ‡f(x) as desired.
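The singleton property can be observed numerically. For f(x)=x² at x=1 the gradient is 2, and sampling candidate slopes against the subgradient inequality on a fine grid leaves only g=2 standing (illustrative sketch; a grid check is of course not a proof):

```python
def f(y):
    return y * y

x0 = 1.0
ys = [i / 20.0 for i in range(-100, 101)]     # grid on [-5, 5], step 0.05

def is_subgrad(g):
    # subgradient inequality f(y) >= f(x0) + g*(y - x0), sampled over ys
    return all(f(y) >= f(x0) + g * (y - x0) for y in ys)

# candidate slopes on [-5, 5] with step 0.1; only the gradient 2.0 survives
survivors = [j / 10.0 for j in range(-50, 51) if is_subgrad(j / 10.0)]
```

For any other sampled slope g, the quadratic y²−(1+g(y−1)) is negative strictly between 1 and g−1, and the grid is fine enough to catch a point in that interval.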

8.16.6. Subdifferential CalculusΒΆ

8.16.6.1. Function SumsΒΆ

Theorem 8.221 (Subdifferential subadditivity with sum of functions)

Let f1,f2:Vβ†’(βˆ’βˆž,∞] be proper functions with S1=domf1 and S2=domf2. For any x∈S1∩S2

βˆ‚f1(x)+βˆ‚f2(x)βŠ†βˆ‚(f1+f2)(x).

Proof. Let f=f1+f2. We note that domf=dom(f1+f2)=domf1∩domf2=S1∩S2.

  1. Let x∈S1∩S2.

  2. Let gβˆˆβˆ‚f1(x)+βˆ‚f2(x).

  3. Then, there exist g1βˆˆβˆ‚f1(x) and g2βˆˆβˆ‚f2(x) such that g=g1+g2.

  4. Then, by subgradient inequality, for any y∈S1∩S2

    f1(y)β‰₯f1(x)+⟨yβˆ’x,g1⟩,f2(y)β‰₯f2(x)+⟨yβˆ’x,g2⟩.
  5. Summing the two inequalities, we get

    f1(y)+f2(y)β‰₯f1(x)+f2(x)+⟨yβˆ’x,g1+g2⟩.
  6. Rewriting, for every y∈domf

    (f1+f2)(y)β‰₯(f1+f2)(x)+⟨yβˆ’x,g⟩.
  7. Thus, g=g1+g2∈∂(f1+f2)(x)=∂f(x).

  8. Thus, βˆ‚f1(x)+βˆ‚f2(x)βŠ†βˆ‚f(x).

We can generalize this result for a finite sum of functions using simple mathematical induction.

Corollary 8.22 (Weak sum rule of subdifferential calculus)

Let f1,…,fm:Vβ†’(βˆ’βˆž,∞] be proper functions. For any x∈∩i=1mdomfi

βˆ‘i=1mβˆ‚fi(x)βŠ†βˆ‚(βˆ‘i=1mfi)(x).
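The weak sum rule can be illustrated numerically with f1(y)=|y| and f2(y)=max(y,0) at x=0: g1=−0.5∈∂f1(0) and g2=0.75∈∂f2(0), and their sum satisfies the subgradient inequality for f1+f2 (illustrative sketch):

```python
f1 = abs
f2 = lambda y: max(y, 0.0)
ys = [i / 10.0 for i in range(-40, 41)]

def is_subgrad(func, g, x=0.0):
    # subgradient inequality for func at x, sampled over ys
    return all(func(y) >= func(x) + g * (y - x) for y in ys)

g1, g2 = -0.5, 0.75
sum_rule_ok = (is_subgrad(f1, g1) and is_subgrad(f2, g2)
               and is_subgrad(lambda y: f1(y) + f2(y), g1 + g2))
```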

Theorem 8.222 (Subdifferential additivity with sum of convex functions)

Let f1,f2:Vβ†’(βˆ’βˆž,∞] be proper convex functions with S1=domf1 and S2=domf2. For any x∈intS1∩intS2

βˆ‚(f1+f2)(x)=βˆ‚f1(x)+βˆ‚f2(x).

Proof. With f=f1+f2, by Theorem 3.22,

intdomf=int(S1∩S2)=intS1∩intS2.
  1. Let x∈intS1∩intS2.

  2. Thus, x∈intdomf.

  3. By max formula, for any d∈V,

    fβ€²(x;d)=sup{⟨d,g⟩|gβˆˆβˆ‚f(x)}=Οƒβˆ‚f(x)(d).
  4. Since directional derivative is additive, hence

    fβ€²(x;d)=f1β€²(x;d)+f2β€²(x;d).
  5. Expanding on this

    σ∂f(x)(d)=f′(x;d)=f1′(x;d)+f2′(x;d)
    =sup{⟨d,g1⟩|g1∈∂f1(x)}+sup{⟨d,g2⟩|g2∈∂f2(x)}
    =sup{⟨d,g1+g2⟩|g1∈∂f1(x),g2∈∂f2(x)}
    =sup{⟨d,g⟩|g∈∂f1(x)+∂f2(x)}
    =σ∂f1(x)+∂f2(x)(d).
  6. In summary, for every d∈V,

    Οƒβˆ‚f(x)(d)=Οƒβˆ‚f1(x)+βˆ‚f2(x)(d).
  7. By Theorem 8.211, βˆ‚f(x), βˆ‚f1(x) and βˆ‚f2(x) are closed and convex.

  8. By Theorem 8.214, βˆ‚f(x), βˆ‚f1(x) and βˆ‚f2(x) are nonempty and bounded.

  9. Since βˆ‚f1(x) and βˆ‚f2(x) are closed and bounded, hence they are compact (V is finite dimensional).

  10. Thus, βˆ‚f1(x)+βˆ‚f2(x) is also closed, bounded, convex and nonempty.

  11. Thus, both βˆ‚f(x) and βˆ‚f1(x)+βˆ‚f2(x) are nonempty, convex and closed.

  12. Then, due to Theorem 8.90,

    βˆ‚f(x)=βˆ‚f1(x)+βˆ‚f2(x).

We can generalize this result for a finite sum of proper convex functions using simple mathematical induction.

Corollary 8.23 (Sum rule of subdifferential calculus for proper convex functions at interior points)

Let f1,…,fm:Vβ†’(βˆ’βˆž,∞] be proper convex functions. For any x∈∩i=1mintdomfi

βˆ‘i=1mβˆ‚fi(x)=βˆ‚(βˆ‘i=1mfi)(x).

For real valued convex functions, the domain is the whole space V, whose interior is V itself.

Corollary 8.24 (Sum rule of subdifferential calculus for real valued convex functions)

Let f1,…,fm:Vβ†’R be real valued convex functions. For any x∈V

βˆ‘i=1mβˆ‚fi(x)=βˆ‚(βˆ‘i=1mfi)(x).

A more powerful result with less restrictive assumptions than Corollary 8.23 is possible if the intersection of the relative interiors of the domains of the individual functions is nonempty.

Theorem 8.223 (Sum rule of subdifferential calculus for proper convex functions)

Let f1,…,fm:Vβ†’(βˆ’βˆž,∞] be proper convex functions. Assume that β‹‚i=1mridomfiβ‰ βˆ…. Then for any x∈V

βˆ‘i=1mβˆ‚fi(x)=βˆ‚(βˆ‘i=1mfi)(x).

8.16.6.2. Linear TransformationsΒΆ

Our interest here is in compositions of the form h=f∘A where A is a linear transformation. In other words h(x)=f(A(x)).

If A:Vβ†’W is a linear transformation then AT:Wβˆ—β†’Vβˆ— is a mapping from Wβˆ— to Vβˆ— and satisfies the relationship:

⟨A(x),y⟩=⟨x,AT(y)⟩.

From the definition of directional derivative, we have

h′(x;d)=lim_{t↓0} (h(x+td)−h(x))/t
=lim_{t↓0} (f(A(x+td))−f(A(x)))/t
=lim_{t↓0} (f(A(x)+tA(d))−f(A(x)))/t
=f′(A(x);A(d)).

Theorem 8.224 (Weak linear transformation rule of subdifferential calculus)

Let f:Wβ†’(βˆ’βˆž,∞] be a proper function. Let A:Vβ†’W be a linear transformation. Define h:Vβ†’(βˆ’βˆž,∞] as

h(x)=f(A(x)).

Assume that h is proper, i.e. domh is not empty:

domh={x∈V|A(x)∈domf}β‰ βˆ….

Then, for any x∈domh

AT(βˆ‚f(A(x)))βŠ†βˆ‚h(x).

Proof. We proceed as follows.

  1. Let x∈domh.

  2. Let gβˆˆβˆ‚f(A(x)).

  3. By Theorem 8.219,

    ⟨z,gβŸ©β‰€fβ€²(A(x);z)βˆ€z∈W.
  4. Choosing z=A(d), we have

    ⟨A(d),gβŸ©β‰€fβ€²(A(x);A(d))βˆ€d∈V.
  5. Equivalently

    ⟨d,AT(g)βŸ©β‰€hβ€²(x;d)βˆ€d∈V.
  6. Hence AT(g)βˆˆβˆ‚h(x) due to (8.8).

  7. Hence AT(βˆ‚f(A(x)))βŠ†βˆ‚h(x).

Theorem 8.225 (Strong linear transformation rule for subdifferential calculus)

Let f:Wβ†’(βˆ’βˆž,∞] be a proper convex function. Let A:Vβ†’W be a linear transformation. Define h:Vβ†’(βˆ’βˆž,∞] as

h(x)=f(A(x)).

Assume that h is proper, i.e. domh is not empty:

domh={x∈V|A(x)∈domf}β‰ βˆ….

Then, for any x∈intdomh such that A(x)∈intdomf, we have:

AT(βˆ‚f(A(x)))=βˆ‚h(x).

Proof. We showed AT(βˆ‚f(A(x)))βŠ†βˆ‚h(x) in Theorem 8.224. We show the reverse inclusion by contradiction.

  1. Let x∈intdomh such that A(x)∈intdomf.

  2. Assume that there exists dβˆˆβˆ‚h(x) such that dβˆ‰AT(βˆ‚f(A(x))).

  3. By Theorem 8.215, the set βˆ‚f(A(x)) is nonempty, convex and compact.

  4. Hence AT(βˆ‚f(A(x))) is also nonempty, convex and compact.

  5. By strict separation theorem (Theorem 8.169), there exists a vector p and a scalar c such that

    ⟨AT(g),p⟩<c<⟨d,pβŸ©βˆ€gβˆˆβˆ‚f(A(x)).
  6. Equivalently

    ⟨g,A(p)⟩<c<⟨d,pβŸ©βˆ€gβˆˆβˆ‚f(A(x)).
  7. Taking the supremum over βˆ‚f(A(x)) on the L.H.S., we obtain

    supgβˆˆβˆ‚f(A(x))⟨g,A(p)⟩<⟨d,p⟩.
  8. By the max formula

    fβ€²(A(x);A(p))<⟨d,p⟩.
  9. But this means that

    hβ€²(x;p)<⟨d,p⟩.
  10. This contradicts the assumption that dβˆˆβˆ‚h(x).

  11. Hence we must have

    AT(βˆ‚f(A(x)))=βˆ‚h(x).

8.16.6.3. Affine TransformationsΒΆ

Theorem 8.226 (Weak affine transformation rule of subdifferential calculus)

Let f:Wβ†’(βˆ’βˆž,∞] be a proper function. Let A:Vβ†’W be a linear transformation. Let b∈W. Define h:Vβ†’(βˆ’βˆž,∞] as

h(x)=f(A(x)+b).

Assume that h is proper, i.e. domh is not empty:

domh={x∈V|A(x)+b∈domf}β‰ βˆ….

Then, for any x∈domh

AT(βˆ‚f(A(x)+b))βŠ†βˆ‚h(x).

Proof. We proceed as follows.

  1. Let x∈domh.

  2. Then, xβ€²=A(x)+b∈domf such that h(x)=f(xβ€²).

  3. Let g∈AT(βˆ‚f(xβ€²)).

  4. Then, there is d∈Wβˆ— such that g=AT(d) with dβˆˆβˆ‚f(xβ€²).

  5. Let y∈domh.

  6. Then, yβ€²=A(y)+b∈domf such that h(y)=f(yβ€²).

  7. Applying subgradient inequality for f at xβ€² with the subgradient being d, we get

    f(yβ€²)β‰₯f(xβ€²)+⟨yβ€²βˆ’xβ€²,d⟩.
  8. We have h(y)=f(yβ€²), h(x)=f(xβ€²) and yβ€²βˆ’xβ€²=A(yβˆ’x).

  9. Thus, the subgradient inequality simplifies to

    h(y)β‰₯h(x)+⟨A(yβˆ’x),d⟩.
  10. We note that

    ⟨A(yβˆ’x),d⟩=⟨yβˆ’x,AT(d)⟩.
  11. Thus, for any y∈domh, we have

    h(y)β‰₯h(x)+⟨yβˆ’x,AT(d)⟩.
  12. Thus, g=AT(d)βˆˆβˆ‚h(x).

Since this is valid for any x∈domh and for every g∈AT(βˆ‚f(A(x)+b)), hence

AT(βˆ‚f(A(x)+b))βŠ†βˆ‚h(x).

Theorem 8.227 (Affine transformation rule of subdifferential calculus)

Let f:Wβ†’(βˆ’βˆž,∞] be a proper convex function. Let A:Vβ†’W be a linear transformation. Let b∈W. Define h:Vβ†’(βˆ’βˆž,∞] as

h(x)=f(A(x)+b).

Assume that h is proper, i.e. domh is not empty:

domh={x∈V|A(x)+b∈domf}β‰ βˆ….

Then, for any x∈intdomh such that A(x)+b∈intdomf, we have:

AT(βˆ‚f(A(x)+b))=βˆ‚h(x).

Proof. We note that h is a proper convex function since it is a composition of an affine transformation with a proper convex function.

  1. Let x∈intdomh such that xβ€²=A(x)+b∈intdomf.

  2. Then, for any direction d∈V, by the max formula,

    hβ€²(x;d)=Οƒβˆ‚h(x)(d).
  3. By the definition of the directional derivative, we have

    hβ€²(x;d)=limΞ±β†’0+[h(x+Ξ±d)βˆ’h(x)]/Ξ±=limΞ±β†’0+[f(A(x)+b+Ξ±A(d))βˆ’f(A(x)+b)]/Ξ±=fβ€²(A(x)+b;A(d)).
  4. Thus,

    Οƒβˆ‚h(x)(d)=fβ€²(A(x)+b;A(d)).
  5. Using the max formula on the R.H.S., we get

    Οƒβˆ‚h(x)(d)=fβ€²(A(x)+b;A(d))=supgβˆˆβˆ‚f(A(x)+b)⟨A(d),g⟩=supgβˆˆβˆ‚f(A(x)+b)⟨d,AT(g)⟩=supgβ€²βˆˆAT(βˆ‚f(A(x)+b))⟨d,gβ€²βŸ©=ΟƒAT(βˆ‚f(A(x)+b))(d).
  6. Since x∈intdomh, hence by Theorem 8.211 and Theorem 8.214 βˆ‚h(x) is nonempty, closed and convex.

  7. Since A(x)+b∈intdomf, hence by Theorem 8.211 and Theorem 8.214 βˆ‚f(A(x)+b) is nonempty, closed and convex.

  8. It follows that AT(βˆ‚f(A(x)+b)) is nonempty, closed and convex since AT is a linear operator and both V and W are finite dimensional.

  9. Then, due to Theorem 8.90,

    AT(βˆ‚f(A(x)+b))=βˆ‚h(x).

8.16.6.4. CompositionΒΆ

Chain rule is a key principle in computing derivatives of composition of functions. A chain rule is available for subgradient calculus also.

We first recall a result on the derivative of composition of real functions.

Theorem 8.228 (Chain rule for real functions)

Let f:Rβ†’R be a real function which is continuous on [a,b] with a<b. Assume that f+β€²(a) exists. Let g:Rβ†’R be another real function defined on an open interval I such that rangefβŠ†I. Assume g is differentiable at f(a). Then the composite real function h:Rβ†’R given by

h(t)β‰œg(f(t))(a≀t≀b)

is right differentiable at t=a. In particular,

h+β€²(a)=gβ€²(f(a))f+β€²(a).

Proof. We show this by working with the definition of right hand derivative as a limit

h+β€²(a)=limtβ†’a+[h(t)βˆ’h(a)]/(tβˆ’a)
=limtβ†’a+[g(f(t))βˆ’g(f(a))]/(tβˆ’a)
=limtβ†’a+[g(f(t))βˆ’g(f(a))]/[f(t)βˆ’f(a)]β‹…[f(t)βˆ’f(a)]/(tβˆ’a)
=limzβ†’f(a)[g(z)βˆ’g(f(a))]/[zβˆ’f(a)]β‹…limtβ†’a+[f(t)βˆ’f(a)]/(tβˆ’a)
=gβ€²(f(a))f+β€²(a).

We can now develop a chain rule for subdifferentials of multidimensional functions with the help of max formula.

Theorem 8.229 (Subdifferential chain rule)

Let f:Vβ†’R be convex and let g:Rβ†’R be a nondecreasing convex function. Let x∈V. Assume that g is differentiable at f(x). Let h=g∘f. Then

βˆ‚h(x)=gβ€²(f(x))βˆ‚f(x).

Proof. We are given x∈V at which g is differentiable and f is convex.

  1. Since f is convex and g is nondecreasing convex function, hence h is also convex.

  2. We now introduce two real functions parametrized on x and an arbitrary direction d∈V

    fx,d(t)=f(x+td),t∈R
    hx,d(t)=h(x+td),t∈R
  3. It is now easy to see that

    hx,d(t)=h(x+td)=g(f(x+td))=g(fx,d(t)).
  4. Thus, hx,d=g∘fx,d.

  5. Since fx,d and hx,d are restrictions of f and h along a line, they are also convex.

  6. Due to Theorem 8.204, the directional derivatives of f and h exist in every direction.

  7. By the definition of directional derivative Definition 8.70,

    (fx,d)+β€²(0)=fβ€²(x;d),(hx,d)+β€²(0)=hβ€²(x;d).
  8. Also note that fx,d(0)=f(x) and hx,d(0)=h(x).

  9. fx,d is right differentiable at t=0, and g is differentiable at f(x).

  10. Hence, by the chain rule in Theorem 8.228,

    hβ€²(x;d)=gβ€²(f(x))fβ€²(x;d).
  11. By the max formula in Corollary 8.21,

    hβ€²(x;d)=Οƒβˆ‚h(x)(d) and fβ€²(x;d)=Οƒβˆ‚f(x)(d).
  12. Thus,

    Οƒβˆ‚h(x)(d)=hβ€²(x;d)=gβ€²(f(x))fβ€²(x;d)=gβ€²(f(x))Οƒβˆ‚f(x)(d)=Οƒgβ€²(f(x))βˆ‚f(x)(d).

    The last step is due to Theorem 8.92. Since g is nondecreasing, hence gβ€²(f(x))β‰₯0.

  13. By Theorem 8.211 and Theorem 8.214, the sets βˆ‚f(x) and βˆ‚h(x) are nonempty, closed and convex.

  14. Then, the set gβ€²(f(x))βˆ‚f(x) is also nonempty, closed and convex.

  15. Thus, by Theorem 8.90,

    βˆ‚h(x)=gβ€²(f(x))βˆ‚f(x).

Applications of this rule are presented later in Example 8.70.

8.16.6.5. Max RuleΒΆ

Theorem 8.230 (Max rule of subdifferential calculus)

Let f1,f2,…,fm:Vβ†’(βˆ’βˆž,∞] be a set of proper convex functions. Let f:Vβ†’(βˆ’βˆž,∞] be given by:

f(x)=max{f1(x),f2(x),…,fm(x)}.

Let xβˆˆβ‹‚i=1mintdomfi be a point common to the interiors of domains of all the functions.

The subdifferential set of f at x can be obtained from the subdifferentials of fi as follows:

βˆ‚f(x)=conv(⋃i∈I(x)βˆ‚fi(x))

where I(x)={i|fi(x)=f(x)}.

Proof. Since fi are proper convex, hence their pointwise maximum f is proper convex.

  1. Let I(x)={i∈1,…,m|fi(x)=f(x)}.

  2. For any (nonzero) direction, d∈V, by Theorem 8.209:

    fβ€²(x;d)=maxi∈I(x)fiβ€²(x;d).
  3. Without loss of generality, let us assume that I(x)={1,…,k} for some k∈{1,…,m}. This can be achieved by reordering fi.

  4. By max formula (Theorem 8.219),

    fiβ€²(x;d)=sup{⟨d,g⟩|gβˆˆβˆ‚fi(x)}.
  5. Thus,

    fβ€²(x;d)=maxi∈1,…,ksupgiβˆˆβˆ‚fi(x)⟨d,gi⟩.
  6. Recall that for any a1,…,ak∈R, the identity below holds:

    max{a1,…,ak}=suptβˆˆΞ”kβˆ‘i=1ktiai.
  7. Thus, we can expand fβ€²(x;d) as:

    fβ€²(x;d)=maxi∈{1,…,k}supgiβˆˆβˆ‚fi(x)⟨d,gi⟩
    =suptβˆˆΞ”k{βˆ‘i=1ktisupgiβˆˆβˆ‚fi(x)⟨d,gi⟩}
    =suptβˆˆΞ”ksup{βˆ‘i=1kti⟨d,gi⟩|giβˆˆβˆ‚fi(x)}
    =sup{⟨d,βˆ‘i=1ktigi⟩|giβˆˆβˆ‚fi(x),tβˆˆΞ”k}
    =sup{⟨d,g⟩|g∈conv(⋃i=1kβˆ‚fi(x))}
    =ΟƒA(d)

    where A=conv(⋃i=1kβˆ‚fi(x)) and Οƒ denotes the support function.

  8. Since x∈intdomf, hence, by the max formula (Corollary 8.21)

    fβ€²(x;d)=Οƒβˆ‚f(x)(d).
  9. Thus, we have

    Οƒβˆ‚f(x)(d)=ΟƒA(d).
  10. By Theorem 8.211, βˆ‚f(x) is closed and convex.

  11. By Theorem 8.214, βˆ‚f(x) is nonempty and bounded.

  12. Thus, βˆ‚f(x) is nonempty, closed and convex.

  13. Similarly, βˆ‚fi(x) are nonempty, closed, convex and bounded.

  14. Thus, ⋃i=1kβˆ‚fi(x) is a finite union of nonempty, closed, convex and bounded sets.

  15. Thus, ⋃i=1kβˆ‚fi(x) is also nonempty and compact.

    1. A finite union of nonempty sets is nonempty.

    2. A finite union of bounded sets is bounded.

    3. A finite union of closed sets is closed.

    4. Thus, ⋃i=1kβˆ‚fi(x) is closed and bounded.

    5. Since V is finite dimensional, hence closed and bounded sets are compact.

  16. Since A is a convex hull of ⋃i=1kβˆ‚fi(x), hence A is nonempty, closed and convex.

    1. Recall from Theorem 8.129 that convex hull of a compact set is compact.

    2. Also, recall that compact sets are closed and bounded.

  17. Since Οƒβˆ‚f(x)(d)=ΟƒA(d) is true for any d∈V, the support functions for the underlying nonempty, closed and convex set are equal. Hence by Theorem 8.90,

    βˆ‚f(x)=A=conv(⋃i=1kβˆ‚fi(x)).

Some applications of this rule are presented later in Example 8.75, Example 8.76, Example 8.78, Example 8.79.

We now present a weaker version of the max rule which is applicable for pointwise supremum over an arbitrary set of functions.

Theorem 8.231 (Weak max rule of subdifferential calculus)

Let I be an arbitrary index set and suppose that for every i∈I, there exists a proper convex function fi:Vβ†’(βˆ’βˆž,∞]. Let f:Vβ†’(βˆ’βˆž,∞] be given by:

f(x)=supi∈I{fi(x)}.

Then for any x∈domf,

conv(⋃i∈I(x)βˆ‚fi(x))βŠ†βˆ‚f(x)

where I(x)={i∈I|fi(x)=f(x)}.

In words, if fi(x)=f(x), then a subgradient of fi at x is also a subgradient of f at x. Also, for all i∈I such that fi(x)=f(x), any convex combination of their subgradients at x is also a subgradient of f at x.

Proof. Pick some x∈domf.

  1. Let z∈domf be arbitrary.

  2. Let I(x)={i∈I|fi(x)=f(x)}.

  3. Let i∈I(x) be arbitrary.

  4. Let gβˆˆβˆ‚fi(x) be a subgradient of fi at x.

  5. Then, by definition of f and subgradient inequality:

    f(z)β‰₯fi(z)β‰₯fi(x)+⟨zβˆ’x,g⟩=f(x)+⟨zβˆ’x,g⟩.

    We used the fact that fi(x)=f(x) for i∈I(x).

  6. Thus, gβˆˆβˆ‚f(x). g is a subgradient of f at x.

  7. Since this is valid for every subgradient of fi at x, hence βˆ‚fi(x)βŠ†βˆ‚f(x).

  8. Since this is valid for every i∈I(x), hence

    ⋃i∈I(x)βˆ‚fi(x)βŠ†βˆ‚f(x).
  9. Recall from Theorem 8.211 that βˆ‚f(x) is convex.

  10. Thus, it contains the convex hull of any of its subsets. Hence,

    conv(⋃i∈I(x)βˆ‚fi(x))βŠ†βˆ‚f(x).

Next is an example application of the weak max rule.

Example 8.66 (Subgradient of Ξ»max(A0+βˆ‘i=1mxiAi))

Let A0,A1,…,Am∈Sn be m+1 given symmetric matrices. Define an affine transformation A:Rmβ†’Sn as

A(x)β‰œA0+βˆ‘i=1mxiAi.

For every vector x∈Rm, this mapping defines a symmetric matrix A(x). We can compute the largest eigenvalue of A(x). We introduce a function f:Rm→R as

f(x)β‰œΞ»max(A(x))=Ξ»max(A0+βˆ‘i=1mxiAi).

Our task is to find a subgradient of f at x.

  1. Recall from the definition of the largest eigenvalue,

    f(x)=supy∈Rn;β€–yβ€–2=1yTA(x)y.
  2. For every y∈Rn such that β€–yβ€–2=1, we can define a function:

    fy(x)β‰œyTA(x)y.
  3. Then,

    f(x)=supy∈Rn;β€–yβ€–2=1fy(x).
  4. The function fy(x) is affine (in x) for every y.

  5. Thus, fy is convex for every y.

  6. Thus, f is a pointwise supremum of a family of functions fy.

  7. Thus, f is also convex (see Theorem 8.114).

  8. Consequently, we can use the weak max rule Theorem 8.231 to identify a subgradient of f at x.

  9. Let y~ be a normalized eigenvector of A(x) corresponding to its largest eigenvalue. Then

    f(x)=y~TA(x)y~.
  10. This means that f(x)=fy~(x).

  11. By the weak max rule, a subgradient of fy~ at x is also a subgradient of f at x.

  12. Expanding fy~(x):

    fy~(x)=y~TA(x)y~=y~TA0y~+βˆ‘i=1my~TAiy~xi.
  13. Then, the gradient of fy~ at x (computed by taking partial derivatives w.r.t. xi) is

    βˆ‡fy~(x)=(y~TA1y~,…,y~TAmy~).
  14. Since fy is affine (thus convex), hence its gradient is also a subgradient.

  15. Thus,

    (y~TA1y~,…,y~TAmy~)βˆˆβˆ‚f(x).
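The recipe of Example 8.66 is easy to test numerically. The sketch below (assuming NumPy; the random symmetric matrices and the test point are illustrative) builds the subgradient (y~TA1y~,…,y~TAmy~) from a top eigenvector and verifies the subgradient inequality at random points.

```python
import numpy as np

# Sketch: subgradient of f(x) = lambda_max(A0 + sum_i x_i * A_i) built
# from a unit top eigenvector v of A(x); entries are v^T A_i v.
rng = np.random.default_rng(1)
def sym(M): return (M + M.T) / 2
As = [sym(rng.standard_normal((4, 4))) for _ in range(3)]   # A1..Am
A0 = sym(rng.standard_normal((4, 4)))
Amap = lambda x: A0 + sum(xi * Ai for xi, Ai in zip(x, As))
f = lambda x: np.linalg.eigvalsh(Amap(x))[-1]   # largest eigenvalue

x = rng.standard_normal(3)
w, V = np.linalg.eigh(Amap(x))
v = V[:, -1]                                   # unit top eigenvector
g = np.array([v @ Ai @ v for Ai in As])        # the claimed subgradient

# subgradient inequality f(y) >= f(x) + <y - x, g> at random y
for _ in range(200):
    y = rng.standard_normal(3)
    assert f(y) >= f(x) + g @ (y - x) - 1e-8
```

Since f is convex, the inequality must hold globally, not merely near x.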

8.16.7. Lipschitz ContinuityΒΆ

Theorem 8.232 (Lipschitz continuity and boundedness of the subdifferential sets)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper convex function. Suppose that XβŠ†intdomf. Consider the following two claims:

  1. |f(x)βˆ’f(y)|≀Lβ€–xβˆ’yβ€– for any x,y∈X.

  2. β€–gβ€–βˆ—β‰€L for any gβˆˆβˆ‚f(x) where x∈X.

Then,

  • (2) implies (1). In other words, if subgradients are bounded then, the function is Lipschitz continuous.

  • If X is open, then (1) holds if and only if (2) holds.

In other words, if the subgradients over a set X are bounded then f is Lipschitz continuous over X. If X is open then f is Lipschitz continuous over X if and only if the subgradients over X are bounded.

Proof. (a) We first show that (2)⟹(1).

  1. Assume that (2) is satisfied.

  2. Pick any x,y∈X.

  3. Since f is proper and convex and x,y∈intdomf, hence due to Theorem 8.214, βˆ‚f(x) and βˆ‚f(y) are nonempty.

  4. Let gxβˆˆβˆ‚f(x) and gyβˆˆβˆ‚f(y).

  5. By subgradient inequality

    f(y)β‰₯f(x)+⟨yβˆ’x,gx⟩;f(x)β‰₯f(y)+⟨xβˆ’y,gy⟩.
  6. We can rewrite this as

    f(x)βˆ’f(y)β‰€βŸ¨xβˆ’y,gx⟩;f(y)βˆ’f(x)β‰€βŸ¨yβˆ’x,gy⟩.
  7. By generalized Cauchy Schwartz inequality (Theorem 4.110),

    ⟨xβˆ’y,gxβŸ©β‰€β€–xβˆ’yβ€–β€–gxβ€–βˆ—β‰€Lβ€–xβˆ’yβ€–;⟨yβˆ’x,gyβŸ©β‰€β€–yβˆ’xβ€–β€–gyβ€–βˆ—β‰€Lβ€–xβˆ’yβ€–.
  8. Combining the two inequalities, we get

    |f(x)βˆ’f(y)|≀Lβ€–xβˆ’yβ€–.
  9. Thus, (2)⟹(1).

(b) If X is open, then we need to show that (1)⟺(2).

  1. We have already shown that (2)⟹(1).

  2. Assume that X is open and (1) holds.

  3. Let x∈X.

  4. Since x is an interior point of domf, hence the subdifferential is nonempty.

  5. Pick any gβˆˆβˆ‚f(x).

  6. Let gβ€ βˆˆV be a vector with β€–g†‖=1 and ⟨g†,g⟩=β€–gβ€–βˆ—. Such a vector exists by definition of the dual norm.

  7. Since X is open, we can choose Ο΅>0 small enough such that x+Ο΅gβ€ βˆˆX.

  8. By the subgradient inequality, we have:

    f(x+Ο΅g†)β‰₯f(x)+⟨ϡg†,g⟩.
  9. Thus,

    Ο΅β€–gβ€–βˆ—=⟨ϡg†,gβŸ©β‰€f(x+Ο΅g†)βˆ’f(x)≀Lβ€–(x+Ο΅gβ€ βˆ’xβ€– by hypothesis in (1)=LΟ΅β€–g†‖=LΟ΅.
  10. Canceling Ο΅, we get:

    β€–gβ€–βˆ—β‰€L

    holds true for every gβˆˆβˆ‚f(x) where x∈X as desired.
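For a concrete instance of (2)⟹(1) (a sketch assuming NumPy; the choice f=β„“1 norm is illustrative): every subgradient of β€–β‹…β€–1 has β„“βˆž norm at most 1, so the theorem predicts Lipschitz constant L=1 with respect to β€–β‹…β€–1.

```python
import numpy as np

# For f(x) = ||x||_1, every subgradient g satisfies ||g||_inf <= 1,
# so the theorem predicts |f(x) - f(y)| <= 1 * ||x - y||_1.
rng = np.random.default_rng(3)
f = lambda x: np.abs(x).sum()
for _ in range(1000):
    x, y = rng.standard_normal((2, 6))
    assert abs(f(x) - f(y)) <= np.abs(x - y).sum() + 1e-9
```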

Corollary 8.25 (Lipschitz continuity of convex functions over compact domains)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper and convex function. Suppose that XβŠ†intdomf is compact. Then, there exists L>0 such that

|f(x)βˆ’f(y)|≀Lβ€–xβˆ’yβ€–βˆ€x,y∈X.

Proof. Recall from Theorem 8.216 that the subgradients of a proper convex function over a compact set are nonempty and bounded.

  1. In other words, the set

    Y=⋃x∈Xβˆ‚f(x)

    is nonempty and bounded.

  2. Thus, there exists L>0 such that β€–gβ€–βˆ—β‰€L for every g∈Y.

  3. Then by Theorem 8.232,

    |f(x)βˆ’f(y)|≀Lβ€–xβˆ’yβ€–βˆ€x,y∈X.
  4. Thus, f is indeed Lipschitz continuous over X.

8.16.8. Ο΅-SubgradientsΒΆ

Definition 8.76 (Ο΅-Subgradient)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper function. Let x∈domf. A vector g∈Vβˆ— is called an Ο΅-subgradient of f at x if

(8.9)ΒΆf(y)β‰₯f(x)+⟨yβˆ’x,gβŸ©βˆ’Ο΅βˆ€y∈V.

8.16.8.1. Geometric InterpretationΒΆ

Observation 8.12 (Ο΅-subgradient and supporting hyperplane)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper function. Then g is an Ο΅-subgradient of f at x if and only if epif is contained in the positive halfspace of the hyperplane with a normal (βˆ’g,1) passing through (x,f(x)βˆ’Ο΅).

Proof. Let H denote the hyperplane

H={(y,t)|⟨y,βˆ’g⟩+t=⟨x,βˆ’g⟩+f(x)βˆ’Ο΅}.

The positive halfspace of H is given by

H+={(y,t)|⟨y,βˆ’g⟩+tβ‰₯⟨x,βˆ’g⟩+f(x)βˆ’Ο΅}.

Assume that g is an Ο΅-subgradient of f at x.

  1. For any (y,t)∈epif, we have

    tβ‰₯f(y)β‰₯f(x)+⟨yβˆ’x,gβŸ©βˆ’Ο΅.
  2. This is equivalent to

    ⟨y,βˆ’g⟩+tβ‰₯⟨x,βˆ’g⟩+f(x)βˆ’Ο΅

    for all (y,t)∈epif.

  3. Hence epifβŠ†H+.

Now assume that epifβŠ†H+.

  1. Let (y,f(y))∈epif.

  2. Then we have

    ⟨y,βˆ’g⟩+f(y)β‰₯⟨x,βˆ’g⟩+f(x)βˆ’Ο΅.
  3. Rearranging the terms, we have

    f(y)β‰₯f(x)+⟨yβˆ’x,gβŸ©βˆ’Ο΅βˆ€y∈V.
  4. But this means that g is an Ο΅-subgradient of f at x.

8.16.8.2. Ο΅-SubdifferentialΒΆ

Definition 8.77 (Ο΅-subdifferential)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper function. The set of all Ο΅-subgradients of f at a point x∈domf is called the Ο΅-subdifferential of f at x and is denoted by βˆ‚Ο΅f(x).

βˆ‚Ο΅f(x)β‰œ{g∈Vβˆ—|f(y)β‰₯f(x)+⟨yβˆ’x,gβŸ©βˆ’Ο΅βˆ€y∈V}.

For all xβˆ‰domf, we define βˆ‚Ο΅f(x)=βˆ….

It is easy to see that

βˆ‚f(x)βŠ†βˆ‚Ο΅f(x).

Also, if Ο΅2β‰₯Ο΅1>0, then

βˆ‚Ο΅1f(x)βŠ†βˆ‚Ο΅2f(x).
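A small worked instance of Definition 8.76 (a sketch assuming NumPy; the function and point are illustrative choices, not from the text): for f(y)=y2 and x=1, the vector g=2+2√ϡ is an Ο΅-subgradient, because f(y)βˆ’f(1)βˆ’g(yβˆ’1)+Ο΅=(yβˆ’1βˆ’βˆšΟ΅)2β‰₯0, yet it is not a plain subgradient.

```python
import numpy as np

# For f(y) = y^2 at x = 1, g = 2 + 2*sqrt(eps) is an eps-subgradient
# but not an ordinary subgradient (illustrative instance of Def. 8.76).
eps = 0.25
g = 2 + 2 * np.sqrt(eps)        # = 3.0
f = lambda y: y * y
ys = np.linspace(-10, 10, 2001)

# eps-subgradient inequality holds everywhere ...
assert np.all(f(ys) >= f(1.0) + g * (ys - 1.0) - eps - 1e-9)
# ... but the plain subgradient inequality fails somewhere
assert np.any(f(ys) < f(1.0) + g * (ys - 1.0))
```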

8.16.9. Optimality ConditionsΒΆ

A well known result for differentiable functions is that at the point of optimality βˆ‡f(x)=0 (see Theorem 7.1). Subdifferentials are useful in characterizing the minima of a function. The idea of vanishing gradients can be generalized for subgradients also.

Theorem 8.233 (Fermat’s optimality condition)

Let f:Vβ†’(βˆ’βˆž,∞] be a proper convex function. Then

a∈argmin{f(x)|x∈V}

if and only if 0βˆˆβˆ‚f(a).

In other words, a is a minimizer of f if and only if 0 is a subgradient of f at a.

Proof. Assume that 0βˆˆβˆ‚f(a) where a∈domf.

  1. By subgradient inequality

    f(x)β‰₯f(a)+⟨xβˆ’a,0βŸ©βˆ€x∈V.
  2. This simplifies to

    f(x)β‰₯f(a)βˆ€x∈V.
  3. Thus, a∈argmin{f(x)|x∈V}.

For the converse, assume that a∈argmin{f(x)|x∈V}.

  1. Then,

    f(x)β‰₯f(a)βˆ€x∈V.
  2. But then

    f(x)β‰₯f(a)⟺f(x)β‰₯f(a)+0⟺f(x)β‰₯f(a)+⟨xβˆ’a,0⟩

    holds true for every x∈V.

  3. This implies that 0βˆˆβˆ‚f(a).
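Fermat's condition can be illustrated on a nonsmooth 1D function (a sketch assuming NumPy; the function is an illustrative choice): f(x)=|xβˆ’1|+|x+1| is minimized on all of [βˆ’1,1], and at any such point 0 is a subgradient, so f(x)β‰₯f(a) everywhere.

```python
import numpy as np

# Fermat's optimality condition for f(x) = |x - 1| + |x + 1| on R:
# every a in [-1, 1] is a minimizer, and 0 lies in df(a) there since
# df(a) = {sgn(a-1) + sgn(a+1)} type sums contain 0 on that interval.
f = lambda x: abs(x - 1) + abs(x + 1)
a = 0.3                                # a point in argmin; f(a) = 2

# subgradient inequality with g = 0 reduces to f(x) >= f(a) for all x
for x in np.linspace(-5, 5, 1001):
    assert f(x) >= f(a) - 1e-9
```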

8.16.10. Mean Value TheoremΒΆ

The following result is from [25].

Theorem 8.234 (A subgradients based mean value theorem for 1D functions)

Let f:Rβ†’(βˆ’βˆž,∞] be a proper closed convex function. Let [a,b]βŠ†domf with a<b. Then,

f(b)βˆ’f(a)=∫abh(t)dt

where h:(a,b)β†’R satisfies h(t)βˆˆβˆ‚f(t) for every t∈(a,b).

In the remainder of this section, we compute the subgradients and subdifferential sets for a variety of standard functions.

8.16.11. Norm FunctionsΒΆ

We recall from Theorem 8.210 that the subdifferential of a norm ‖⋅‖:V→R at x=0 is given by:

βˆ‚f(0)=Bβ€–β‹…β€–βˆ—[0,1]={g∈Vβˆ—|β€–gβ€–βˆ—β‰€1}.

8.16.11.1. β„“1-NormΒΆ

Example 8.67 (Subdifferential of β„“1 norm at origin)

Let f:Rnβ†’R be given by f(x)=β€–xβ€–1. We recall that the dual norm of β„“1 is β„“βˆž. The unit ball of β„“βˆž-norm at origin is given by

Bβ€–β‹…β€–βˆž[0,1]=[βˆ’1,1]n.

Following Theorem 8.210, the subdifferential of f at x=0 is given by:

βˆ‚f(0)=Bβ€–β‹…β€–βˆž[0,1]=[βˆ’1,1]n.

Example 8.68 (Subdifferential of absolute value function at origin)

Let g:R→R be the absolute value function given by

g(x)=|x|.

This is a special case of β„“1 norm for R1. Thus, following Example 8.67, the subdifferential of g at x=0 is given by:

βˆ‚g(0)=[βˆ’1,1].

For a complete specification of the subdifferential of g, see Example 8.73 below.

Example 8.69 (Subdifferential of β„“1 norm)

Let f:Rn→R be given by f(x)=‖x‖1. We can write f as a sum of n functions

f(x)=β€–xβ€–1=βˆ‘i=1n|xi|=βˆ‘i=1nfi(x)

where

fi(x)=|xi|.

Let g(x)=|x|. Then

fi(x)=|xi|=|eiTx|=g(eiTx).

Due to affine transformation rule (Theorem 8.227),

βˆ‚fi(x)=(βˆ‚g(eiTx))ei=(βˆ‚g(xi))ei.

The subdifferential of the absolute value function g is described in Example 8.73 below.

Thus, the subdifferential set of fi is given by:

βˆ‚fi(x)={{sgn(xi)ei}forxiβ‰ 0[βˆ’ei,ei]forxi=0.

Using the sum rule Corollary 8.24, we have:

βˆ‘i=1nβˆ‚fi(x)=βˆ‚(βˆ‘i=1nfi)(x).

We define the index set:

I0(x)={i|xi=0}.

Expanding the sum of subdifferentials,

βˆ‚f(x)=βˆ‘i∈I0(x)[βˆ’ei,ei]+βˆ‘iβˆ‰I0(x)sgn(xi)ei.

We can rewrite this as:

βˆ‚f(x)={z∈Rn|zi=sgn(xi) whenever xiβ‰ 0,|zi|≀1, otherwise }.

We also have a weak result from this:

sgn(x)βˆˆβˆ‚f(x)=βˆ‚β€–xβ€–1.

Example 8.70 (Subdifferential of β„“1 norm squared)

Let f:Rn→R be given by f(x)=‖x‖1.

Now let g(t)=([t]+)2 where [t]+=max{t,0}. The function g is nondecreasing, convex and differentiable. Consider the function h=g∘f given by

h(x)=β€–xβ€–12.

By subdifferential chain rule (Theorem 8.229):

βˆ‚h(x)=2β€–xβ€–1βˆ‚f(x)=2β€–xβ€–1{z∈Rn|zi=sgn(xi) whenever xiβ‰ 0,|zi|≀1, otherwise }.

We have used the subdifferential of f from Example 8.69.
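At a point with no zero entries, h(x)=β€–xβ€–12 is differentiable and the chain rule gives the gradient 2β€–xβ€–1sgn(x). A quick numerical sanity check (a sketch assuming NumPy; the test point is an arbitrary illustration) compares this against central finite differences.

```python
import numpy as np

# Chain rule check for h(x) = ||x||_1^2 at a point with nonzero entries:
# the gradient should be 2 * ||x||_1 * sign(x).
h = lambda x: np.abs(x).sum() ** 2
x = np.array([1.0, -2.0, 0.5])
grad = 2 * np.abs(x).sum() * np.sign(x)

# central finite-difference approximation of the gradient
eps = 1e-6
fd = np.array([(h(x + eps * e) - h(x - eps * e)) / (2 * eps)
               for e in np.eye(3)])
assert np.allclose(grad, fd, atol=1e-4)
```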

8.16.11.2. β„“2-NormΒΆ

Example 8.71 (Subdifferential of β„“2 norm at origin)

Let f:Rn→R be given by f(x)=‖x‖2. We recall that the dual norm of ℓ2 is again ℓ2 as this norm is self dual.

Following Theorem 8.210, the subdifferential of f at x=0 is given by:

βˆ‚f(0)=Bβ€–β‹…β€–2[0,1]={g∈Rn|β€–gβ€–2≀1}.

Example 8.72 (Subdifferential of β„“2 norm)

Let f:Rn→R be given by f(x)=‖x‖2. At x≠0, f is differentiable with the gradient (see Example 5.10):

βˆ‡f(x)=xβ€–xβ€–2.

Since f is convex and differentiable at x≠0, hence due to Theorem 8.220,

βˆ‚f(x)={βˆ‡f(x)}={xβ€–xβ€–2}.

Combining this with the subdifferential of f at origin from Example 8.71, we obtain:

βˆ‚f(x)={{xβ€–xβ€–2}forxβ‰ 0Bβ€–β‹…β€–2[0,1]forx=0.

Example 8.73 (Subdifferential of absolute value function)

Let g:R→R be the absolute value function given by

g(x)=|x|.

This is a special case of β„“2 norm for R1. Following Example 8.72,

βˆ‚g(x)={{sgn(x)}forxβ‰ 0[βˆ’1,1]forx=0.

c.f. Example 8.68.

8.16.11.3. β„“βˆž-NormΒΆ

Example 8.74 (Subdifferential of β„“βˆž norm at origin)

Let f:Rnβ†’R be given by f(x)=β€–xβ€–βˆž. We recall that the dual norm of β„“βˆž is β„“1. The unit ball of β„“1-norm at origin is given by

Bβ€–β‹…β€–1[0,1]={x∈Rn|β€–xβ€–1≀1}.

Following Theorem 8.210, the subdifferential of f at x=0 is given by:

βˆ‚f(0)=Bβ€–β‹…β€–1[0,1]={x∈Rn|β€–xβ€–1≀1}.

Example 8.75 (Subdifferential of β„“βˆž norm)

Let f:Rnβ†’R be given by f(x)=β€–xβ€–βˆž. Let us compute the subdifferential of f at xβ‰ 0.

We have:

f(x)=max{f1(x),f2(x),…,fn(x)}

where fi(x)=|xi|. We define:

I(x)={i∈[1,…,n]||xi|=f(x)=β€–xβ€–βˆž}.

Then, following Example 8.69

βˆ‚fi(x)={sgn(xi)ei}βˆ€i∈I(x).

This is valid since xβ‰ 0 implies that f(x)β‰ 0 which in turn implies that xiβ‰ 0 for every i∈I(x).

Then, using the max rule for proper convex functions (Theorem 8.230):

βˆ‚f(x)=conv(⋃i∈I(x){sgn(xi)ei}).

We can rewrite this as:

βˆ‚f(x)={βˆ‘i∈I(x)Ξ»isgn(xi)ei|βˆ‘i∈I(x)Ξ»i=1,Ξ»jβ‰₯0,j∈I(x)}.

Combining this with the subdifferential of f at origin from Example 8.74, we obtain:

βˆ‚f(x)={{βˆ‘i∈I(x)Ξ»isgn(xi)ei|βˆ‘i∈I(x)Ξ»i=1,Ξ»jβ‰₯0,j∈I(x)},xβ‰ 0Bβ€–β‹…β€–1[0,1],x=0.
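One consequence of the max rule above is easy to verify numerically (a sketch assuming NumPy; the test vector is an arbitrary illustration): for any active index i with |xi|=β€–xβ€–βˆž, the vector sgn(xi)ei is a subgradient of the β„“βˆž norm at x.

```python
import numpy as np

# For f(x) = ||x||_inf, the max rule gives sgn(x_i) * e_i as a
# subgradient at x for any active index i (|x_i| = ||x||_inf).
rng = np.random.default_rng(5)
f = lambda x: np.abs(x).max()
x = np.array([3.0, -1.0, 2.0])
i = np.abs(x).argmax()                 # active index
g = np.zeros(3); g[i] = np.sign(x[i])  # sgn(x_i) * e_i

# subgradient inequality at random points
for _ in range(1000):
    y = rng.standard_normal(3)
    assert f(y) >= f(x) + g @ (y - x) - 1e-9
```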

8.16.11.4. β„“1 Norm over Affine TransformationΒΆ

Let A∈RmΓ—n. Let b∈Rm. Let f:Rmβ†’R be given by f(y)=β€–yβ€–1.

Let h:Rn→R be the function h(x)=‖Ax+b‖1=f(Ax+b).

By affine transformation rule, we have:

βˆ‚h(x)=ATβˆ‚f(Ax+b)βˆ€x∈Rn.

Denoting the i-th row of A by aiT, we define the index set:

I0(x)={i|aiTx+bi=0}.

Then we have:

βˆ‚h(x)=βˆ‘i∈I0(x)[βˆ’ai,ai]+βˆ‘iβˆ‰I0(x)sgn(aiTx+bi)ai.

In particular, we have the weak result:

ATsgn(Ax+b)βˆˆβˆ‚h(x).
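The weak result ATsgn(Ax+b)βˆˆβˆ‚h(x) can be checked directly (a sketch assuming NumPy; A, b and the point are arbitrary illustrations) via the subgradient inequality for h.

```python
import numpy as np

# Check of the weak result A^T sgn(Ax + b) in dh(x) for
# h(x) = ||Ax + b||_1, via the subgradient inequality.
rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
h = lambda x: np.abs(A @ x + b).sum()

x = rng.standard_normal(3)
g = A.T @ np.sign(A @ x + b)           # claimed subgradient of h at x
for _ in range(1000):
    y = rng.standard_normal(3)
    assert h(y) >= h(x) + g @ (y - x) - 1e-9
```

Note that the check is valid even when some component of Ax+b is zero, since sgn(0)=0 lies in the interval [βˆ’1,1] appearing in the subdifferential.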

8.16.11.5. β„“2 Norm over Affine TransformationΒΆ

Let A∈RmΓ—n. Let b∈Rm. Let f:Rmβ†’R be given by f(y)=β€–yβ€–2.

Let h:Rn→R be the function h(x)=‖Ax+b‖2=f(Ax+b).

We have:

βˆ‚f(Ax+b)={{Ax+bβ€–Ax+bβ€–2}forAx+bβ‰ 0Bβ€–β‹…β€–2[0,1]forAx+b=0.

Applying the affine transformation rule, we get:

βˆ‚h(x)=ATβˆ‚f(Ax+b)={{AT(Ax+b)β€–Ax+bβ€–2}forAx+bβ‰ 0ATBβ€–β‹…β€–2[0,1]forAx+b=0.

For x such that Ax+b=0, we can write this as

βˆ‚h(x)=ATBβ€–β‹…β€–2[0,1]={ATy|β€–yβ€–2≀1}.

8.16.11.6. β„“βˆž Norm over Affine TransformationΒΆ

Example 8.76 (Subdifferential of β€–Ax+bβ€–βˆž)

Let A∈RmΓ—n. Let b∈Rm. Let f:Rmβ†’R be given by

f(y)=β€–yβ€–βˆž.

Let h:Rn→R be the function

h(x)=β€–Ax+bβ€–βˆž=f(Ax+b).

With y=Ax+b, we have yi=aiTx+bi where aiT is the i-th row vector of A.

Following Example 8.75

βˆ‚f(y)={{βˆ‘i∈I(y)Ξ»isgn(yi)ei|βˆ‘i∈I(y)Ξ»i=1,Ξ»jβ‰₯0,j∈I(y)},yβ‰ 0Bβ€–β‹…β€–1[0,1],y=0

where I(y)={i∈[1,…,m]||yi|=f(y)=β€–yβ€–βˆž}.

Due to affine transformation rule (Theorem 8.227),

βˆ‚h(x)=ATβˆ‚f(Ax+b).

We have the following cases.

(a) y=0.

  1. In terms of x, the condition y=0 is equivalent to Ax+b=0.

  2. Then,

    βˆ‚f(Ax+b)=βˆ‚f(0)=Bβ€–β‹…β€–1[0,1].
  3. Thus,

    βˆ‚h(x)=ATBβ€–β‹…β€–1[0,1].

(b) y≠0.

  1. In terms of x, the condition y≠0 is equivalent to Ax+b≠0.

  2. Then,

    βˆ‚f(Ax+b)={βˆ‘i∈I(y)Ξ»isgn(yi)ei|βˆ‘i∈I(y)Ξ»i=1,Ξ»jβ‰₯0,j∈I(y)}={βˆ‘i∈IxΞ»isgn(aiTx+bi)ei|βˆ‘i∈IxΞ»i=1,Ξ»jβ‰₯0,j∈Ix}

    where

    Ix=I(y)=I(Ax+b).
  3. Note that ATei=ai.

  4. Then,

    βˆ‚h(x)=ATβˆ‚f(Ax+b)={βˆ‘i∈IxΞ»isgn(aiTx+bi)ai|βˆ‘i∈IxΞ»i=1,Ξ»jβ‰₯0,j∈Ix}.

Combining the two cases, we get:

βˆ‚h(x)={{βˆ‘i∈IxΞ»isgn(aiTx+bi)ai|βˆ‘i∈IxΞ»i=1,Ξ»jβ‰₯0,j∈Ix},Ax+bβ‰ 0ATBβ€–β‹…β€–1[0,1],Ax+b=0

8.16.12. Indicator FunctionsΒΆ

Theorem 8.235 (Subdifferential of indicator function)

The subdifferential of indicator function for a nonempty set SβŠ‚V at any point x∈S is given by

βˆ‚IS(x)=NS(x).

where NS(x) is the normal cone of S at x.

Proof. Let x∈S and gβˆˆβˆ‚IS(x). The subgradient inequality (8.7) gives us:

IS(z)β‰₯IS(x)+⟨zβˆ’x,gβŸ©βˆ€z∈S⟺0β‰₯0+⟨zβˆ’x,gβŸ©βˆ€z∈S⟺⟨zβˆ’x,gβŸ©β‰€0βˆ€z∈S⟺g∈NS(x).

Example 8.77 (Subdifferential of the indicator function of the unit ball)

The unit ball at origin is given by:

S=B[0,1]={x∈V|β€–x‖≀1}.

From Theorem 8.64, the normal cone of S at x∈S is given by:

NS(x)={y∈Vβˆ—|β€–yβ€–βˆ—β‰€βŸ¨x,y⟩}.

For any xβˆ‰S, NS(x)=βˆ…. Combining:

βˆ‚IB[0,1](x)={{y∈Vβˆ—|β€–yβ€–βˆ—β‰€βŸ¨x,y⟩}forβ€–x‖≀1βˆ…forβ€–xβ€–>1.
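For the Euclidean unit ball this normal cone is easy to picture: at a boundary point x it consists of the nonnegative multiples of x. The sketch below (assuming NumPy; the boundary point and multiplier are arbitrary illustrations) checks the defining inequality ⟨zβˆ’x,gβŸ©β‰€0 over the ball.

```python
import numpy as np

# Normal cone of the Euclidean unit ball at a boundary point x:
# nonnegative multiples of x.  Check <z - x, g> <= 0 for z in the ball.
rng = np.random.default_rng(4)
x = np.array([0.6, 0.8])            # ||x||_2 = 1, a boundary point
g = 2.0 * x                         # in N_S(x); ||g||_2 = <x, g> = 2

for _ in range(1000):
    z = rng.standard_normal(2)
    z /= max(1.0, np.linalg.norm(z))    # scale z into the unit ball
    assert (z - x) @ g <= 1e-9
```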

8.16.13. Maximum Eigenvalue FunctionΒΆ

The maximum eigenvalue function for symmetric matrices, denoted as f:Sn→R, is given by:

f(X)β‰œΞ»max(X).

Theorem 8.236 (Subgradient for maximum eigenvalue function)

Let f:Sn→R be the maximum eigenvalue function. Then

vvTβˆˆβˆ‚f(X)

where v is a normalized eigenvector of X∈Sn associated with its maximum eigenvalue.

Proof. Let X∈Sn be given. Let v be a normalized eigenvector associated with the largest eigenvalue of X. Then, β€–vβ€–2=1.

For any Y∈Sn, we have:

f(Y)=Ξ»max(Y)=maxβ€–uβ€–2=1{uTYu}β‰₯vTYv=vTXv+vT(Yβˆ’X)v=Ξ»max(X)β€–vβ€–22+tr(vT(Yβˆ’X)v)=Ξ»max(X)+tr((Yβˆ’X)vvT)=f(X)+⟨Yβˆ’X,vvT⟩.

In this derivation, we have used the following results:

  • The maximum eigenvalue can be obtained by maximizing uTYu over the unit sphere.

  • For a scalar x∈R, x=tr(x).

  • tr(AB)=tr(BA) if both AB and BA are well defined.

  • ⟨A,B⟩=tr(AB) for the space of symmetric matrices.

Thus, vvTβˆˆβˆ‚f(X).

We note here that this result only identifies one of the subgradients of f at X. It doesn’t characterize the entire subdifferential of f at X. In this sense, this result is a weak result. In contrast, a strong result would characterize the entire subdifferential.

8.16.14. The Max FunctionΒΆ

Example 8.78 (Subdifferential of the max function)

Let f:Rn→R be given by:

f(x)=max{x1,x2,…,xn}.

Let fi(x)=xi=eiTx. Then

f(x)=max{f1(x),f2(x),…,fn(x)}.

We note that fi are differentiable and their gradient is given by (see Example 5.4):

βˆ‡fi(x)=ei.

Also, fi are linear, hence convex. Thus, due to Theorem 8.220:

βˆ‚fi(x)={ei}.

We denote the index set of functions which equal the value of f(x) at x by:

I(x)={i|f(x)=xi}.

Then, using the max rule for proper convex functions (Theorem 8.230):

βˆ‚f(x)=conv(⋃i∈I(x)βˆ‚fi(x))=conv(⋃i∈I(x){ei}).

As an example, consider the case where x=α1 for some α∈R.

  1. In other words, x=(Ξ±,…,Ξ±).

  2. Then, f(x)=Ξ±.

  3. fi(x)=Ξ±=f(x) for every i∈[1,…,n].

  4. I(x)={1,…,n}.

  5. βˆ‡fi(x)=ei.

  6. conv(⋃i∈I(x){ei})=conv{e1,…,en}.

  7. But conv{e1,…,en}=Ξ”n.

  8. Thus,

    βˆ‚f(Ξ±1)=Ξ”nβˆ€Ξ±βˆˆR.
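The conclusion βˆ‚f(Ξ±1)=Ξ”n can be probed numerically (a sketch assuming NumPy; the dimension and Ξ± are arbitrary illustrations): any point of the unit simplex should satisfy the subgradient inequality at x=Ξ±1.

```python
import numpy as np

# At x = alpha * 1 every coordinate is active, so any lam in the unit
# simplex Delta_n should be a subgradient of f(x) = max_i x_i there.
rng = np.random.default_rng(2)
f = lambda x: x.max()
x = np.full(4, 1.5)                      # alpha * 1 with alpha = 1.5
lam = rng.random(4); lam /= lam.sum()    # a random point of Delta_4

# subgradient inequality: max_i y_i >= f(x) + <lam, y - x>
for _ in range(500):
    y = rng.standard_normal(4)
    assert f(y) >= f(x) + lam @ (y - x) - 1e-9
```

The inequality reduces to maxiyiβ‰₯⟨λ,y⟩, which holds because a convex combination never exceeds the maximum.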

8.16.15. Space of MatricesΒΆ

Let V=RmΓ—n. Let the standard inner product for x,y∈V be ⟨x,y⟩=tr(xTy).

Let f:Vβ†’(βˆ’βˆž,∞] be a proper function. Let x∈intdomf.

The gradient at x, if it exists, is given by:

βˆ‡f(x)=Df(x)β‰œ(βˆ‚fβˆ‚xij(x))i,j.

Let H be a positive definite matrix and define an inner product for V as:

⟨x,y⟩Hβ‰œtr(xTHy).

Then

βˆ‡f(x)=Hβˆ’1Df(x).

8.16.16. Convex Piecewise Linear FunctionsΒΆ

Example 8.79 (Subdifferential of convex piecewise linear functions)

Let a convex piecewise linear function f:Rn→R be given by:

f(x)=max1≀i≀m{aiTx+bi}

where ai∈Rn,bi∈R for i=1,…,m.

We define a set of functions fi:Rnβ†’R for i=1,…,m as

fi(x)=aiTx+bi

We can see that f is a pointwise maximum of these functions.

f(x)=max1≀i≀m{fi(x)}.

Clearly,

βˆ‚fi(x)={βˆ‡fi(x)}={ai}.

We define:

I(x)={i∈[1,…,m]|f(x)=fi(x)=aiTx+bi}.

Then, using the max rule for proper convex functions (Theorem 8.230):

βˆ‚f(x)=conv(⋃i∈I(x)βˆ‚fi(x))={βˆ‘i∈I(x)Ξ»iai|βˆ‘i∈I(x)Ξ»i=1,Ξ»jβ‰₯0βˆ€j∈I(x)}.

By Fermat’s optimality condition (Theorem 8.233), xβˆ— is a minimizer of f if and only if 0βˆˆβˆ‚f(xβˆ—).

Thus, xβˆ— is a minimizer if and only if there exists Ξ»βˆˆΞ”m such that

0=βˆ‘i=1mΞ»iai,Ξ»j=0βˆ€jβˆ‰I(xβˆ—).

Note that at any x, for every jβˆ‰I(x), we have

ajTx+bjβˆ’f(x)<0.

Thus, the complementarity condition

Ξ»j(ajTx+bjβˆ’f(x))=0,j=1,…,m

denotes the fact that whenever ajTx+bjβˆ’f(x)<0, Ξ»j must be zero, and whenever ajTx+bjβˆ’f(x)=0, any Ξ»jβ‰₯0 is allowed (since Ξ»βˆˆΞ”m).

If we put together a matrix A∈RmΓ—n whose rows are a1T,…,amT, then the optimality condition can be succinctly stated as

βˆƒΞ»βˆˆΞ”m s.t. ATΞ»=0 and Ξ»j(ajTxβˆ—+bjβˆ’f(xβˆ—))=0,j=1,…,m.

8.16.17. Minimization ProblemsΒΆ

Consider the minimization problem

min{f(x)|g(x)≀0,x∈X}.

Its dual function is given by

q(Ξ»)=minx∈X{L(x,Ξ»)β‰œf(x)+Ξ»Tg(x)}.

Assume that for Ξ»=Ξ»0 the minimum on the R.H.S. is attained at x=x0. Then, a subgradient of the negative of the dual function βˆ’q at Ξ»0 is given by

βˆ’g(x0)βˆˆβˆ‚(βˆ’q)(Ξ»0).
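A tiny worked instance (a sketch assuming NumPy; the problem data f(x)=x2, g(x)=1βˆ’x, X=R are illustrative choices, not from the text): the Lagrangian minimizer is x0=Ξ»/2, the dual is q(Ξ»)=Ξ»βˆ’Ξ»2/4, and βˆ’g(x0) matches the derivative of βˆ’q.

```python
import numpy as np

# Dual subgradient sketch with f(x) = x^2, g(x) = 1 - x, X = R.
# q(lam) = min_x x^2 + lam*(1 - x) is attained at x0 = lam/2, and the
# claim -g(x0) in d(-q)(lam0) can be checked against the derivative
# of -q, since here -q happens to be differentiable.
lam0 = 0.8
x0 = lam0 / 2                       # minimizer of the Lagrangian
sub = -(1 - x0)                     # -g(x0), the claimed subgradient

q = lambda lam: lam - lam ** 2 / 4  # closed form of the dual function
eps = 1e-6
deriv = -(q(lam0 + eps) - q(lam0 - eps)) / (2 * eps)   # d(-q)/dlam
assert abs(sub - deriv) < 1e-8
```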