8.9. Differentiability and Convex Functions

8.9.1. First Order Conditions

Let us look at the special case of real-valued functions over $R^{n}$ that are differentiable.

Theorem 8.98 (First order characterization of convexity)

Let $f : R^{n} \to R$ be a real-valued function whose domain $dom f$ is open and which is differentiable at each point of $dom f$ .

Then $f$ is convex if and only if $dom f$ is convex and

f (y) \geq f (x) + \nabla f (x)^{T} (y - x) \tag{8.4}

holds true for all $x, y \in dom f$ .

Proof. To prove (8.4), we first show that a differentiable function $f : R \to R$ with open domain is convex if and only if $dom f$ is convex and

f (y) \geq f (x) + f^{'} (x) (y - x)

holds true for all $x, y \in dom f$ .

Assume that $f$ is convex. Hence, $dom f$ is convex too.

Let $x, y \in dom f$ .
Since $dom f$ is convex, $(1 - t) x + t y = x + t (y - x) \in dom f$ for all $t \in [0, 1]$ .
By convexity of $f$ , we have:

$f (x + t (y - x)) \leq (1 - t) f (x) + t f (y) .$
Subtracting $f (x)$ from both sides and dividing by $t \in (0, 1]$ , we obtain:

$f (y) \geq f (x) + \frac{f (x + t (y - x)) - f (x)}{t} .$
Since $f$ is differentiable at $x$ , the quotient tends to $f^{'} (x) (y - x)$ as $t \to 0^{+}$ . Taking this limit, we obtain:

$f (y) \geq f (x) + f^{'} (x) (y - x) .$
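The limiting argument can be watched on a concrete case. The sketch below assumes the hypothetical example $f (x) = x^{2}$ with $x = 1$ and $y = 3$ (not from the text) and evaluates the difference quotient $\frac{f (x + t (y - x)) - f (x)}{t}$ for shrinking $t$. For a convex $f$ this quotient is bounded above by $f (y) - f (x)$ and decreases toward $f^{'} (x) (y - x)$ as $t \to 0^{+}$. The helper name `difference_quotients` is illustrative.

```python
def difference_quotients(f, x, y, ts):
    """Return (f(x + t (y - x)) - f(x)) / t for each t in ts."""
    return [(f(x + t * (y - x)) - f(x)) / t for t in ts]


def f(x):
    return x * x  # hypothetical convex example, f'(x) = 2x


def f_prime(x):
    return 2.0 * x


x, y = 1.0, 3.0
ts = [1.0, 0.5, 0.25, 0.125]
quotients = difference_quotients(f, x, y, ts)
print(quotients)              # [8.0, 6.0, 5.0, 4.5]
print(f(y) - f(x))            # 8.0: upper bound from convexity
print(f_prime(x) * (y - x))   # 4.0: limit as t -> 0+
```

Here the quotient equals $4 + 4 t$ exactly, so every value lies between $f^{'} (1) \cdot 2 = 4$ and $f (3) - f (1) = 8$.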

For the converse, assume that $dom f$ is convex and

f (y) \geq f (x) + f^{'} (x) (y - x)

holds true for all $x, y \in dom f$ .

Recall that the only convex subsets of $R$ are intervals. Since $dom f$ is also open, it is an open interval.
Choose any $x, y \in dom f$ such that $x \neq y$ .
Choose $t \in [0, 1]$ .
Let $z = t x + (1 - t) y$ .
By hypothesis, we have:

$f (x) \geq f (z) + f^{'} (z) (x - z)$

and

$f (y) \geq f (z) + f^{'} (z) (y - z) .$
Multiplying the first inequality by $t$ and the second by $(1 - t)$ , adding them, and using $t (x - z) + (1 - t) (y - z) = t x + (1 - t) y - z = 0$ yields:

$t f (x) + (1 - t) f (y) \geq f (z) = f (t x + (1 - t) y) .$
Thus, $f$ is convex.

We now prove the general case $f : R^{n} \to R$ . Recall from Theorem 8.70 that, for any $x, y \in dom f$ , the restriction of $f$ to the line passing through $x$ and $y$ is given by:

g (t) = f (t y + (1 - t) x) = f (x + t (y - x)) .

Note that, by the chain rule (Example 5.11):

g^{'} (t) = \nabla f (t y + (1 - t) x)^{T} (y - x)
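The chain-rule formula for $g^{'}$ can be checked numerically. A minimal Python sketch, assuming the hypothetical test function $f (x) = x_{1}^{2} + e^{x_{2}}$ on $R^{2}$ (not from the text): it compares a central finite difference of $g$ with $\nabla f (x + t (y - x))^{T} (y - x)$ . The helper names `restrict` and `g_prime_chain_rule` are illustrative.

```python
import math


def f(x):
    # Hypothetical example on R^2: f(x) = x1^2 + exp(x2).
    return x[0] ** 2 + math.exp(x[1])


def grad_f(x):
    return [2.0 * x[0], math.exp(x[1])]


def restrict(f, x, y):
    """Return g(t) = f(x + t (y - x)), the restriction of f to the line through x and y."""
    d = [yi - xi for xi, yi in zip(x, y)]
    return lambda t: f([xi + t * di for xi, di in zip(x, d)])


def g_prime_chain_rule(grad, x, y, t):
    """g'(t) = grad f(x + t (y - x))^T (y - x)."""
    d = [yi - xi for xi, yi in zip(x, y)]
    point = [xi + t * di for xi, di in zip(x, d)]
    return sum(gi * di for gi, di in zip(grad(point), d))


x, y, t, h = [0.5, -1.0], [2.0, 0.5], 0.3, 1e-5
g = restrict(f, x, y)
central = (g(t + h) - g(t - h)) / (2.0 * h)  # central finite difference
exact = g_prime_chain_rule(grad_f, x, y, t)
print(central, exact)  # agree to about 1e-9
```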

Assume $f$ is convex.

Let $x, y \in dom f$ such that $x \neq y$ .
Let $g$ be the restriction of $f$ to the line passing through $x, y$ as described above.
Due to Theorem 8.70, $g$ is convex.
By the argument for real functions above:

$g (t^{'}) \geq g (t) + g^{'} (t) (t^{'} - t)$

holds true for all $t, t^{'} \in dom g$ .
In particular, with $t^{'} = 1$ and $t = 0$ , we have:

$g (1) \geq g (0) + g^{'} (0) .$
But $g^{'} (0) = \nabla f (x)^{T} (y - x)$ .
Also, $g (1) = f (y)$ and $g (0) = f (x)$ .
Thus, we get:

$f (y) \geq f (x) + \nabla f (x)^{T} (y - x)$

as desired.

For the converse, assume that this inequality holds for all $x, y \in dom f$ and $dom f$ is convex.

Pick some $x, y \in dom f$ with $x \neq y$ .
Let $g$ be the restriction of $f$ to the line passing through $x, y$ as described above.
Pick $t_{1}, t_{2} \in dom g$ .
Then, $z_{1} = t_{1} y + (1 - t_{1}) x$ and $z_{2} = t_{2} y + (1 - t_{2}) x$ are in $dom f$ .
Consider $g (t_{1}) = f (t_{1} y + (1 - t_{1}) x) = f (z_{1})$ and $g (t_{2}) = f (t_{2} y + (1 - t_{2}) x) = f (z_{2})$ .
Note that $g^{'} (t_{2}) = \nabla f (t_{2} y + (1 - t_{2}) x)^{T} (y - x) = \nabla f (z_{2})^{T} (y - x)$ .
By hypothesis, we have:

$f (z_{1}) \geq f (z_{2}) + \nabla f (z_{2})^{T} (z_{1} - z_{2}) .$
But $z_{1} - z_{2} = (t_{1} - t_{2}) (y - x)$ .
Thus, we get:

$g (t_{1}) \geq g (t_{2}) + g^{'} (t_{2}) (t_{1} - t_{2}) .$
This holds for all $t_{1}, t_{2} \in dom g$ , and $dom g$ is an open interval because $dom f$ is open and convex.
Hence, $g$ is convex by the previous argument for real functions.
Since this is valid for the restriction of $f$ to every line passing through its domain, $f$ is convex by Theorem 8.70.
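Inequality (8.4) says that the first-order Taylor approximation of a convex function is a global under-estimator. A minimal numerical spot-check in Python, assuming the hypothetical test functions $f (x) = x_{1}^{2} + e^{x_{2}}$ (convex) and $h (x) = - x_{1}^{2}$ (not convex), neither of which is from the text. Random sampling can only expose a violation of (8.4); it cannot prove convexity.

```python
import math
import random


def first_order_gap(f, grad, x, y):
    """Return f(y) - [f(x) + grad f(x)^T (y - x)]; (8.4) requires this to be >= 0."""
    g = grad(x)
    linear = f(x) + sum(gi * (yi - xi) for gi, xi, yi in zip(g, x, y))
    return f(y) - linear


def min_gap(f, grad, n, trials=2000, seed=0):
    """Smallest gap over random pairs (x, y) drawn from the box [-3, 3]^n."""
    rng = random.Random(seed)
    worst = math.inf
    for _ in range(trials):
        x = [rng.uniform(-3.0, 3.0) for _ in range(n)]
        y = [rng.uniform(-3.0, 3.0) for _ in range(n)]
        worst = min(worst, first_order_gap(f, grad, x, y))
    return worst


# Hypothetical convex example: f(x) = x1^2 + exp(x2).
def f(x):
    return x[0] ** 2 + math.exp(x[1])


def grad_f(x):
    return [2.0 * x[0], math.exp(x[1])]


# Hypothetical non-convex example: h(x) = -x1^2.
def h(x):
    return -x[0] ** 2


def grad_h(x):
    return [-2.0 * x[0], 0.0]


print(min_gap(f, grad_f, 2))  # nonnegative up to rounding
print(min_gap(h, grad_h, 2))  # negative: (8.4) fails for h
```

For $h$ the gap is exactly $- (y_{1} - x_{1})^{2}$ , so any pair with $y_{1} \neq x_{1}$ witnesses non-convexity.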

8.9.2. Second Order Conditions

For functions which are twice differentiable, convexity can be expressed in terms of the positive semidefiniteness of their Hessian matrices.

We start with a result on convexity of real functions on open intervals.

Theorem 8.99 (Convexity characterization for twice differentiable real functions on open intervals)

Let $f : R \to R$ be twice continuously differentiable on an open interval $(α, β)$ ; i.e., the second derivative $f^{″}$ exists and is continuous at every point of the open interval $(α, β)$ .

Then, $f$ is convex if and only if its second derivative $f^{″}$ is non-negative for every $x \in (α, β)$ :

f^{″} (x) \geq 0 \forall x \in (α, β) .

Proof. Assume that $f^{″}$ is nonnegative on $(α, β)$ .

Then, $f^{'}$ is nondecreasing on $(α, β)$ .
For any $x, y \in (α, β)$ with $x < y$ and $r \in (0, 1)$ , let $z = (1 - r) x + r y$ .
We have $z \in (x, y)$ ; i.e. $x < z < y$ . Consequently,

$\begin{aligned} f (z) - f (x) = \int_{x}^{z} f^{'} (t) d t \leq f^{'} (z) (z - x); \\ f (y) - f (z) = \int_{z}^{y} f^{'} (t) d t \geq f^{'} (z) (y - z) . \end{aligned}$
Since $z - x = r (y - x)$ and $y - z = (1 - r) (y - x)$ , we have

$\begin{array}{r} f (z) \leq f (x) + r f^{'} (z) (y - x); \\ f (z) \leq f (y) - (1 - r) f^{'} (z) (y - x) . \end{array}$

We wish to eliminate $f^{'} (z)$ from these inequalities.
Multiplying the two inequalities by $(1 - r)$ and $r$ respectively, and adding them together, we obtain:

$(1 - r) f (z) + r f (z) \leq (1 - r) f (x) + r f (y) .$
But $(1 - r) f (z) + r f (z) = f (z) = f ((1 - r) x + r y)$ .
Thus, $f ((1 - r) x + r y) \leq (1 - r) f (x) + r f (y)$ .
The case $x > y$ follows by exchanging the roles of $x$ and $y$ (and replacing $r$ with $1 - r$ ), and the case $x = y$ is trivial.
Thus, $f$ is convex over $(α, β)$ .

For the converse, assume that $f^{″}$ is negative at some point of $(α, β)$ .

Since $f^{″}$ is continuous on $(α, β)$ , it is negative on some subinterval $(α^{'}, β^{'})$ .
Choose $x, y$ such that $α^{'} < x < y < β^{'}$ . Choose some $r \in (0, 1)$ .
On $(α^{'}, β^{'})$ , $f^{'}$ is strictly decreasing, so the argument above goes through with every inequality reversed and strict, giving

$f ((1 - r) x + r y) > (1 - r) f (x) + r f (y) .$
Thus, there exist $x, y \in (α, β)$ for which the inequality (8.1) fails.
Consequently, $f$ is not convex.
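Both directions of Theorem 8.99 can be seen on one function. As a hypothetical example (not from the text), $f (x) = x^{4} - 3 x^{2}$ has $f^{″} (x) = 12 x^{2} - 6$ , which is negative on $(- 1 / \sqrt{2}, 1 / \sqrt{2})$ and positive outside $[- 1 / \sqrt{2}, 1 / \sqrt{2}]$ . The Python sketch below evaluates the convexity gap $f ((1 - r) x + r y) - [(1 - r) f (x) + r f (y)]$ , which is positive exactly when the convexity inequality fails; `convexity_gap` is an illustrative name.

```python
def f(x):
    return x ** 4 - 3.0 * x ** 2  # hypothetical example (not from the text)


def f_second(x):
    return 12.0 * x ** 2 - 6.0  # its second derivative


def convexity_gap(f, x, y, r):
    """f((1 - r) x + r y) - [(1 - r) f(x) + r f(y)]; a positive value violates convexity."""
    return f((1.0 - r) * x + r * y) - ((1.0 - r) * f(x) + r * f(y))


# Near 0 the second derivative is negative and the inequality fails ...
print(f_second(0.0))                  # -6.0
gap_inside = convexity_gap(f, -0.5, 0.5, 0.5)
print(gap_inside)                     # 0.6875

# ... while on (1, 3], where f'' > 0, this sample pair satisfies it.
gap_outside = convexity_gap(f, 1.5, 3.0, 0.5)
print(gap_outside)                    # negative
```

A single sample pair only illustrates the theorem; on $(1, 3]$ convexity itself follows from $f^{″} > 0$ there.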

We continue further with real valued functions over $R^{n}$ which are twice differentiable.

Theorem 8.100 (Second order characterization of convexity in Euclidean spaces)

Let $f : R^{n} \to R$ be twice continuously differentiable on its domain $dom f$ , which is open; i.e., its Hessian (second derivative) $\nabla^{2} f$ exists and is continuous at every point of $dom f$ .

Then, $f$ is convex if and only if $dom f$ is convex and its Hessian is positive semidefinite for every $x \in dom f$ :

\nabla^{2} f (x) ⪰ O \forall x \in dom f .

Proof. The convexity of $f$ on its domain $C = dom f$ is equivalent to the convexity of the restriction of $f$ to each line segment in $C$ due to Theorem 8.70.

We first note that if $f$ is convex then $C$ is convex and if $C$ is not convex, then $f$ is not convex. So, for the rest of the argument, we shall assume that $C$ is convex.

Consequently, for any $y \in C$ and any nonzero $z \in R^{n}$ , the intersection of the line $\{x = y + t z \mid t \in R\}$ with $C$ is an open line segment, since $C$ is open and convex.

Let $y \in C$ .
Let $z \in R^{n}$ be an arbitrary (nonzero) direction.
Let $L = \{x = y + t z \mid t \in R\}$ be the line passing through $y$ in the direction $z$ .
Consider the set $S = \{t \mid y + t z \in C\}$ . Since $L \cap C$ is an open line segment in $R^{n}$ , $S$ is an open interval in $R$ .
Consider the parameterized restriction of $f$ on the open interval $S$ as:

$g (t) = f (y + t z), \forall t \in S .$
By the chain rule, $g^{'} (t) = \nabla f (y + t z)^{T} z$ , and differentiating once more gives

$g^{″} (t) = ⟨ z, \nabla^{2} f (x) z ⟩$

where $x = y + t z$ .
By Theorem 8.99, $g$ is convex for each $y \in C$ and nonzero $z \in R^{n}$ if and only if $⟨ z, \nabla^{2} f (x) z ⟩ \geq 0$ for every $z \in R^{n}$ and $x \in C$ .
Thus, $f$ is convex if and only if $\nabla^{2} f (x) ⪰ O \forall x \in C$ .
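The proof reduces the Hessian test to the second derivative of $g (t) = f (y + t z)$ along every direction $z$ . A Python sketch of that reduction, assuming two hypothetical quadratic test functions (not from the text): it estimates $g^{″} (0)$ with a central second difference and compares it with $⟨ z, \nabla^{2} f (y) z ⟩$ . Checking finitely many directions can reveal a negative curvature direction but cannot certify positive semidefiniteness.

```python
def directional_second_derivative(f, y, z, h=1e-3):
    """Central-difference estimate of g''(0), where g(t) = f(y + t z)."""
    def g(t):
        return f([yi + t * zi for yi, zi in zip(y, z)])
    return (g(h) - 2.0 * g(0.0) + g(-h)) / (h * h)


def quad_form(H, z):
    """<z, H z> for a square matrix H stored as a list of rows."""
    n = len(z)
    return sum(z[i] * H[i][j] * z[j] for i in range(n) for j in range(n))


def f(x):
    # Hypothetical convex example: Hessian [[2, 1], [1, 2]] is positive definite.
    return x[0] ** 2 + x[0] * x[1] + x[1] ** 2


H_F = [[2.0, 1.0], [1.0, 2.0]]


def q(x):
    # Hypothetical non-convex example: Hessian [[2, 0], [0, -2]] is indefinite.
    return x[0] ** 2 - x[1] ** 2


H_Q = [[2.0, 0.0], [0.0, -2.0]]

y = [1.0, 2.0]
DIRECTIONS = ([1.0, 0.0], [1.0, -1.0], [0.0, 1.0])
for z in DIRECTIONS:
    print(z,
          directional_second_derivative(f, y, z), quad_form(H_F, z),
          directional_second_derivative(q, y, z), quad_form(H_Q, z))
```

The direction $z = (0, 1)$ gives $g^{″} = -2$ for $q$ , which by Theorem 8.99 already shows that $q$ is not convex.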

For real functions, the Hessian is simply the second derivative $f^{″}$ .

Corollary 8.6 (Second order characterization of concavity)

Let $f : R^{n} \to R$ be twice continuously differentiable on its domain $dom f$ , which is open; i.e., its Hessian (second derivative) $\nabla^{2} f$ exists and is continuous at every point of $dom f$ .

Then, $f$ is concave if and only if $dom f$ is convex and its Hessian is negative semidefinite for every $x \in dom f$ :

\nabla^{2} f (x) ⪯ O \forall x \in dom f .

Example 8.40 (Convexity of a quadratic function)

Let $P \in S^{n}$ be a symmetric matrix. Let $q \in R^{n}$ and $r \in R$ . Consider the quadratic functional $f : R^{n} \to R$ given as:

f (x) = \frac{1}{2} x^{T} P x + q^{T} x + r .

As shown in Example 5.13, the Hessian of $f$ is:

\nabla^{2} f (x) = P \forall x \in R^{n} .

Thus, $f$ is convex if and only if $P ⪰ O$ (i.e., it is positive semidefinite).

In fact, $f$ is strictly convex if and only if $P \succ O$ .
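The test in Example 8.40 reduces to the sign of the smallest eigenvalue of $P$ . A minimal Python sketch for the case $n = 2$ , an assumption made so that the eigenvalues have a closed form; for larger $n$ one would typically call a numerical eigenvalue routine such as NumPy's `numpy.linalg.eigvalsh`. The function names and the tolerance are illustrative choices.

```python
import math


def eigenvalues_2x2(P):
    """Eigenvalues (ascending) of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    (a, b), (_, c) = P
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius


def classify_quadratic(P, tol=1e-12):
    """Classify f(x) = 1/2 x^T P x + q^T x + r by the smallest eigenvalue of P."""
    lam_min, _ = eigenvalues_2x2(P)
    if lam_min > tol:
        return "strictly convex"       # P is positive definite
    if lam_min >= -tol:
        return "convex, not strictly"  # P is positive semidefinite but singular
    return "not convex"                # P has a negative eigenvalue


print(classify_quadratic([[2.0, 1.0], [1.0, 2.0]]))  # eigenvalues 1, 3
print(classify_quadratic([[1.0, 1.0], [1.0, 1.0]]))  # eigenvalues 0, 2
print(classify_quadratic([[1.0, 2.0], [2.0, 1.0]]))  # eigenvalues -1, 3
```

The vector $q$ and scalar $r$ play no role, since they do not affect the Hessian.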

Example 8.41 (Identity is convex and concave)

Let $f : R \to R$ be:

f (x) = x .

We have $f^{'} (x) = 1$ and $f^{″} (x) = 0$ .

Since $f^{″} (x) = 0$ is both nonnegative and nonpositive, $f$ is both convex and concave.

Example 8.42 (Exponential is convex)

Let $f : R \to R$ be:

f (x) = e^{a x}

with $dom f = R$ .

We have $f^{'} (x) = a e^{a x}$ and $f^{″} (x) = a^{2} e^{a x}$ .

For any $a, x \in R$ , $a^{2} e^{a x} \geq 0$ . Thus, $f$ is convex. For $a \neq 0$ , $f^{″} (x) > 0$ , so $f$ is strictly convex; for $a = 0$ , $f$ is the constant $1$ .

Example 8.43 (Powers)

Let $f : R \to R$ be:

f (x) = x^{a}

with $dom f = R_{+ +}$ .

Now, $f^{'} (x) = a x^{a - 1}$ and $f^{″} (x) = a (a - 1) x^{a - 2}$ .

Since $x > 0$ , we have $x^{a - 2} > 0$ , so $f^{″} (x)$ has the sign of $a (a - 1)$ .
For $a \geq 1$ , $a (a - 1) \geq 0$ . Thus, $f^{″} (x) \geq 0$ and $f$ is convex for $a \geq 1$ .
For $a \leq 0$ , $a (a - 1) \geq 0$ . Thus, $f^{″} (x) \geq 0$ and $f$ is convex for $a \leq 0$ .
For $0 \leq a \leq 1$ , $a (a - 1) \leq 0$ . Thus, $f^{″} (x) \leq 0$ and $f$ is concave for $0 \leq a \leq 1$ .

Example 8.44 (Reciprocal powers)

Let $f : R \to R$ be:

f (x) = \frac{1}{x^{r}} = x^{- r} .

with $dom f = R_{+ +}$ .

Now, $f^{'} (x) = (- r) x^{- r - 1}$ and $f^{″} (x) = (- r) (- r - 1) x^{- r - 2} = r (r + 1) x^{- (r + 2)}$ .

Since $x > 0$ , $f^{″} (x)$ has the sign of $r (r + 1)$ .
For $r \geq 0$ , $r (r + 1) \geq 0$ , so $f^{″} (x) \geq 0$ and $f$ is convex for $r \geq 0$ .

Example 8.45 (Logarithm is concave)

Let $f : R \to R$ be:

f (x) = \ln x .

with $dom f = R_{+ +}$ .

Now, $f^{'} (x) = \frac{1}{x}$ and $f^{″} (x) = \frac{- 1}{x^{2}}$ .

$f^{″} (x) < 0$ for all $x > 0$ .
Thus, $f$ is concave for all $x > 0$ .

Example 8.46 (Negative entropy is convex)

Let $f : R \to R$ be:

f (x) = x \ln x .

with $dom f = R_{+ +}$ .

Now, $f^{'} (x) = \ln x + 1$ and $f^{″} (x) = \frac{1}{x}$ .

$f^{″} (x) > 0$ for all $x > 0$ .
Thus, $f$ is convex for all $x > 0$ .
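Examples 8.42 to 8.46 all follow the same recipe: compute $f^{″}$ and inspect its sign on $dom f$ . The Python sketch below cross-checks each analytic second derivative against a central finite difference and reads off the sign at a few sample points of $R_{+ +}$ . The parameter choices ( $a = 2$ in the exponential, $a = 3$ and $a = 0.5$ for powers, $r = 1$ for the reciprocal power) are hypothetical, and sampling finitely many points can only illustrate, not prove, the sign conditions.

```python
import math


def second_derivative(f, x, h=1e-4):
    """Central second difference: f''(x) is approximately (f(x+h) - 2 f(x) + f(x-h)) / h^2."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


# Hypothetical instances of Examples 8.42-8.46: (name, f, analytic f'', expected shape).
EXAMPLES = [
    ("exp(2x)", lambda x: math.exp(2.0 * x), lambda x: 4.0 * math.exp(2.0 * x), "convex"),
    ("x^3", lambda x: x ** 3, lambda x: 6.0 * x, "convex"),
    ("x^0.5", lambda x: x ** 0.5, lambda x: -0.25 * x ** -1.5, "concave"),
    ("1/x", lambda x: 1.0 / x, lambda x: 2.0 / x ** 3, "convex"),
    ("ln x", math.log, lambda x: -1.0 / x ** 2, "concave"),
    ("x ln x", lambda x: x * math.log(x), lambda x: 1.0 / x, "convex"),
]

POINTS = [0.5, 1.0, 2.0, 4.0]  # sample points inside R_{++}


def shape_from_samples(f2, points):
    """'convex' if f'' >= 0 at every sample, 'concave' if f'' <= 0, else 'neither'."""
    values = [f2(x) for x in points]
    if all(v >= 0 for v in values):
        return "convex"
    if all(v <= 0 for v in values):
        return "concave"
    return "neither"


for name, f, f2, expected in EXAMPLES:
    numeric = [second_derivative(f, x) for x in POINTS]
    analytic = [f2(x) for x in POINTS]
    print(name, shape_from_samples(f2, POINTS), expected, numeric, analytic)
```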

Example 8.47 (Quadratic over linear form is convex)

Let $f : R \times R \to R$ be given by:

f (x, y) = \frac{x^{2}}{y}

with $dom f = \{(x, y) \mid y > 0\}$ .

From Example 5.16, the Hessian is:

\nabla^{2} f (x, y) = \frac{2}{y^{3}} \begin{bmatrix} y^{2} & - x y \\ - x y & x^{2} \end{bmatrix} = \frac{2}{y^{3}} \begin{bmatrix} y \\ - x \end{bmatrix} \begin{bmatrix} y \\ - x \end{bmatrix}^{T} .

Recall that for any $u \in R^{n}$ , the matrix $u u^{T}$ is positive semidefinite, since $v^{T} u u^{T} v = (u^{T} v)^{2} \geq 0$ for every $v \in R^{n}$ . Hence,

\begin{bmatrix} y \\ - x \end{bmatrix} \begin{bmatrix} y \\ - x \end{bmatrix}^{T}

is positive semidefinite.

For $y > 0$ , $\frac{2}{y^{3}} > 0$ . Combining:

\nabla^{2} f (x, y) ⪰ O .

Thus, $f$ is convex.
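A Python sanity check of Example 8.47, assuming random sample points drawn from a hypothetical box inside $dom f$ . It evaluates the quadratic form of the Hessian above, which equals $\frac{2}{y^{3}} (y v_{1} - x v_{2})^{2}$ , and the midpoint form of the convexity inequality.

```python
import random


def f(x, y):
    return x * x / y  # quadratic over linear, dom f = {(x, y) | y > 0}


def hessian(x, y):
    """Hessian from Example 8.47: (2 / y^3) [[y^2, -x y], [-x y, x^2]]."""
    c = 2.0 / y ** 3
    return [[c * y * y, -c * x * y], [-c * x * y, c * x * x]]


def quad_form(H, v):
    """v^T H v for a symmetric 2x2 matrix H."""
    return H[0][0] * v[0] * v[0] + 2.0 * H[0][1] * v[0] * v[1] + H[1][1] * v[1] * v[1]


rng = random.Random(1)
worst_form = float("inf")
worst_gap = float("-inf")
for _ in range(1000):
    p = (rng.uniform(-5, 5), rng.uniform(0.1, 5))  # (x, y) with y > 0
    q = (rng.uniform(-5, 5), rng.uniform(0.1, 5))
    v = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    worst_form = min(worst_form, quad_form(hessian(*p), v))
    # Midpoint convexity gap f(midpoint) - average of f; should be <= 0.
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    worst_gap = max(worst_gap, f(*mid) - (f(*p) + f(*q)) / 2)

print(worst_form, worst_gap)  # >= 0 and <= 0 respectively, up to rounding
```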

Example 8.48 (Log sum exponential is convex)

Let $f : R^{n} \to R$ be given by:

f (x) = \ln (\sum_{i = 1}^{n} e^{x_{i}})

with $dom f = R^{n}$ .

From Example 5.14, we have

\nabla^{2} f (x) = \frac{1}{(1^{T} z)^{2}} ((1^{T} z) diag (z) - z z^{T})

where

\begin{array}{r} z = [\begin{array}{c} e^{x_{1}} \\ ⋮ \\ e^{x_{n}} \end{array}] . \end{array}

Since $\frac{1}{(1^{T} z)^{2}} > 0$ , to show that $\nabla^{2} f (x)$ is positive semidefinite (p.s.d.), it suffices to show that $(1^{T} z) diag (z) - z z^{T}$ is p.s.d.

Now, for any $v \in R^{n}$ :

\begin{aligned} v^{T} ((1^{T} z) diag (z) - z z^{T}) v \\ = (1^{T} z) (v^{T} diag (z) v) - v^{T} z z^{T} v \\ = (1^{T} z) (v^{T} diag (z) v) - (v^{T} z)^{2} \\ = (\sum_{i = 1}^{n} z_{i}) (\sum_{i = 1}^{n} v_{i}^{2} z_{i}) - {(\sum_{i = 1}^{n} v_{i} z_{i})}^{2} . \end{aligned}

If we define vectors $a$ and $b$ with $a_{i} = v_{i} \sqrt{z_{i}}$ and $b_{i} = \sqrt{z_{i}}$ , then by the Cauchy-Schwarz inequality, we have:

(a^{T} a) (b^{T} b) \geq (a^{T} b)^{2} ⟺ (a^{T} a) (b^{T} b) - (a^{T} b)^{2} \geq 0.

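Since $a^{T} a = \sum_{i = 1}^{n} v_{i}^{2} z_{i}$ , $b^{T} b = \sum_{i = 1}^{n} z_{i}$ and $a^{T} b = \sum_{i = 1}^{n} v_{i} z_{i}$ , the left-hand side is exactly the expression above. Thus, $\nabla^{2} f (x) ⪰ O$ .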

Hence, $f$ is convex.
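An equivalent reading of the Cauchy-Schwarz step: with $p = z / (1^{T} z)$ , the quadratic form $v^{T} \nabla^{2} f (x) v$ equals $\sum_{i} p_{i} v_{i}^{2} - (\sum_{i} p_{i} v_{i})^{2}$ , the variance of $v$ under the probability vector $p$ , which is never negative. A minimal Python check, assuming random test points (hypothetical, not from the text); the max-shift inside `log_sum_exp` is a standard overflow guard, not part of the example.

```python
import math
import random


def log_sum_exp(x):
    """ln(sum_i exp(x_i)), computed with the usual max-shift to avoid overflow."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))


def hessian_quad_form(x, v):
    """v^T (nabla^2 f(x)) v for f = log_sum_exp, via the formula of Example 5.14.

    With p = z / (1^T z), the expression equals sum_i p_i v_i^2 - (sum_i p_i v_i)^2.
    """
    m = max(x)
    z = [math.exp(xi - m) for xi in x]  # scaling z by a constant leaves the Hessian unchanged
    s = sum(z)
    p = [zi / s for zi in z]
    mean = sum(pi * vi for pi, vi in zip(p, v))
    return sum(pi * vi * vi for pi, vi in zip(p, v)) - mean * mean


rng = random.Random(3)
n = 4
worst = math.inf
for _ in range(1000):
    x = [rng.uniform(-5, 5) for _ in range(n)]
    v = [rng.uniform(-1, 1) for _ in range(n)]
    worst = min(worst, hessian_quad_form(x, v))

print(worst)                                                 # nonnegative up to rounding
print(hessian_quad_form([0.3, -1.2, 2.0, 0.7], [1.0] * n))  # ~0: f(x + c 1) = f(x) + c
print(log_sum_exp([1000.0, 1000.0]))                         # 1000 + ln 2, no overflow
```

The all-ones direction gives zero curvature because $f$ is affine along it, so $\nabla^{2} f (x)$ is p.s.d. but never positive definite.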

Example 8.49 (Log determinant function is concave)

Let $f : S^{n} \to R$ be:

f (X) = \log det X .

with $dom f = S_{+ +}^{n}$ (the set of symmetric positive definite matrices).

Any line in $S^{n}$ can be written as:

X = Z + t V

where $Z, V \in S^{n}$ .

Consider the restriction of $f$ to a line:

g (t) = \log det (Z + t V)

on the interval of values of $t$ where $Z + t V \succ O$ (since $dom f = S_{+ +}^{n}$ ). In other words,

dom g = \{t \in R \mid Z + t V \succ O\} .

Without any loss of generality, we can assume that $t = 0 \in dom g$ ; i.e., $Z \succ O$ .

Recall that:

$det (A B) = det (A) det (B)$ for square matrices.
$det (A) = \prod_{i = 1}^{n} λ_{i}$ for symmetric matrices, where $λ_{i}$ are their eigenvalues.
If $λ_{i}$ are the eigenvalues of $A$ , then the eigenvalues of $I + t A$ are $1 + t λ_{i}$ .

Now

\begin{aligned} g (t) & = \log det (Z + t V) \\ & = \log det (Z^{\frac{1}{2}} (Z^{\frac{1}{2}} + t Z^{- \frac{1}{2}} V)) \\ & = \log det (Z^{\frac{1}{2}} (I + t Z^{- \frac{1}{2}} V Z^{- \frac{1}{2}}) Z^{\frac{1}{2}}) \\ & = \log det (Z^{\frac{1}{2}}) + \log det (I + t Z^{- \frac{1}{2}} V Z^{- \frac{1}{2}}) + \log det (Z^{\frac{1}{2}}) \\ & = \log det (Z) + \log det (I + t Z^{- \frac{1}{2}} V Z^{- \frac{1}{2}}) . \end{aligned}

Let $λ_{i}$ be the eigenvalues of $Z^{- \frac{1}{2}} V Z^{- \frac{1}{2}}$ .
Then, $1 + t λ_{i}$ are the eigenvalues of $I + t Z^{- \frac{1}{2}} V Z^{- \frac{1}{2}} = Z^{- \frac{1}{2}} (Z + t V) Z^{- \frac{1}{2}}$ , and they are positive for $t \in dom g$ since this matrix is congruent to $Z + t V \succ O$ .
Thus, $\log det (I + t Z^{- \frac{1}{2}} V Z^{- \frac{1}{2}}) = \sum_{i = 1}^{n} \log (1 + t λ_{i})$ .

Hence,

g (t) = \sum_{i = 1}^{n} \log (1 + t λ_{i}) + \log det (Z) .

Note that $\log det (Z)$ does not depend on $t$ . Likewise, the $λ_{i}$ depend only on $Z$ and $V$ , not on $t$ .

Differentiating $g$ w.r.t. $t$ , we get:

g^{'} (t) = \sum_{i = 1}^{n} \frac{λ_{i}}{1 + t λ_{i}} .

Differentiating again, we get:

g^{″} (t) = - \sum_{i = 1}^{n} \frac{λ_{i}^{2}}{(1 + t λ_{i})^{2}} .

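Since $g^{″} (t) \leq 0$ for every $t \in dom g$ , $g$ is concave by Theorem 8.99 (applied to $- g$ ). As this holds for the restriction of $f$ to every line meeting its domain, $f$ is concave (Theorem 8.70 applied to $- f$ ).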
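A numerical check of $g^{″} (0) = - \sum_{i} λ_{i}^{2}$ , restricted to $2 \times 2$ matrices as an assumption, since the Python standard library has no eigenvalue solver. It uses the fact that $Z^{-1} V$ is similar to $Z^{- \frac{1}{2}} V Z^{- \frac{1}{2}}$ , so $\sum_{i} λ_{i}^{2} = \mathrm{tr} ((Z^{-1} V)^{2})$ . The random instances and the helper names are hypothetical.

```python
import math
import random


def det2(A):
    """Determinant of a 2x2 matrix given as a list of rows."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def log_det_line(Z, V, t):
    """g(t) = log det(Z + t V); assumes Z + t V is positive definite."""
    M = [[Z[i][j] + t * V[i][j] for j in range(2)] for i in range(2)]
    return math.log(det2(M))


def g2_formula(Z, V):
    """g''(0) = -sum_i lambda_i^2 = -trace((Z^{-1} V)^2)."""
    d = det2(Z)
    Z_inv = [[Z[1][1] / d, -Z[0][1] / d], [-Z[1][0] / d, Z[0][0] / d]]
    M = [[sum(Z_inv[i][k] * V[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return -sum(M[i][k] * M[k][i] for i in range(2) for k in range(2))


def g2_numeric(Z, V, h=1e-3):
    """Central second difference of g at t = 0."""
    return (log_det_line(Z, V, h) - 2.0 * log_det_line(Z, V, 0.0) + log_det_line(Z, V, -h)) / (h * h)


def random_instance(rng):
    """Hypothetical test data: Z = A A^T + I (positive definite) and a symmetric V."""
    A = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    Z = [[sum(A[i][k] * A[j][k] for k in range(2)) + (1.0 if i == j else 0.0)
          for j in range(2)] for i in range(2)]
    b = rng.uniform(-1, 1)
    V = [[rng.uniform(-1, 1), b], [b, rng.uniform(-1, 1)]]
    return Z, V


rng = random.Random(2)
for _ in range(5):
    Z, V = random_instance(rng)
    print(g2_numeric(Z, V), g2_formula(Z, V))  # the two agree and are <= 0
```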

Topics in Signal Processing
