Differentiability and Convex Functions

8.9. Differentiability and Convex Functions

8.9.1. First Order Conditions

Let us look at the special case of real valued functions over Rn which are differentiable.

Theorem 8.98 (First order characterization of convexity)

Let f:RnR be a real valued function which is differentiable at each point in domf which is open.

Then f is convex if and only if domf is convex and

(8.4)f(y)f(x)+f(x)T(yx)

holds true for all x,ydomf.

Proof. To prove (8.4), we first show that a differentiable real function f:RR is convex if and only if

f(y)f(x)+f(x)(yx)

holds true for all x,ydomf.

Assume that f is convex. Hence, domf is convex too.

  1. Let x,ydomf.

  2. Since domf is convex, hence (1t)x+ty=x+t(yx)domf for all t[0,1].

  3. By convexity of f, we have:

    f(x+t(yx))(1t)f(x)+tf(y).
  4. If we divide by t on both sides, we obtain:

    f(y)f(x)+f(x+t(yx))f(x)t.
  5. Taking the limit as t0+, we obtain:

    f(y)f(x)+f(x)(yx).

For the converse, assume that domf is convex and

f(y)f(x)+f(x)(yx)

holds true for all x,ydomf.

  1. Recall that in R the only convex sets are intervals. Thus, domf is an open interval.

  2. Choose any x,ydomf such that xy.

  3. Choose t[0,1].

  4. Let z=tx+(1t)y.

  5. By hypothesis, we have:

    f(x)f(z)+f(z)(xz)

    and

    f(y)f(z)+f(z)(yz).
  6. Multiplying the first inequality with t and second with (1t) and adding them yields:

    tf(x)+(1t)f(y)f(z)=f(tx+(1t)y).
  7. Thus, f is convex.

We now prove for the general case with f:RnR. Recall from Theorem 8.70 that for any x,ydomf the restriction of f on the line passing through x and y is given by:

g(t)=f(ty+(1t)x)=f(x+t(yx)).

Note that, by chain rule (Example 5.11):

g(t)=f(ty+(1t)x)T(yx)

Assume f is convex.

  1. Let x,ydomf such that xy.

  2. Let g be the restriction of f on the line passing through x,y as described above.

  3. Due to Theorem 8.70, g is convex.

  4. By the argument for real functions above:

    g(t)g(t)+g(t)(tt)

    holds true for all t,tdomg.

  5. In particular, with t=1 and t=0, we have:

    g(1)g(0)+g(0).
  6. But g(0)=f(x)T(yx).

  7. Also, g(1)=f(y) and g(0)=f(x).

  8. Thus, we get:

    f(y)f(x)+f(x)T(yx)

    as desired.

For the converse, assume that this inequality holds for all x,ydomf and domf is convex.

  1. Pick some x,ydomf with xy.

  2. Let g be the restriction of f on the line passing through x,y as described above.

  3. Pick t1,t2domg.

  4. Then, z1=t1y+(1t1)x and z2=t2y+(1t2)x are in domf.

  5. Consider g(t1)=f(t1y+(1t1)x)=f(z1) and g(t2)=f(t2y+(121)x)=f(z2).

  6. Note that g(t2)=f(t2y+(1t2)x)T(yx)=f(z2)T(yx).

  7. By hypothesis, we have:

    f(z1)f(z2)+f(z2)T(z1z2).
  8. But z1z2=(t1t2)(yx).

  9. Thus, we get:

    g(t1)g(t2)+g(t2)(t1t2).
  10. This holds for every t1,t2domg.

  11. But then, g is convex by previous argument for real functions.

  12. Since this is valid for every restriction of f to a line passing through its domain, hence by Theorem 8.70 f is convex.

8.9.2. Second Order Conditions

For functions which are twice differentiable, convexity can be expressed in terms of the positive-semidefiniteness of their Hessian matrices.

We start with a result on convexity of real functions on open intervals.

Theorem 8.99 (Convexity characterization for twice differentiable real functions on open intervals)

Let f:RR be twice continuously differentiable on an open interval (α,β); i.e., second derivative f exists and is continuous at every point the open interval (α,β).

Then, f is convex if and only if its second derivative f is non-negative for every x(α,β):

f(x)0x(α,β).

Proof. Assume that f is nonnegative on (α,β).

  1. Then, f is nondecreasing on (α,β).

  2. For any x,y(α,β) with x<y and r(0,1), let z=(1r)x+ry.

  3. We have z(x,y); i.e. x<z<y. Consequently,

    f(z)f(x)=xzf(t)dtf(z)(zx);f(y)f(z)=zyf(t)dtf(z)(yz).
  4. Since zx=r(yx) and yz=(1r)(yx), we have

    f(z)f(x)+rf(z)(yx);f(z)f(y)(1r)f(z)(yx).

    We wish to eliminate f(z) from these inequalities.

  5. Multiplying the two inequalities by (1r) and r respectively, and adding them together, we obtain:

    (1r)f(z)+rf(z)(1r)f(x)+rf(y).
  6. But (1r)f(z)+rf(z)=f(z)=f((1r)x+ry).

  7. Thus, f((1r)x+ry)(1r)f(x)+rf(y).

  8. This inequality is valid for the case where x>y also.

  9. Thus, f is convex over (α,β).

For the converse, assume that f is not non-negative on (α,β).

  1. Then, since f is continuous in (α,β), hence f is negative in some subinterval (α,β).

  2. Choose x,y such that α<x<y<β. Choose some r(0,1).

  3. Following an argument parallel to above, we have

    f((1r)x+ry)>(1r)f(x)+rf(y).
  4. Thus, there exist x,y(α,β) where the inequality (8.1) is not valid.

  5. Consequently, f is non-convex.

We continue further with real valued functions over Rn which are twice differentiable.

Theorem 8.100 (Second order characterization of convexity in Euclidean spaces)

Let f:RnR be twice continuously differentiable; i.e., its Hessian or second derivative 2f exists at every point in domf which is open.

Then, f is convex if and only if domf is convex and its Hessian is positive semidefinite for every xdomf:

2f(x)Oxdomf.

Proof. The convexity of f on its domain C=domf is equivalent to the convexity of the restriction of f to each line segment in C due to Theorem 8.70.

We first note that if f is convex then C is convex and if C is not convex, then f is not convex. So, for the rest of the argument, we shall assume that C is convex.

Consequently, for any yC and a nonzero zRn the intersection of the line {x=y+tz|tR} and C is an open line segment as C is open and convex.

  1. Let yC.

  2. Let zRn be an arbitrary (nonzero) direction.

  3. Let L={x=y+tz|tR} be a line passing through y in the direction z.

  4. Consider the open real interval S={t|y+tzC}. Since LC is an open line segment in Rn, hence S is indeed an open interval in R.

  5. Consider the parameterized restriction of f on the open interval S as:

    g(t)=f(y+tz),tS.
  6. A simple calculation shows that

    g(t)=z,2f(x)z

    where x=y+tz.

  7. By Theorem 8.99, g is convex for each yC and nonzero zRn if and only if z,2f(x)z0 for every zRn and xC.

  8. Thus, f is convex if and only if 2f(x)OxC.

For real functions, the Hessian is simply the second derivative f.

Corollary 8.6 (Second order characterization of concavity)

Let f:RnR be twice continuously differentiable; i.e., its Hessian or second derivative 2f exists at every point in domf which is open.

Then, f is concave if and only if domf is convex and its Hessian is negative semidefinite for every xdomf:

2f(x)Oxdomf.

Example 8.40 (Convexity of a quadratic function)

Let PSn be a symmetric matrix. Let qRn and rR. Consider the quadratic functional f:RnR given as:

f(x)=12xTPx+qTx+r.

As shown in Example 5.13, the Hessian of f is:

2f(x)=PxRn.

Thus, f is convex if and only if PO (i.e., it is positive semidefinite).

In fact f is strictly convex if and only if PsuccO.

Example 8.41 (Identity is convex and concave)

Let f:RR be:

f(x)=x.

We have f(x)=1 and f(x)=0.

f is both convex and concave.

Example 8.42 (Exponential is convex)

Let f:RR be:

f(x)=eax

with domf=R.

We have f(x)=aeax and f(x)=a2eax.

For any a,xR, a2eax>0. Thus, f is strictly convex.

Example 8.43 (Powers)

Let f:RR be:

f(x)=xa

with domf=R++.

Now, f(x)=axa1 and f(x)=a(a1)xa2.

  1. We have x>0.

  2. For a1, f(x)0. f is convex for a1.

  3. For a0, a(a1)0. Thus, f(x)0. f is convex for a0.

  4. For 0a1, a(a1)0. Thus, f(x)0. f is concave on 0a1.

Example 8.44 (Reciprocal powers)

Let f:RR be:

f(x)=1xr=xr.

with domf=R++.

Now, f(x)=(r)xr1 and f(x)=(r)(r1)xr2=r(r+1)x(r+2).

  1. We have x>0.

  2. For r0, f(x)0. f is convex for r0.

Example 8.45 (Logarithm is concave)

Let f:RR be:

f(x)=lnx.

with domf=R++.

Now, f(x)=1x and f(x)=1x2.

  1. f(x)<0 for all x>0.

  2. Thus, f is concave for all x>0.

Example 8.46 (Negative entropy is convex)

Let f:RR be:

f(x)=xlnx.

with domf=R++.

Now, f(x)=lnx+1 and f(x)=1x.

  1. f(x)>0 for all x>0.

  2. Thus, f is convex for all x>0.

Example 8.47 (Quadratic over linear form is convex)

Let f:R×RR be given by:

f(x,y)=x2y

with domf={(x,y)|y>0}.

From Example 5.16, the Hessian is:

2f(x,y)=2y3[y2xyxyx2]=2y3[yx][yx]T.

Recall that for any xRn, the matrix xxT is positive semi-definite. Hence,

[yx][yx]T

is positive semi-definite.

For y>0, 2y3>0. Combining:

2f(x,y)O.

Thus, f is convex.

Example 8.48 (Log sum exponential is convex)

Let f:RnR be given by:

f(x)=ln(i=1nexi)

with domf=Rn.

From Example 5.14, we have

2f(x)=1(1Tz)2((1Tz)diag(z)zzT)

where

z=[ex1exn].

To show that 2f(x) is p.s.d., it suffices to show that (1Tz)diag(z)zzT is p.s.d..

Now for any vRn.

vT((1Tz)diag(z)zzT)v=(1Tz)(vTdiag(z)v)vTzzTv=(1Tz)(vTdiag(z)v)(vTz)2=(i=1nzi)(i=1nvi2zi)(i=1nvizi)2.

If we define vectors a and b with ai=vizi and bi=zi, then by Cauchy-Schwartz inequality , we have:

(aTa)(bTb)(aTb)2(aTa)(bTb)(aTb)20.

But this is exactly the expression above. Thus, 2f(x)O.

Hence, f is convex.

Example 8.49 (Log determinant function is concave)

Let f:SnR be:

f(X)=logdetX.

with domf=S++n (the set of symmetric positive definite matrices).

Let any line in Sn be given by:

X=Z+tV

where Z,VSn.

Consider the restriction of f on a line:

g(t)=logdet(Z+tV)

to the interval of values where Z+tVsuccO (since domf=S++n ). In other words,

domg={tR|Z+tVsuccO}.

Without any loss of generality, we can assume that t=0domg; i.e. ZsuccO.

Recall that:

  1. det(AB)=det(A)det(B) for square matrices.

  2. det(A)=i=1nλi for symmetric matrices with λi being their eigen values.

  3. If λi are eigen values of A, then the eigen values of I+tA are 1+tλi.

Now

g(t)=logdet(Z+tV)=logdet(Z12(Z12+tZ12V))=logdet(Z12(I+tZ12VZ12)Z12)=logdet(Z12)+logdet(I+tZ12VZ12)+logdet(Z12)=logdet(Z)+logdet(I+tZ12VZ12).
  1. Let λi be the eigen values of Z12VZ12.

  2. Then, 1+tλi are eigen values of I+tZ12VZ12.

  3. Thus, logdet(I+tZ12VZ12)=i=1nlogdet(1+tλi).

Thus,

g(t)=i=1nlogdet(1+tλi)+logdet(Z).

Note that logdet(Z) doesn’t depend on t. Similarly, λi only depend on Z and V, hence they don’t depend on t.

Differentiating g w.r.t. t, we get:

g(t)=i=1nλi1+tλi.

Differentiating again, we get:

g(t)=i=1nλi2(1+tλi)2.

Since g(t)0, hence f is concave.