7.1. Mathematical Optimization

Main references for this section are [7, 10].

This section provides a general overview of optimization problems. We introduce the notions of objective (cost) functions and constraint functions, present the standard form for optimization problems, and discuss several examples of equivalent forms of optimization problems. The standard form focuses on minimizing the objective function subject to a set of equality and inequality constraints. A maximization problem can be converted into a minimization problem by changing the sign of the objective function.

We introduce the notions of the domain of an optimization problem, feasible points or solutions, the feasible set of solutions, the optimal value of the minimization problem, optimal points or solutions, and the optimal set. We discuss when the problem may be infeasible, or feasible but unbounded. We also discuss situations where the problem is feasible and has a finite global minimum (optimal) value, and yet an optimal solution may not exist.

We then discuss general requirements for the feasibility of the optimization problem and the existence of an optimal solution. We introduce the notion of coercive functions, which provide a Weierstrass-type result for the existence of optimal solutions.

7.1.1. Optimization Problems

Let V be a real n-dimensional inner product space.

In the most general form, an optimization problem can be considered as minimizing or maximizing the value of a real valued function f:V→R over its domain S=domf.

  1. We call f the objective function, the function being minimized or maximized.

  2. The variable x∈V is called the optimization variable.

  3. S=domf is called the feasible set of solutions for the optimization problem.

  4. A point x∈S is called a feasible point or feasible solution.

  5. Our goal is to find an x∗∈S which maximizes or minimizes the objective function f.

Definition 7.1 (Globally optimal value and points)

Let f:V→R be a real valued function with S=domf.

  1. The maximum value of f is given by

    sup{f(x)|x∈S}.
  2. The minimum value of f is given by

    inf{f(x)|x∈S}.
  3. We allow the maximum and minimum values to take the values ∞ and −∞.

  4. The function f may or may not attain its maximum / minimum value at some point in its domain S.

  5. x∗∈S is called a global minimum point if f(x) ≥ f(x∗) for every x∈S.

  6. x∗∈S is called a global maximum point if f(x) ≤ f(x∗) for every x∈S.

  7. x∗∈S is called a strict global minimum point if f(x) > f(x∗) for every x∈S with x ≠ x∗.

  8. x∗∈S is called a strict global maximum point if f(x) < f(x∗) for every x∈S with x ≠ x∗.

  9. A point x∗∈S is called a global optimal point if it is either a global maximum point or a global minimum point.

  10. The maximum and minimum values of f are always unique, as opposed to global optimal points, which need not be unique.

  11. The set of global minimizers is given by

    argmin{f(x)|x∈S}.
  12. The set of global maximizers is given by

    argmax{f(x)|x∈S}.

In a more restricted setting, an optimization problem can be considered as minimizing or maximizing the value of a real valued function f:V→R with S = dom f over a set A such that A⊆S.

  1. In this case, the feasible set is restricted to A.

  2. Minimizers and maximizers are searched within this subset A.

  3. This problem can be converted into the earlier general form by considering the restriction f~:V→R of f such that A = dom f~ and

    f~(x) = f(x) ∀ x∈A.

In several problems, it may not be possible to establish whether a point is globally optimal. However, it is easier to establish if a point is optimal in a neighborhood around it. Such points are called locally optimal points or extreme values.

Definition 7.2 (Local optimal points)

Let f:V→R be a real valued function with S = dom f. We say that f(a) is a local extreme value or local optimal value of f at a∈dom f if there exists δ>0 such that f(x) − f(a) does not change sign on B(a,δ)∩S.

More specifically,

  1. f(a) is a local maximum value of f if for some δ>0:

    f(x) ≤ f(a) ∀ x∈B(a,δ)∩S.

  2. f(a) is a local minimum value of f if for some δ>0:

    f(x) ≥ f(a) ∀ x∈B(a,δ)∩S.

The point x=a is called a local extreme point or local optimal point of f or more specifically, a local maximum or a local minimum point of f.

  1. a is a strict local maximum point of f if for some δ>0:

    f(x) < f(a) ∀ x∈Bd(a,δ)∩S.

  2. a is a strict local minimum point of f if for some δ>0:

    f(x) > f(a) ∀ x∈Bd(a,δ)∩S.

Here Bd(a,δ) denotes the deleted neighborhood (an open ball of radius δ excluding a itself).

Remark 7.1 (Local optimal points for open domain)

Let f:V→R be a real valued function with S=domf. Assume that S is open.

  1. If a∈S is a local maximum point, then there exists r1>0 such that

    f(x) ≤ f(a) ∀ x∈B(a,r1)∩S.

  2. But since S is open, a∈int S. Thus, there exists an open ball B(a,r2)⊆S.

  3. Let r = min(r1, r2).

  4. Then, B(a,r)⊆S and B(a,r)⊆B(a,r1).

  5. Thus, B(a,r)⊆B(a,r1)∩S.

  6. Thus, f(x) ≤ f(a) ∀ x∈B(a,r)⊆S.

  7. Thus, the local optimality condition simplifies to looking for an open ball of radius r totally contained inside the open domain S.

Based on this discussion, we can adjust the conditions for local optimality.

  1. a is a local maximum point of f if for some r>0:

    f(x) ≤ f(a) ∀ x∈B(a,r)⊆S.

  2. a is a local minimum point of f if for some r>0:

    f(x) ≥ f(a) ∀ x∈B(a,r)⊆S.

  3. a is a strict local maximum point of f if for some r>0:

    f(x) < f(a) ∀ x∈Bd(a,r)⊆S.

  4. a is a strict local minimum point of f if for some r>0:

    f(x) > f(a) ∀ x∈Bd(a,r)⊆S.

7.1.1.1. Standard Form for Mathematical Optimization Problems

Definition 7.3 (Optimization problem standard form)

Let V be an n-dimensional real vector space. A mathematical optimization problem of the form

(7.1)

    minimize   f0(x)
    subject to fi(x) ≤ 0, i = 1,…,m
               hj(x) = 0, j = 1,…,p

is known as an optimization problem in its standard form.

  1. x∈V is called the optimization variable.

  2. f0:V→R is called the objective function or cost function.

  3. The functions fi:V→R are called the inequality constraint functions.

  4. The corresponding inequalities fi(x) ≤ 0 are called the inequality constraints.

  5. The functions hj:V→R are called the equality constraint functions.

  6. The corresponding equations hj(x)=0 are called the equality constraints.

  7. If there are no constraints (m=0,p=0), the problem is called unconstrained.

  8. The set

    D ≜ dom f0 ∩ ⋂_{i=1}^{m} dom fi ∩ ⋂_{j=1}^{p} dom hj

    is called the domain of the optimization problem.

  9. A point x∈D is called feasible if it satisfies all the inequality constraints fi(x) ≤ 0 and all the equality constraints hj(x) = 0.

  10. The optimization problem is called feasible if there exists at least one feasible point.

  11. If there are no feasible points in the domain D, then the optimization problem is called infeasible. Naturally, if D = ∅, then the problem is infeasible.

  12. The set of feasible points for an optimization problem is called the feasible set or the constraint set. We shall denote the feasible set by C.

    C = dom f0 ∩ ⋂_{i=1}^{m} fi⁻¹(−∞,0] ∩ ⋂_{j=1}^{p} hj⁻¹(0).

    It is the intersection of the domain of f0, the 0-sublevel sets of fi for i=1,…,m, and the 0-level sets of hj for j=1,…,p.

  13. Thus, if the problem is infeasible, then C = ∅.

  14. The optimum value p∗∈R̄ of the optimization problem is defined as

    p∗ = inf {f0(x) | fi(x) ≤ 0, i = 1,…,m, hj(x) = 0, j = 1,…,p}.

    In other words,

    p∗ = inf {f0(x) | x∈C}.

    We allow p∗ to take the extended values ∞ and −∞.

  15. If the problem is infeasible, then p∗ = ∞. This is consistent with the convention that the infimum of an empty set is ∞.

  16. If p∗ = −∞, then the problem is called unbounded below. In this case, there exists a sequence {xk} of C such that lim f0(xk) = −∞.

  17. We say that x∗ is an optimal point if it solves (7.1). In other words, x∗∈C and f0(x∗) = p∗.

  18. The set of all optimal points is known as the optimal set denoted by Xopt.

    Xopt ≜ {x | fi(x) ≤ 0, i = 1,…,m, hj(x) = 0, j = 1,…,p, f0(x) = p∗}.

    In other words,

    Xopt = {x∈C | f0(x) = p∗}.
  19. If an optimal point exists in C, then we say that the optimal value is attained or achieved.

  20. If Xopt is empty, then we say that the optimal value is not attained or not achieved.

  21. In particular, if the problem is unbounded below, then Xopt is indeed empty.

  22. If the feasible set C is not closed, then it is quite possible that the optimum value p∗ is finite and yet it is not attained at any feasible point. In that case, there exists a sequence {xk} of feasible points such that lim f0(xk) = p∗; however, there is no x∈C such that f0(x) = p∗.

  23. A feasible point x∈C with f0(x) ≤ p∗ + ϵ is called an ϵ-suboptimal point.

  24. The set of all ϵ-suboptimal points is called the ϵ-suboptimal set for the problem (7.1).

  25. We say that a feasible point x is locally optimal if there exists r>0 such that

    f0(x) = inf {f0(z) | z∈B[x,r]∩C}.

    In other words, x minimizes f over a local neighborhood of feasible points.

  26. If x is feasible and fi(x) = 0, we say that the i-th inequality constraint is active at x. Otherwise, fi(x) < 0 and we say that the i-th inequality constraint is inactive.

  27. We say that a constraint is redundant if removing it does not change the feasible set.

  28. If we choose an orthonormal basis B = {e1,…,en} for V, then V is isomorphic to Rn under the bracket operator [⋅]B. The optimization variable x has a representation (x1,…,xn) in Rn given by

    x = ∑_{i=1}^{n} xi ei.

  29. Thus, determining x is the same as determining its components (x1,…,xn).

  30. Since there are n scalar components in the representation of x, we can say that the optimization problem has n (scalar) variables.

In the sequel, we will be presenting a variety of optimization problems. We shall show how those problems can be converted to the standard form described above.

Example 7.1 (Box constraints standard form)

Let f0:Rn→R be a real valued function. Consider the optimization problem

    minimize   f0(x)
    subject to li ≤ xi ≤ ui, i = 1,…,n

where x∈Rn is the optimization variable. There are n constraints, one on each component of x. Each constraint gives a lower and upper bound on one component. Such constraints are known as variable bounds or box constraints.

This problem can be transformed into an equivalent problem:

    minimize   f0(x)
    subject to li − xi ≤ 0, i = 1,…,n
               xi − ui ≤ 0, i = 1,…,n.

This form has 2n inequality constraints. We introduce the functions

    fi(x) = li − xi, i = 1,…,n and fi(x) = x_{i−n} − u_{i−n}, i = n+1,…,2n.

Then, the problem becomes

    minimize   f0(x)
    subject to fi(x) ≤ 0, i = 1,…,2n.

This is the optimization problem in standard form with 2n inequality constraints and 0 equality constraints.
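The conversion is mechanical, as the following minimal Python sketch illustrates; the bounds l and u, the dimension n, and the test point are hypothetical values chosen purely for illustration.

    import numpy as np

    # Hypothetical box constraints for an n = 3 dimensional variable.
    l = np.array([0.0, -1.0, 2.0])
    u = np.array([1.0, 1.0, 5.0])
    n = len(l)

    def f(i, x):
        # Standard form inequality constraint functions f_i, i = 1, ..., 2n.
        if i <= n:
            return l[i - 1] - x[i - 1]      # l_i - x_i <= 0
        return x[i - n - 1] - u[i - n - 1]  # x_{i-n} - u_{i-n} <= 0

    x = np.array([0.5, 0.0, 3.0])
    print(all(f(i, x) <= 0 for i in range(1, 2 * n + 1)))  # True: x is feasible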

Example 7.2 (Maximization problem in standard form)

Consider the problem

    maximize   f0(x)
    subject to fi(x) ≤ 0, i = 1,…,m
               hj(x) = 0, j = 1,…,p

If we replace f0 by −f0, then maximizing f0 is the same as minimizing −f0. Thus, this problem has an equivalent form

    minimize   −f0(x)
    subject to fi(x) ≤ 0, i = 1,…,m
               hj(x) = 0, j = 1,…,p

Definition 7.4 (Feasibility problem)

Let V be an n-dimensional real vector space. A mathematical optimization problem of the form

(7.2)

    find       x∈V
    subject to fi(x) ≤ 0, i = 1,…,m
               hj(x) = 0, j = 1,…,p

is called a feasibility problem.

We can convert this problem into the standard problem by introducing a cost function which is identically 0:

    f0(x) = 0 ∀ x∈V.
  1. The domain reduces to

    D = ⋂_{i=1}^{m} dom fi ∩ ⋂_{j=1}^{p} dom hj.

  2. The optimal value p∗ = ∞ if the problem is not feasible.

  3. Otherwise, the optimal value is p∗ = 0.

  4. Every feasible point is also an optimal point.

  5. Every optimization problem in standard form can be converted into a feasibility problem by replacing the objective function with the function which is identically 0 everywhere.

  6. In that case, the problem reduces to checking whether the inequality and equality constraints are consistent or not. In other words, whether there are some points which satisfy all the inequality and equality constraints.
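As a concrete illustration, a feasibility problem can be handed to a numerical solver with the identically zero objective. The sketch below is only illustrative: the two constraints are made up, and SciPy's "ineq" convention requires constraint functions that are nonnegative at feasible points, so the inequality constraint is passed with its sign flipped.

    import numpy as np
    from scipy.optimize import minimize

    # Illustrative constraints: f1(x) = x1^2 + x2^2 - 1 <= 0 and h1(x) = x1 - x2 = 0.
    constraints = [
        {"type": "ineq", "fun": lambda x: 1.0 - x[0] ** 2 - x[1] ** 2},  # -f1(x) >= 0
        {"type": "eq", "fun": lambda x: x[0] - x[1]},                    # h1(x) = 0
    ]

    # Feasibility problem: minimize the identically zero objective.
    res = minimize(lambda x: 0.0, x0=np.array([2.0, 2.0]), constraints=constraints)
    print(res.success, res.x)  # success indicates a feasible point was located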

7.1.2. Equivalent Problems

Definition 7.5 (Equivalent optimization problems)

We provide an informal definition of equivalent optimization problems.

Two optimization problems are called equivalent if it is possible to find the solution of one problem from the other problem and vice versa.

The primary reason for transforming an optimization problem into an equivalent one is that the equivalent problem may be easier to solve. If we can transform a problem into a form which has a well known closed form solution or a well known solution algorithm, then we can solve the original problem as follows.

  1. Transform the problem to an equivalent problem with a known algorithm for solution.

  2. Solve the equivalent problem.

  3. Transform the solution of the equivalent problem to the solution of the original problem.

In this sense, the computational complexity of solving an optimization problem cannot be greater than the complexity of solving the equivalent problem plus the complexity of transforming the solution of the equivalent problem to the solution of the original problem.

There are several ways to obtain an equivalent optimization problem from an existing optimization problem.

7.1.2.1. Scaling

Proposition 7.1 (Equivalent problems by scaling of objective or constraint functions)

Consider the problem given by

    minimize   f0′(x) = α0 f0(x)
    subject to fi′(x) = αi fi(x) ≤ 0, i = 1,…,m
               hj′(x) = βj hj(x) = 0, j = 1,…,p

where αi > 0 for i = 0,…,m and βj ≠ 0 for j = 1,…,p.

This problem is obtained from the optimization problem in standard form in (7.1) by scaling the objective function and the inequality constraint functions by positive constants and the equality constraint functions by nonzero constants.

  1. α0 cannot be negative, since that would convert a minimization problem into a maximization problem.

  2. α0 cannot be zero, as that would convert the optimization problem into a feasibility problem.

  3. αi for i = 1,…,m cannot be negative, as it would turn the inequality fi(x) ≤ 0 into fi′(x) ≥ 0.

  4. αi for i = 1,…,m cannot be zero, as it would remove the corresponding constraint altogether.

  5. βj cannot be zero. A zero value would discard the corresponding equality constraint.

  6. The feasible sets of both problems are identical.

  7. The optimal sets of both problems are identical.

  8. The optimal values of the two problems are, however, not identical (unless α0 = 1). Since f0′ is a scaled version of f0, the optimal value is scaled accordingly.
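A small numeric check of the last two points, with a made-up objective f0 and scaling factor α0:

    from scipy.optimize import minimize_scalar

    f0 = lambda x: (x - 2.0) ** 2 + 1.0   # a hypothetical objective
    alpha0 = 3.0                          # a positive scaling factor

    r1 = minimize_scalar(f0)
    r2 = minimize_scalar(lambda x: alpha0 * f0(x))
    print(r1.x, r2.x)      # both approximately 2: the optimal points coincide
    print(r1.fun, r2.fun)  # 1 and 3: the optimal value is scaled by alpha0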

7.1.2.2. Change of Variables

Proposition 7.2 (Change of variables)

Let ϕ:V→V be an injective function with D ⊆ range ϕ.

For the problem (7.1), introduce the following functions

    f~i(z) = fi(ϕ(z)), i = 0,…,m and h~j(z) = hj(ϕ(z)), j = 1,…,p.

Now, consider the problem

(7.3)

    minimize   f~0(z)
    subject to f~i(z) ≤ 0, i = 1,…,m
               h~j(z) = 0, j = 1,…,p

with the variable z∈V.

We say that the two problems are related by the change of variable or substitution of the variable x = ϕ(z).

Suppose z∗ is a solution to (7.3); then x∗ = ϕ(z∗) is a solution to (7.1).

Similarly, if x∗ is a solution to (7.1), then z∗ = ϕ⁻¹(x∗) is a solution to (7.3).
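A minimal sketch of this recipe in Python, assuming a made-up problem with domain x > 0 and the substitution x = ϕ(z) = e^z (which is injective and whose range covers the domain):

    import numpy as np
    from scipy.optimize import minimize_scalar

    f = lambda x: (np.log(x) - 1.0) ** 2   # original objective, dom f = (0, inf)
    f_tilde = lambda z: f(np.exp(z))       # transformed objective after x = exp(z)

    z_star = minimize_scalar(f_tilde).x    # solve the problem in the z variable
    x_star = np.exp(z_star)                # map the solution back: x* = phi(z*)
    print(z_star, x_star)                  # approximately 1 and e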

7.1.2.3. Transformation with Monotone Functions

Proposition 7.3 (Transformation of objective and constraint functions)

For the problem (7.1), we introduce the following:

  1. Let ψ0:R→R be a strictly increasing function over range f0.

  2. Let ψi:R→R for i = 1,…,m be strictly increasing functions over range fi with ψi(u) ≤ 0 ⟺ u ≤ 0.

  3. Let ηj:R→R for j = 1,…,p be real functions that satisfy ηj(u) = 0 ⟺ u = 0.

  4. Let f~i(x) = ψi(fi(x)) for i = 0,…,m.

  5. Let h~j(x) = ηj(hj(x)) for j = 1,…,p.

Now, consider the problem

(7.4)

    minimize   f~0(x)
    subject to f~i(x) ≤ 0, i = 1,…,m
               h~j(x) = 0, j = 1,…,p

with the variable x∈V.

The problem (7.4) is equivalent to (7.1).

Example 7.3 (Least norms and least norms squared)

Consider the unconstrained least norm problem

    minimize ‖Ax − b‖_2

with x∈Rn, A∈R^{m×n} and b∈Rm. We have f0(x) = ‖Ax − b‖_2. Then, range f0 = R+. Consider ψ0:R→R given by

    ψ0(x) = x².

Then, ψ0 is strictly increasing over R+. Let

    f~0(x) = ψ0(f0(x)) = ‖Ax − b‖_2² = (Ax − b)^T (Ax − b).

Then, the least norm minimization problem is equivalent to the problem

    minimize (Ax − b)^T (Ax − b)

which is the least norm squared problem.

We note that while f~0 is differentiable everywhere, f0 is not: it fails to be differentiable at points where Ax − b = 0. This is a key difference between the two problems. The least norm squared problem can be solved using gradient methods since its objective function is differentiable.
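The sketch below checks the equivalence numerically on arbitrary random data; both formulations recover the same minimizer up to solver tolerance. NumPy and SciPy are assumed to be available.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    A = rng.standard_normal((10, 3))
    b = rng.standard_normal(10)

    # Least norm squared: differentiable everywhere, amenable to gradient methods.
    sq = minimize(lambda x: (A @ x - b) @ (A @ x - b), x0=np.zeros(3))

    # Least norm: same minimizer, but the objective is not differentiable at Ax = b.
    nm = minimize(lambda x: np.linalg.norm(A @ x - b), x0=np.zeros(3))

    print(np.allclose(sq.x, nm.x, atol=1e-3))  # True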

7.1.2.4. Slack Variables

Proposition 7.4 (Slack variables)

For the inequality constraints in the problem (7.1), we can introduce variables si ≥ 0 so that fi(x) + si = 0. This way, we can convert each inequality constraint into an equality constraint, introducing simpler inequality constraints on the variables si. The resultant problem:

(7.5)

    minimize   f0(x)
    subject to si ≥ 0, i = 1,…,m
               fi(x) + si = 0, i = 1,…,m
               hj(x) = 0, j = 1,…,p

where the optimization variables are x∈V and s∈Rm, is equivalent to (7.1).

The variables si are known as slack variables. s = (s1,…,sm) is a vector that collects all the slack variables.

  1. This form has m inequality constraints and m+p equality constraints.

  2. It has n+m optimization variables x1,…,xn and s1,…,sm.

  3. If (x,s) is a feasible point for (7.5), then x is a feasible point for (7.1).

  4. If x is a feasible point for (7.1), then we can pick si = −fi(x) to form s, making (x,s) a feasible point for (7.5).

  5. x is an optimal point for (7.1) if and only if (x,s) is an optimal point for (7.5) where si = −fi(x).

  6. The slack variables measure how much slack an inequality constraint has at a feasible point.

  7. If si = 0, then fi is active: it has no slack.

  8. If si > 0, then fi is inactive: it has some slack.
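A tiny numeric illustration of points 4-8, with a made-up constraint function:

    import numpy as np

    f1 = lambda x: x[0] + x[1] - 1.0    # an illustrative constraint f1(x) <= 0

    x = np.array([0.2, 0.3])            # feasible: f1(x) = -0.5 <= 0
    s1 = -f1(x)                         # choose the slack as s1 = -f1(x)

    print(s1 >= 0)                      # True: the slack variable is nonnegative
    print(np.isclose(f1(x) + s1, 0.0))  # True: the equality f1(x) + s1 = 0 holds
    print(s1 > 0)                       # True: the constraint is inactive at x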

7.1.2.5. Epigraph Form

Proposition 7.5 (Epigraph form)

The problem (7.1) is equivalent to the optimization problem below:

(7.6)

    minimize   t
    subject to f0(x) − t ≤ 0,
               fi(x) ≤ 0, i = 1,…,m
               hj(x) = 0, j = 1,…,p

with the optimization variable (x,t)∈V⊕R.

  1. Note that t appears only in a single inequality in (7.6).

  2. At an optimal point, f0(x) = t must hold true. If f0(x) < t, then we can always choose a smaller value of t as the solution.

  3. If x is an optimal point for (7.1), then (x, f0(x)) must be an optimal point for (7.6).

  4. Similarly, if (x,t) is an optimal point for (7.6), then x must be an optimal point for (7.1) and t = f0(x) must hold true.

  5. The two problems are indeed equivalent.

  6. This problem has m+1 inequality constraints and p equality constraints.

  7. The objective function for the epigraph form is a linear function of (x,t).
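A minimal numeric sketch of the epigraph form, with a made-up objective; note again that SciPy's "ineq" constraints must be nonnegative at feasible points, so f0(x) − t ≤ 0 is passed as t − f0(x) ≥ 0.

    import numpy as np
    from scipy.optimize import minimize

    f0 = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2  # hypothetical objective

    # Variables v = (x1, x2, t): minimize t subject to f0(x) - t <= 0.
    res = minimize(
        lambda v: v[2],
        x0=np.array([0.0, 0.0, 10.0]),
        constraints={"type": "ineq", "fun": lambda v: v[2] - f0(v[:2])},
    )
    print(res.x)  # approximately (1, -2, 0): t* = f0(x*) at the optimum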

7.1.3. First Order Conditions

In this subsection, we focus on objective functions of type f:Rn→R which are differentiable.

Theorem 7.1 (First order optimality criterion for local optimal points)

Let f:Rn→R be a real valued function with S = dom f. Suppose that x∗∈int S is a local optimal point. Assume that all the partial derivatives of f exist at x∗.
Then, ∇f(x∗) = 0.

  1. Let i∈{1,…,n}.

  2. Consider the one-dimensional function gi:R→R given by

    gi(t) = f(x∗ + t ei)

    where ei is the i-th unit vector of Rn.

  3. We note that gi is differentiable at t = 0 and

    gi′(0) = ∂f/∂xi (x∗).

  4. Since x∗ is a local optimal point of f, t = 0 is a local optimal point of gi.

  5. Thus, gi′(0) = 0.

  6. Thus, ∂f/∂xi (x∗) = 0.

  7. Since this is true for every i = 1,…,n, we have ∇f(x∗) = 0.

We mention that a zero gradient is a necessary condition: if a point is a local optimal point, then the gradient of f at that point must be zero. It is not a sufficient condition: the gradient may be zero and yet the point may not be an optimal point. For example, f(x) = x³ satisfies f′(0) = 0, yet 0 is neither a local minimum nor a local maximum point of f.

Definition 7.6 (Stationary point)

Let f:Rn→R be a real valued function with S = dom f. Let x∗∈int S and assume that f is differentiable in some neighborhood of x∗. Then, x∗ is called a stationary point if ∇f(x∗) = 0.

Thus, the locally optimal points are necessarily stationary points.
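As a quick numeric sanity check of this necessary condition, one can minimize a smooth function and verify that the gradient vanishes at the computed minimizer. The function below is an arbitrary illustration:

    import numpy as np
    from scipy.optimize import minimize

    f = lambda x: (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 4
    grad = lambda x: np.array([2.0 * (x[0] - 3.0), 4.0 * (x[1] + 1.0) ** 3])

    x_star = minimize(f, x0=np.zeros(2)).x
    print(np.allclose(grad(x_star), 0.0, atol=1e-3))  # True: the minimizer is stationary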

7.1.4. Second Order Conditions

Recall the discussion on semidefinite and definite matrices in Eigen Values.

For a twice continuously differentiable function f, the Hessian is symmetric. The semidefiniteness of the Hessian provides necessary conditions, and its definiteness sufficient conditions, for local optimality at the stationary points of f; i.e., the points a∈dom f where ∇f(a) = 0.

We have the following necessary conditions for local optimality.

  1. If a is a local minimum point, then ∇²f(a) must be positive semidefinite.

  2. If a is a local maximum point, then ∇²f(a) must be negative semidefinite.

We have the following sufficient conditions for local optimality.

  1. If ∇²f(a) is positive definite, then a is a strict local minimum point.

  2. If ∇²f(a) is negative definite, then a is a strict local maximum point.
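These tests reduce to checking the signs of the eigenvalues of the Hessian at a stationary point. Below is a minimal sketch; the example Hessians are made up, and the saddle case anticipates the next subsection.

    import numpy as np

    def classify(H):
        # Classify a stationary point from the eigenvalues of its symmetric Hessian H.
        ev = np.linalg.eigvalsh(H)
        if np.all(ev > 0):
            return "strict local minimum"   # positive definite
        if np.all(ev < 0):
            return "strict local maximum"   # negative definite
        if ev.min() < 0 < ev.max():
            return "saddle point"           # indefinite
        return "inconclusive"               # semidefinite with a zero eigenvalue

    print(classify(np.diag([2.0, 6.0])))    # strict local minimum
    print(classify(np.diag([-2.0, 6.0])))   # saddle point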

Theorem 7.2 (Necessary second order optimality conditions)

Let f:Rn→R be a real valued function with S = dom f. Assume that S is open. Further, assume that f is twice continuously differentiable over S and that a∈S is a stationary point.

Then, the following hold.

  1. If a is a local minimum point of f over S, then ∇²f(a) ⪰ O.

  2. If a is a local maximum point of f over S, then ∇²f(a) ⪯ O.

Proof. Assume that a is a local minimum point.

  1. Then, there exists an open ball B(a,r)⊆S such that f(x) ≥ f(a) for all x∈B(a,r).

  2. Let d∈Rn be a nonzero vector.

  3. For 0 < t < r/‖d‖, we define

    a_t = a + t d.

    By definition, a_t∈B(a,r).

  4. Hence, for any t∈(0, r/‖d‖), f(a_t) ≥ f(a).

  5. By the linear approximation theorem (Theorem 5.5), there exists a vector z_t∈[a, a_t] such that

    f(a_t) − f(a) = ∇f(a)^T (a_t − a) + (1/2)(a_t − a)^T ∇²f(z_t)(a_t − a).

  6. Since a is a stationary point, ∇f(a) = 0.

  7. Also, by the definition of a_t, a_t − a = t d.

  8. Thus,

    f(a_t) − f(a) = (t²/2) d^T ∇²f(z_t) d.

  9. Since f(a) is a local minimum value, f(a_t) − f(a) ≥ 0.

  10. Thus, for any d∈Rn and any 0 < t < r/‖d‖, we have

    (t²/2) d^T ∇²f(z_t) d ≥ 0.

  11. By the continuity of the Hessian, and the fact that z_t → a as t → 0+, we obtain (t²/2) d^T ∇²f(a) d ≥ 0, and hence

    d^T ∇²f(a) d ≥ 0 ∀ d∈Rn.

  12. Thus, ∇²f(a) ⪰ O.

The argument for the second statement is similar. We can apply the same argument to −f and recognize that if a is a local maximum point of f, then it is a local minimum point of −f.

Theorem 7.3 (Sufficient second order optimality conditions)

Let f:Rn→R be a real valued function with S = dom f. Assume that S is open. Further, assume that f is twice continuously differentiable over S and that a∈S is a stationary point.

Then, the following hold.

  1. If ∇²f(a) ≻ O, then a is a strict local minimum point of f over S.

  2. If ∇²f(a) ≺ O, then a is a strict local maximum point of f over S.

Proof. Let a∈S be a stationary point satisfying ∇²f(a) ≻ O.

  1. Since the Hessian is continuous (f is twice continuously differentiable), there exists an open ball B(a,r)⊆S such that ∇²f(x) ≻ O for every x∈B(a,r).

  2. By the linear approximation theorem (Theorem 5.5), and since ∇f(a) = 0, for any x∈B(a,r) there exists a vector z∈[a,x] such that

    f(x) − f(a) = (1/2)(x − a)^T ∇²f(z)(x − a).

  3. We note that z∈B(a,r). Hence, ∇²f(z) ≻ O.

  4. Thus, (1/2)(x − a)^T ∇²f(z)(x − a) > 0 for every x ≠ a.

  5. Thus, f(x) − f(a) > 0 holds true for every x∈B(a,r) with x ≠ a.

  6. Thus, a is a strict local minimum point of f over S.

The proof of the second statement is similar.

Theorem 7.4 (Hessian based sufficient global optimality condition)

Let f:Rn→R be a real valued function with dom f = Rn. Further, assume that f is twice continuously differentiable over Rn. Suppose that ∇²f(x) ⪰ O for every x∈Rn. Let a∈Rn be a stationary point of f. Then, a is a global minimum point of f.

In other words, if the Hessian of a function is continuous and always positive semidefinite, then all its stationary points are also global minimum points.

Proof. We are given that a is a stationary point.

  1. Hence, ∇f(a) = 0.

  2. Let x∈Rn.

  3. By the linear approximation theorem (Theorem 5.5), and since ∇f(a) = 0, there exists a vector z∈[a,x] such that

    f(x) − f(a) = (1/2)(x − a)^T ∇²f(z)(x − a).

  4. By hypothesis, ∇²f(z) ⪰ O.

  5. Thus, f(x) − f(a) ≥ 0 for every x∈Rn.

  6. Thus, a is a global minimum point of f.

This result is a special case of the global optimality of convex optimization problems. If a convex function is twice differentiable, then its Hessian is positive semidefinite (Theorem 8.100).

7.1.4.1. Saddle Points

Definition 7.7 (Saddle point)

Let f:Rn→R be a real valued function with S = dom f. Assume that S is open.

A stationary point a∈S is called a saddle point of f over S if it is neither a local minimum point nor a local maximum point of f over S.

Theorem 7.5 (Sufficient condition for saddle points)

Let f:Rn→R be a real valued function with S = dom f. Assume that S is open. Further, assume that f is twice continuously differentiable over S and that a∈S is a stationary point.

If ∇²f(a) is an indefinite matrix, then a is a saddle point of f over S.

Proof. We are given that a is a stationary point and ∇²f(a) is indefinite.

  1. Then, ∇f(a) = 0.

  2. Recall from Theorem 4.135 that a matrix is indefinite if and only if at least one eigenvalue is positive and at least one eigenvalue is negative.

  3. Let λ > 0 be a positive eigenvalue of ∇²f(a) and v be a corresponding normalized eigenvector.

  4. Since S is open, there exists r > 0 such that B(a,r)⊆S.

  5. Accordingly, a + tv∈B(a,r)⊆S for every t∈[0,r).

  6. By the quadratic approximation theorem (Theorem 5.6), and since ∇f(a) = 0,

    f(a + tv) = f(a) + (t²/2) v^T ∇²f(a) v + o(t²‖v‖²).

  7. Putting ∇²f(a)v = λv and using ‖v‖ = 1,

    f(a + tv) = f(a) + λt²/2 + o(t²).

  8. Here o:R++ → R is a function satisfying o(x)/x → 0 as x → 0+.

  9. Thus, o(t²)/t² → 0 as t → 0+.

  10. For every ϵ > 0, there exists δ > 0 such that

    |o(t²)/t²| < ϵ whenever 0 < t < δ.

  11. Choose ϵ = λ/2.

  12. Then, −λt²/2 < o(t²) < λt²/2 for every 0 < t < δ.

  13. Thus, λt²/2 + o(t²) > 0 for every 0 < t < δ.

  14. Thus, with r1 = min(δ, r), for every 0 < t < r1 we have f(a + tv) > f(a).

  15. Thus, a is not a local maximum point.

  16. A similar argument with a negative eigenvalue and a corresponding normalized eigenvector shows that a cannot be a local minimum point.

  17. Thus, a must be a saddle point.
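The classical example is f(x,y) = x² − y² at the stationary point (0,0), where the Hessian diag(2,−2) is indefinite. The sketch below checks numerically that f increases along the eigenvector of the positive eigenvalue and decreases along that of the negative one:

    f = lambda x, y: x ** 2 - y ** 2

    # Move away from (0, 0) along the eigen-directions (1, 0) and (0, 1).
    for t in [0.1, 0.01]:
        print(f(t, 0.0) > f(0.0, 0.0), f(0.0, t) < f(0.0, 0.0))  # True True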

7.1.5. Minimization of Proper Functions

Let V be a real n-dimensional normed linear space. Let f:V→(−∞,∞] be a proper function with S = dom f. We consider the problem of minimizing f over a set A⊆S.

  1. f is the objective function or cost function (being minimized).

  2. The set A is the feasible set or constraint set.

  3. Any x∈A is a feasible solution or feasible point.

  4. If there is at least one feasible point (i.e., A is nonempty), the problem is feasible.

  5. Otherwise, the problem is infeasible.

  6. Let p∗ = inf_{x∈A} f(x) be the optimal value of the minimization problem.

  7. We allow p∗ to take values over the extended real line R̄.

  8. If the minimization problem is infeasible, then p∗ = ∞.

  9. If p∗ = −∞, then the problem is unbounded below.

  10. If there is some x∗∈A such that f(x∗) = p∗, then x∗ is an optimal solution or optimal point or minimizing point or minimizer or global minimum over A.

  11. Alternatively, f attains a minimum over A at x∗. We write this as

    x∗ ∈ argmin_{x∈A} f(x).

  12. If x∗ is the unique minimizer of f over A, then we abuse the notation and write

    x∗ = argmin_{x∈A} f(x).

  13. It is possible that p∗ is finite and yet there is no optimal point in A.

  14. In other words, the set argmin_{x∈A} f(x) may be empty.

A basic question of optimization is whether an optimal solution exists.

Remark 7.2 (The set of optimal points)

The set of optimal points for the problem of minimization of a proper function f over a feasible set A is given by

    argmin_{x∈A} f(x) = A ∩ f⁻¹(p∗)

where p∗ = inf_{x∈A} f(x) and f⁻¹(y) denotes the level set of f given by {x∈S | f(x) = y}.

This comes directly from the fact that

  1. x∗ must be feasible. Hence, x∗∈A.

  2. f must attain the optimal value at x∗. Thus, p∗ = f(x∗). Hence, x∗∈f⁻¹(p∗).

In other words, the optimal set is the intersection of the feasible set A with the level set f⁻¹(p∗).

Thus, for an optimal solution to exist:

  1. p∗ must be finite.

  2. The level set f⁻¹(p∗) must be nonempty.

  3. The feasible set A must be nonempty.

  4. The intersection of A with f⁻¹(p∗) must be nonempty.

7.1.5.1. Coercive Functions

Definition 7.8 (Coercive function)

A proper function f:V→(−∞,∞] is called coercive over a set A if for every sequence {xk} of A such that lim_{k→∞} ‖xk‖ = ∞, we have lim_{k→∞} f(xk) = ∞.

If f is coercive over the entire vector space V, we say that f is coercive.

One way to think about coercive functions is that they grow rapidly at the extremes of the domain on which they are defined.
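For instance, f(x) = ‖x‖² is coercive on Rn, while g(x) = x1 is not: along the sequence xk = (0,k), the norms blow up but g stays at 0. A quick numeric sketch:

    import numpy as np

    f = lambda x: np.dot(x, x)   # coercive
    g = lambda x: x[0]           # not coercive

    xs = [np.array([0.0, float(k)]) for k in range(1, 5)]  # ||x_k|| -> infinity
    print([f(x) for x in xs])    # [1.0, 4.0, 9.0, 16.0]: grows without bound
    print([g(x) for x in xs])    # [0.0, 0.0, 0.0, 0.0]: stays bounded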

Theorem 7.6 (Level sets of coercive functions)

Let f:V→(−∞,∞] be a coercive proper function. Let a∈R. If the level set f⁻¹(a) is nonempty, then it is bounded.

In other words, all nonempty level sets of a coercive function are bounded.

Proof. For contradiction, assume that there exists a∈R such that A = f⁻¹(a) is unbounded.

  1. Then, there exists a sequence {xk} of A such that lim_{k→∞} ‖xk‖ = ∞.

  2. Since f is coercive, lim_{k→∞} f(xk) = ∞.

  3. But, by definition, f(xk) = a for every k.

  4. Hence, lim_{k→∞} f(xk) = a.

  5. We have a contradiction.

  6. Hence, f⁻¹(a) must be bounded.

Theorem 7.7 (Sublevel sets of coercive functions)

Let f:V→(−∞,∞] be a coercive proper function with S = dom f. For every r∈R, define the sublevel set Sr as {x∈S | f(x) ≤ r}. If Sr is nonempty, then it is bounded.

In other words, all nonempty sublevel sets of a coercive function are bounded.

Proof. For contradiction, assume that there exists r∈R such that Sr is unbounded.

  1. Then, there exists a sequence {xk} of Sr such that lim_{k→∞} ‖xk‖ = ∞.

  2. Since f is coercive, lim_{k→∞} f(xk) = ∞.

  3. But, by definition, f(xk) ≤ r for every k.

  4. Hence, {f(xk)} cannot diverge to ∞.

  5. We have a contradiction.

  6. Hence, Sr must be bounded.

7.1.5.2. Weierstrass' Theorem

In this subsection, we examine the problem of unconstrained minimization of a proper closed function. Recall from Definition 3.64 that a function is called closed if all its sublevel sets are closed.

Theorem 7.8 (Unconstrained minimization of a proper closed function)

Let V be a real n-dimensional normed linear space. Let f:V→(−∞,∞] be a proper closed function with S = dom f.

Let p∗ = inf_{x∈V} f(x). Let X = f⁻¹(p∗) be the set of minimizers of f (over all of V). For every p∈R, let Sp denote the sublevel set {x | f(x) ≤ p}. Then,

  1. p∗ < ∞.

  2. X = ⋂_{p>p∗} Sp.

  3. X is closed.

In words, the minimal value p∗ is either finite or −∞. The set of minimizers is the intersection of all the (closed) sublevel sets Sp for p > p∗. The set of minimizers is a closed set.

Proof. We show that p∗ < ∞.

  1. Since f is proper, S is nonempty.

  2. Thus, there exists x∈S such that f(x) < ∞.

  3. Hence, p∗ = inf_{x∈V} f(x) < ∞.

Let Y = ⋂_{p>p∗} Sp. We have to show that X = Y.

We first show that X⊆Y.

  1. Let x∈X.

  2. Then, f(x) = p∗.

  3. Then, f(x) < p for every p > p∗.

  4. Hence, x∈Sp for every p > p∗.

  5. Thus, x∈Y.

  6. Thus, X⊆Y.

We now show that Y⊆X.

  1. Let x∈Y.

  2. We claim that f(x) = p∗ must hold true.

  3. f(x) cannot be smaller than p∗ since, by definition, p∗ = inf_{x∈V} f(x).

  4. Thus, f(x) ≥ p∗ must hold true.

  5. Now, for contradiction, assume that f(x) > p∗.

    1. Let q = f(x).

    2. Let p = (p∗ + q)/2.

    3. Then, p∗ < p < q.

    4. Hence, x∉Sp.

    5. But then, x∉Y since Y⊆Sp as p > p∗.

    6. A contradiction.

  6. Thus, the only allowed value for f(x) is p∗.

  7. Thus, x∈X.

  8. Thus, Y⊆X.

  9. Thus,

    X = f⁻¹(p∗) = Y = ⋂_{p>p∗} Sp.

We show that X is closed.

  1. If X is empty, then it is closed by definition.

  2. Otherwise, X is an intersection of sublevel sets.

  3. Since f is closed, its sublevel sets are closed.

  4. Hence Sp is closed for every p.

  5. Thus, X is an intersection of closed sets.

  6. Thus, X is closed.

Theorem 7.9 (Weierstrass' theorem)

Let V be a real n-dimensional normed linear space. Let f:V→(−∞,∞] be a proper closed function with S = dom f.

Let p∗ = inf_{x∈V} f(x). Let X = f⁻¹(p∗) be the set of minimizers of f (over all of V). For every p∈R, let Sp denote the sublevel set {x | f(x) ≤ p}.

Assume that one of the following conditions is true.

  1. S = dom f is closed and bounded.

  2. There exists a scalar r∈R such that the sublevel set Sr = {x | f(x) ≤ r} is nonempty and bounded.

  3. f is coercive.

Then, X (the set of minimizers of f) is nonempty and compact.

Here is this result in a simplified language. The theorem addresses the problem of unconstrained minimization of a function.

  1. If f is proper and closed and its domain is closed and bounded, then its set of minimizers is nonempty and compact.

  2. If f is proper and closed and any sublevel set of f is bounded, then its set of minimizers is nonempty and compact.

  3. If f is proper, closed and coercive, then its set of minimizers is nonempty and compact.

Since the set of minimizers is nonempty and compact, it is nonempty, closed and bounded.

Proof. X is nonempty if and only if there exists x∈S such that f(x) = p∗. To show that X is compact, we need to show that it is closed and bounded: by Theorem 4.68, every closed and bounded subset of V, a real n-dimensional normed linear space, is compact. By Theorem 7.8, the optimal value satisfies p∗ < ∞ and X, the set of minimizers of a proper and closed function, is closed. Thus, we just need to show that X is nonempty and bounded.

Assume that condition (1) holds.

  1. Since f is proper, hence S is nonempty.

  2. Consider a sequence {xk} of S such that lim_{k→∞} f(xk) = p∗. Such a sequence exists since p∗ = inf_{x∈V} f(x): for every k∈N, we can pick xk∈S such that f(xk) − p∗ < 1/k.

  3. Since S is bounded, {xk} has a convergent subsequence by the Bolzano-Weierstrass theorem (Theorem 4.71).

  4. Let {xl} be the convergent subsequence of {xk} with lim xl = x∗.

  5. Since S is closed, x∗∈S.

  6. Since {f(xk)} is convergent and {f(xl)} is a subsequence of {f(xk)},

    lim_{l→∞} f(xl) = lim_{k→∞} f(xk) = p∗.

  7. Since f is closed, it is lower semicontinuous (l.s.c.) at x∗∈S (see Theorem 3.119).

  8. Also, note that since {f(xl)} is convergent,

    lim inf_{l→∞} f(xl) = lim_{l→∞} f(xl).

  9. Since f is l.s.c. at x∗∈S, by Theorem 3.108,

    f(x∗) ≤ lim inf_{l→∞} f(xl) = lim_{l→∞} f(xl) = lim_{k→∞} f(xk) = p∗.

  10. Since p∗ is the infimum value of f, f(x∗) cannot be smaller than p∗.

  11. Thus, f(x∗) = p∗ must be true.

  12. Since f is proper and x∗∈S, p∗ = f(x∗) > −∞. Thus, the optimal value is finite.

  13. Thus, f(x∗) = p∗ means that x∗ is an optimal solution for the minimization of f.

  14. Thus, the set X = f⁻¹(p∗) is nonempty.

  15. X is bounded since X⊆S and S is bounded by hypothesis.

  16. Since X is closed and bounded, it is compact.

Assume that condition (2) holds.

  1. The sublevel set Sr is nonempty and bounded.

  2. Since f is closed, hence Sr is also closed.

  3. Consider the restriction of f to Sr given by

    f~(x) = f(x) if f(x) ≤ r, and f~(x) = ∞ otherwise.

  4. Then, dom f~ = Sr. Thus, dom f~ is nonempty, closed and bounded.

  5. Then, f~ never takes the value −∞ and is not identically ∞. Hence, f~ is a proper function.

  6. Since f is closed, the sublevel sets of f~ are also closed.

    1. For any p > r, the sublevel set of f~ is identical to Sr.

    2. For any p ≤ r, the sublevel set of f~ is identical to Sp, the corresponding sublevel set of f.

  7. Thus, f~ is closed too.

  8. Applying condition (1) on f~, the set of minimizers of f~ is nonempty and compact.

  9. We also note that the minimizers of f are identical to the minimizers of f~.

  10. Thus, the set of minimizers of f is nonempty and compact.

Assume that condition (3) holds.

  1. Since f is proper, hence it has some nonempty sublevel sets.

  2. Let r∈R be a scalar such that the sublevel set Sr = {x∈S | f(x) ≤ r} is nonempty.

  3. By Theorem 7.7, the nonempty sublevel sets of f are bounded. Hence, Sr is bounded.

  4. Then, by applying condition (2), the set of minimizers of f is nonempty and compact.

Corollary 7.1 (Minimizing a proper closed function over a closed set)

Let f:V→(−∞,∞] be a proper closed function with S = dom f. Let A⊆S be a nonempty closed set. Consider the problem of minimizing f over A.

Further, assume that one of the following conditions is true.

  1. A is bounded.

  2. There exists a scalar r∈R such that the set {x∈A | f(x) ≤ r} is nonempty and bounded.

  3. f is coercive over A.

Then, the set of minimizers of f over A is nonempty and compact.

Proof. We define a restriction g:V→(−∞,∞] of f over the set A as follows:

    g(x) = f(x) if x∈A, and g(x) = ∞ otherwise.
  1. dom g = A. Thus, dom g is closed.

  2. Since f never takes the value −∞, g also never takes the value −∞.

  3. Since A is nonempty, there exists x∈V such that g(x) < ∞.

  4. Hence, g is a proper function.

  5. Also, note that the set {x∈A | f(x) ≤ r} is nothing but the sublevel set of g for the scalar r:

    {x | g(x) ≤ r} = A ∩ {x | f(x) ≤ r}.

  6. Since f is closed, all its sublevel sets are closed.

  7. A is closed; hence, A ∩ {x | f(x) ≤ r} is also closed.

  8. Hence, all the sublevel sets of g are closed. Hence, g is also closed.

  9. The set of minimizers of f over A is nothing but the set of minimizers of g.

  10. If f is coercive over A, then g is coercive (over its entire domain).

Thus, applying Theorem 7.9, the set of minimizers of g is nonempty and compact. So is the set of minimizers of f over A.

Corollary 7.2 (Minimizing a real valued l.s.c. function over a closed set)

Let f:V→R be a real valued function with dom f = V. Let A⊆V be a nonempty closed set. Assume that f is lower semicontinuous over A. Consider the problem of minimizing f over A.

Further, assume that one of the following conditions is true.

  1. A is bounded.

  2. There exists a scalar r∈R such that the set {x∈A | f(x) ≤ r} is nonempty and bounded.

  3. f is coercive over A.

Then, the set of minimizers of f over A is nonempty and compact.

Proof. We define a restriction g:V→(−∞,∞] of f over the set A as follows:

    g(x) = f(x) if x∈A, and g(x) = ∞ otherwise.
  1. domg=A. Thus, domg is closed.

  2. Since f is real valued, g never takes the value −∞.

  3. Since A is nonempty, there exists x∈V such that g(x) < ∞.

  4. Hence, g is a proper function.

  5. Since f is l.s.c. on A and A is closed, g is closed.

  6. Also, note that the set {x∈A | f(x) ≤ r} is nothing but the sublevel set of g for the scalar r.

  7. The set of minimizers of f over A is nothing but the set of minimizers of g.

  8. If f is coercive over A, then g is coercive (over its entire domain).

Thus, applying Theorem 7.9, the set of minimizers of g is nonempty and compact. So is the set of minimizers of f over A.

7.1.6. Some Simple Problems and Technical Results

In this subsection, we provide some technical results on some simple optimization problems. These results are useful building blocks for bigger problems.

Theorem 7.10 (Nonnegativity of affine functional on nonnegative orthant)

Consider the affine function f:Rn→R given by

    f(x) = a^T x + b.

Then, f(x) ≥ 0 for every x ⪰ 0 if and only if a ⪰ 0 and b ≥ 0.

Proof. Assume that a ⪰ 0 and b ≥ 0. Then, for every x ⪰ 0, f(x) is a sum of nonnegative terms. Hence, f(x) ≥ 0.

For the converse, assume that f(x) ≥ 0 for every x ⪰ 0.

  1. If b < 0, then for x = 0, f(x) = b < 0. Hence, b cannot be negative.

  2. Next, suppose that a has a negative component; i.e., ai < 0 for some i∈{1,…,n}.

  3. Let x be such that xj = 0 for all j ≠ i and xi = t for some t > 0.

  4. Then, f(x) = t ai + b.

  5. By increasing t suitably, we can make f(x) < 0.

  6. Thus, a ⪰ 0 is also necessary to ensure that f(x) ≥ 0 for all x ⪰ 0.
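The converse direction is easy to see numerically: with a made-up a having a negative component, pushing mass into that coordinate drives f negative.

    import numpy as np

    a = np.array([1.0, -2.0])         # a2 < 0: a is not entrywise nonnegative
    b = 1.0
    f = lambda x: a @ x + b

    for t in [1.0, 10.0]:
        print(f(np.array([0.0, t])))  # -1.0, -19.0: f turns negative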

Theorem 7.11 (Minimization of linear functional over unit ball)

For any a∈Rn, the optimal value of the problem

    inf_{‖x‖≤1} a^T x

is −‖a‖.

Proof. If a = 0, then a^T x = 0 for every x. The minimum value is 0, which equals −‖a‖.

Now consider the case where a ≠ 0.

  1. By the Cauchy-Schwarz inequality,

    a^T x ≥ −‖a‖ ‖x‖ ≥ −‖a‖

    for any x with ‖x‖ ≤ 1.

  2. Thus, we have established a lower bound:

    inf_{‖x‖≤1} a^T x ≥ −‖a‖.

  3. We now show that the lower bound is indeed attained.

  4. Let x = −a/‖a‖.

  5. Then, ‖x‖ = 1. Thus, x belongs to the unit ball.

  6. Now, a^T x = −‖a‖. Thus, the lower bound is attained.
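A quick numeric confirmation with an arbitrary vector a: the value −‖a‖ is attained at x = −a/‖a‖, and no point of the unit ball does better.

    import numpy as np

    rng = np.random.default_rng(1)
    a = rng.standard_normal(4)

    x_star = -a / np.linalg.norm(a)                    # the minimizing point
    print(np.isclose(a @ x_star, -np.linalg.norm(a)))  # True: the bound is attained

    xs = rng.standard_normal((1000, 4))
    xs /= np.maximum(1.0, np.linalg.norm(xs, axis=1, keepdims=True))  # force ||x|| <= 1
    print((xs @ a >= -np.linalg.norm(a) - 1e-9).all())  # True: no feasible point does better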