Univariate optimization

In mathematics, the term optimization, or mathematical programming, refers to the study of problems in which one seeks to minimize or maximize a real function by systematically choosing the values of real or integer variables from within an allowed set. An optimization problem consists of three basic pieces:

  1. An objective function – the quantity we want to minimize or maximize, e.g. we want to maximize profit or minimize the time needed for production
  2. A set of variables which affect the value of the objective function. In a manufacturing problem, the variables might include the amounts of different resources used or the time spent on each activity
  3. A set of constraints, which define the solution space of the problem. For example, we cannot produce a negative number of products. Constraints restrict the solution space and can thus make the problem easier, but non-trivial constraints can make the problem much harder. On the boundary of the feasible region the function can have an extremum without its derivative being equal to zero.

The mathematical description is

minimize f(x),\,x\in D_f

subject to  c_i (x) \leq b_i,\, i= 1 \dots m

  • x=(x_1,\dots ,x_n): optimization variable
  • f:\mathbf{R} ^n \to \mathbf{R}: objective function
  • c_i:\mathbf{R} ^n \to \mathbf{R},\, i= 1 \dots m: constraint functions

An optimal solution \tilde{x} has the smallest value of f among all vectors that satisfy the constraints.


Optimization problems can be classified according to their search space x\in\mathbb{X}:

  • Discrete \mathbb{X}, e.g. the traveling salesman problem
  • Continuous \mathbb{X}
    • Linear programming: f is linear and \mathbb{X} is a polyhedron specified by linear equalities and inequalities
    • Quadratic programming: f is convex quadratic and \mathbb{X} is a polyhedron specified by linear equalities and inequalities
    • Convex programming: f is convex function and \mathbb{X} is a convex set
    • Nonlinear programming: f is nonlinear and \mathbb{X} is specified by nonlinear equalities and inequalities

There exist also other types of optimization problems, mainly from the area of graph theory, but they are outside the topic of this thesis. The second important way to classify optimization problems is according to the type of solution: we can be interested only in finding some extremum (local optimization), or we may want to be sure that we have found the overall minimum or maximum (global optimization).


Linear programming[edit]


As an example of an optimization problem we can use a plane cargo problem. The simplest possibility is to solve it as a linear program. In this case our cargo plane has three compartments for storing cargo:

Compartment  Weight capacity (tonnes)  Space capacity (cubic metres)
Front        10                        6800
Centre       16                        8700
Rear         8                         5300

The following four cargoes are available for shipment on the next flight:

Cargo  Weight (tonnes)  Volume (cubic metres/tonne)  Profit (£/tonne)
C1     18               480                          310
C2     15               650                          380
C3     23               580                          350
C4     12               390                          285


The following assumptions are made:

  • each cargo can be split into whatever proportions/fractions we desire
  • each cargo can be split between two or more compartments if we so desire
  • the cargo can be packed to fill each compartment completely (for example, if the cargo were spherical it would not be possible to pack a compartment to its volume capacity, since some free space is inevitable in sphere packing)
  • all the data/numbers given are accurate

The objective is to determine how much (if any) of each cargo C1, C2, C3 and C4 should be accepted, and how to distribute it among the compartments, so that the total profit for the flight is maximized. The notation of this problem is as follows. We denote the variables as:

  • x_{ij}: the number of tonnes of cargo i (i=1,2,3,4 for C1, C2, C3 and C4 respectively) that is put into compartment j (j=1 for Front, j=2 for Centre and j=3 for Rear)
  • where x_{ij} >= 0: i=1,2,3,4; j=1,2,3


1. We cannot pack more of each of the four cargoes than is available

x_{11} + x_{12} + x_{13} <= 18

x_{21} + x_{22} + x_{23} <= 15 

x_{31} + x_{32} + x_{33} <= 23

x_{41} + x_{42} + x_{43} <= 12 

2. The weight capacity of each compartment must be respected

x_{11} + x_{21} + x_{31} + x_{41} <= 10 

x_{12} + x_{22} + x_{32} + x_{42} <= 16 

x_{13} + x_{23} + x_{33} + x_{43} <= 8 

3. The volume (space) capacity of each compartment must be respected

480x_{11} + 650x_{21} + 580x_{31} + 390x_{41} <= 6800 

480x_{12} + 650x_{22} + 580x_{32} + 390x_{42} <= 8700 

480x_{13} + 650x_{23} + 580x_{33} + 390x_{43} <= 5300 

Objective function

  • The objective is to maximize total profit:

max 310(x_{11}+ x_{12}+x_{13}) + 380(x_{21}+ x_{22}+x_{23}) + 350(x_{31}+ x_{32}+x_{33}) + 285(x_{41}+ x_{42}+x_{43})
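
To make the formulation concrete, here is a small C++ sketch (illustrative only, not part of the thesis implementation) that encodes the availability, weight and volume constraints and the profit objective. It can verify any hand-made allocation plan; an actual optimum would of course come from an LP solver such as the simplex method.

```cpp
#include <cassert>

// Data of the cargo problem: availabilities of C1..C4, compartment
// capacities, volume per tonne and profit per tonne.
const double avail[4]  = {18, 15, 23, 12};     // tonnes of C1..C4 available
const double wcap[3]   = {10, 16, 8};          // weight capacity per compartment
const double vcap[3]   = {6800, 8700, 5300};   // volume capacity per compartment
const double vol[4]    = {480, 650, 580, 390}; // cubic metres per tonne
const double profit[4] = {310, 380, 350, 285}; // pounds per tonne

// Check one candidate allocation x[i][j] (tonnes of cargo i in
// compartment j) against all constraints of the linear program above.
bool feasible(double x[4][3]) {
    for (int i = 0; i < 4; ++i) {
        double s = 0;
        for (int j = 0; j < 3; ++j) {
            if (x[i][j] < 0) return false;     // non-negativity
            s += x[i][j];
        }
        if (s > avail[i]) return false;        // cargo availability
    }
    for (int j = 0; j < 3; ++j) {
        double w = 0, v = 0;
        for (int i = 0; i < 4; ++i) { w += x[i][j]; v += vol[i] * x[i][j]; }
        if (w > wcap[j] || v > vcap[j]) return false; // weight and volume caps
    }
    return true;
}

// Total profit of an allocation (the objective function).
double objective(double x[4][3]) {
    double p = 0;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 3; ++j) p += profit[i] * x[i][j];
    return p;
}
```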

Quadratic programming[edit]

If we add an amortization cost for the plane, our objective function changes to:

max\, 310\sum_{j=1}^3 x_{1j} + 380\sum_{j=1}^3 x_{2j} + 350\sum_{j=1}^3 x_{3j} +  285\sum_{j=1}^3 x_{4j} + 25\big(\sum_{i=1}^{4}\sum_{j=1}^{3} x_{ij}\big)^2

Convex programming[edit]


The plane has to keep its balance, so we need to add a new convex constraint:

\alpha_1\sum_{j=1}^3 x_{1j} + \alpha_2\sum_{j=1}^3 x_{2j} = \alpha_3\sum_{j=1}^3 x_{3j} +  \alpha_4 e^{\sum_{j=1}^3 x_{4j}}

Nonlinear programming[edit]

Adding a fuel cost changes our problem into the most general case:

max\, 310\sum_{j=1}^3 x_{1j} + 380\sum_{j=1}^3 x_{2j} + 350\sum_{j=1}^3 x_{3j} +  285\sum_{j=1}^3 x_{4j} + 25\big(\sum_{i=1}^{4}\sum_{j=1}^{3} x_{ij}\big)^2 - \log\big(\sum_{i=1}^{4}\sum_{j=1}^{3} x_{ij}\big)


Where do optimization problems arise?

  • Economics: Consumer theory, supplier theory
  • Finance: Optimal hedging, pricing
  • Statistics: Data fitting, regression, pattern recognition
  • Science, Engineering: Aerospace, product design, data mining
  • Business decisions: Scheduling, production, organizational decisions
  • Government: Military applications, fund allocation, etc.
  • Personal decisions: Sports, on-field decisions, player acquisition, marketing

Univariate algorithms[edit]

We place three basic demands on our algorithm: it should be fast, accurate and use little memory. Unfortunately these requirements often run counter to each other, so we have to make some compromise. Computing f(x) and its derivatives is the most time-consuming part, therefore we try to minimize the number of such evaluations. Accordingly, we can divide algorithms into these groups:

  • Without using derivatives
  • Using the first derivative
  • Using the matrix of second derivatives

In the following chapter three basic algorithms are presented which are specific to the univariate case. They are intended for constrained problems.

Golden section search method[edit]

This method belongs to the algorithms which do not use derivatives. It is an analogy of finding a root of a function by the bisection method.

(a,b): \,f(a)f(b)<0 \, => (a,x) or (x,b)

The root is supposed to have been bracketed in the interval (a,b). One then evaluates the function at an intermediate point x and obtains a new, smaller bracketing interval, either (a,x) or (x,b). After a finite number of steps we get an acceptably small interval which surely contains the root. We need to do the same with the minimum, but to bracket a minimum we need three points which satisfy this condition:

a<b<c: f(b)<f(a)\, \wedge\, f(b)<f(c) => minimum  \in (a,c)

Now we choose a new point x; suppose it lies in the interval (b,c) (the second possibility, x \in (a,b), is analogous). Two cases can occur, and in each we are able to choose a new, smaller bracketing triplet which surely contains the minimum:

f(b)<f(x) => (a,b,x)\,\!

f(b)>f(x) => (b,x,c)\,\!

The individual steps are illustrated in the pictures. They were generated by a JavaScript demo from this page; there you can run the whole process with other functions as well.

Initial bracket
New bracket


Again, after a finite number of steps we are able to find an acceptably small interval containing the minimum. The remaining question is how to choose the new point x. Suppose that b is a fraction w of the way between a and c

 \frac{b-a}{c-a}=w \,\,\,\,\,\,\,\,\,\,\,  \frac{c-b}{c-a}=1-w

Our next point x is a fraction z beyond b:

\frac{x-b}{c-a}=z
The length of the new bracketing interval is then

w+z \quad \text{or} \quad 1-w

We want to make these equal, to keep the rate at which the interval shrinks constant.

\frac{z}{1-w}=w \quad \text{with} \quad w+z=1-w

Together these give

w^2 -3w+1=0 \,\,\,\, => \,\,\,\,\, w=\frac{3-\sqrt{5}}{2} = 0.38197

The golden section search guarantees that after each new function evaluation the bracketing interval will be just 0.61803 times the size of the preceding one (if we start with a triplet in the golden ratio; otherwise it converges to this rate quickly). An interesting observation is that the golden ratio appears in many other contexts and has been known since antiquity, e.g. as an aesthetic criterion; see http://en.wikipedia.org/wiki/Golden_ratio
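
The procedure above can be written down compactly. The following C++ sketch (a minimal re-implementation for illustration, not the code used for the experiments below) keeps a bracketing triplet (a, b, c), always probes the larger sub-interval at the golden fraction w = 0.38197, and keeps whichever triplet still brackets the minimum:

```cpp
#include <cmath>
#include <cassert>

// Golden section search: shrink a bracketing interval (a,c) around a
// minimum of f. Each iteration costs exactly one function evaluation.
double golden_min(double (*f)(double), double a, double c, double tol = 1e-8) {
    const double w = (3.0 - std::sqrt(5.0)) / 2.0;   // 0.38197...
    double b = a + w * (c - a);                      // interior point
    double fb = f(b);
    while (c - a > tol) {
        // probe the larger of the two sub-intervals (a,b) and (b,c)
        double x = (b - a > c - b) ? b - w * (b - a) : b + w * (c - b);
        double fx = f(x);
        if (fx < fb) {                               // x is the new best point
            if (x < b) c = b; else a = b;
            b = x; fb = fx;
        } else {                                     // b remains the best point
            if (x < b) a = x; else c = x;
        }
    }
    return b;
}
```

For f(x) = (x-2)^2 on (0,5) this shrinks the bracket to about 1e-8 in a few dozen evaluations, in line with the 0.618 rate derived above.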

Parabolic interpolation[edit]

In the previous method we assumed that the function may behave strangely. If instead we assume that the function does not change its shape wildly and looks approximately like a parabola, we can estimate the new point as the vertex of the parabola through the three current points:

x = b - \frac{1}{2}\,\frac{(b-a)^2[f(b)-f(c)]-(b-c)^2[f(b)-f(a)]}{(b-a)[f(b)-f(c)]-(b-c)[f(b)-f(a)]}

The pictures provide a better understanding of how the algorithm works. At the beginning the parabola goes through the points a, b, c. We find its minimum at point d and fit a new parabola through b, d, c. Now we find a new minimum at point e, and we continue this process until we are acceptably near to the minimum (i.e. the position of the new point no longer varies much). The pictures were generated by a JavaScript demo on a page similar to the one for the golden section method.

Previous parabola
New parabola
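
A minimal C++ sketch of successive parabolic interpolation (again a re-implementation for illustration, not the thesis code). It applies the vertex rule above and keeps the three points centred on the best value; note there is no safeguard, so on badly behaved functions it can fail:

```cpp
#include <cmath>
#include <cassert>

// Successive parabolic interpolation: fit a parabola through the three
// current points and jump to its vertex. Fast near a smooth minimum,
// but unsafeguarded -- it may misbehave on wild functions.
double parabolic_min(double (*f)(double), double a, double b, double c,
                     int maxit = 50, double tol = 1e-10) {
    double fa = f(a), fb = f(b), fc = f(c);
    for (int k = 0; k < maxit; ++k) {
        double num = (b - a) * (b - a) * (fb - fc) - (b - c) * (b - c) * (fb - fa);
        double den = (b - a) * (fb - fc) - (b - c) * (fb - fa);
        if (std::fabs(den) < 1e-300) break;   // degenerate parabola
        double x = b - 0.5 * num / den;       // vertex of the fitted parabola
        if (std::fabs(x - b) < tol) break;    // step negligibly small: done
        double fx = f(x);
        // keep the triplet centred on the smallest value found so far
        if (fx < fb) {
            if (x < b) { c = b; fc = fb; } else { a = b; fa = fb; }
            b = x; fb = fx;
        } else {
            if (x < b) { a = x; fa = fx; } else { c = x; fc = fx; }
        }
    }
    return b;
}
```

On an exactly quadratic function the very first vertex already hits the minimum, which is why this method needs so few iterations in the tables below.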

Brent‘s method[edit]

More robust is an algorithm which is able to recognize the behavior of the function and, according to it, switch between the golden section and the parabolic approach. One such possibility is Brent's method. It keeps track of six points: a, b, u, v, w and x

  • a, b are the border points of the interval containing the minimum
  • x is the point with the least function value found so far
  • w is the point with the second least function value
  • v is the previous value of w
  • u is the point at which the function was evaluated most recently

The new point proposed by the parabolic rule is accepted only if it

  • falls within the bounding interval (a,b)
  • implies a movement from the best current value x that is less than half the movement of the step before last

These rules ensure that the algorithm converges to some point instead of just periodically skipping around.
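
The bookkeeping of the full method is somewhat involved, so the following simplified C++ sketch shows only the hybrid idea, keeping just a triplet (a, b, c) instead of the six points a, b, u, v, w, x. It tries the parabolic vertex first and falls back to a golden-section step whenever the candidate leaves the bracket or moves at least half the step before last:

```cpp
#include <cmath>
#include <cassert>

// Simplified sketch of the hybrid idea behind Brent's method (not the
// full algorithm): parabolic step when it looks trustworthy, otherwise
// a guaranteed golden-section step.
double hybrid_min(double (*f)(double), double a, double b, double c,
                  int maxit = 100, double tol = 1e-8) {
    const double g = (3.0 - std::sqrt(5.0)) / 2.0;   // golden fraction
    double fa = f(a), fb = f(b), fc = f(c);
    double prev = c - a, prev2 = c - a;              // last two step sizes
    for (int k = 0; k < maxit && c - a > tol; ++k) {
        double num = (b - a) * (b - a) * (fb - fc) - (b - c) * (b - c) * (fb - fa);
        double den = (b - a) * (fb - fc) - (b - c) * (fb - fa);
        double x = b;
        bool parab = std::fabs(den) > 1e-300;
        if (parab) x = b - 0.5 * num / den;          // parabolic vertex
        // reject the parabolic step if it leaves (a,c) or moves too far
        if (!parab || x <= a || x >= c || std::fabs(x - b) >= 0.5 * prev2)
            x = (b - a > c - b) ? b - g * (b - a) : b + g * (c - b);
        prev2 = prev; prev = std::fabs(x - b);
        double fx = f(x);
        if (fx < fb) {                               // x becomes the best point
            if (x < b) { c = b; fc = fb; } else { a = b; fa = fb; }
            b = x; fb = fx;
        } else {                                     // b stays the best point
            if (x < b) { a = x; fa = fx; } else { c = x; fc = fx; }
        }
    }
    return b;
}
```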

First derivative method[edit]

This algorithm tries to increase the speed of finding the minimum by using information about the derivative. The basis is still the same: we have a bracketing triplet a, b, c, and the derivatives help us to choose new points. The sign of the derivative at the middle point b tells us whether the new point lies in the interval (a,b) or (b,c). Then we find the new point by the secant method applied to the derivative; in place of the second derivative, the derivative value at the second best point found so far (as in Brent's method) is used.

There also exist other approaches which try to use more sophisticated rules, e.g. higher-degree polynomials fitted to function and derivative values at several points. But there is a danger of unexpected behavior of high-degree polynomials.
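
A minimal C++ sketch of the core idea (a minimum of f is a root of f', bracketed by a sign change of the derivative). For simplicity it keeps only the two bracket endpoints and falls back to bisection when the secant candidate leaves the bracket, instead of the Brent-style bookkeeping described above:

```cpp
#include <cmath>
#include <cassert>

// Derivative-based minimization: secant steps on f' inside a bracket
// (lo, hi) with df(lo) < 0 < df(hi), with a bisection fallback.
double dmin(double (*df)(double), double lo, double hi,
            int maxit = 100, double tol = 1e-12) {
    double dlo = df(lo), dhi = df(hi);   // expect dlo < 0 < dhi
    double x = 0.5 * (lo + hi);
    for (int k = 0; k < maxit; ++k) {
        double xn = lo - dlo * (hi - lo) / (dhi - dlo);  // secant step on f'
        if (xn <= lo || xn >= hi) xn = 0.5 * (lo + hi);  // bisection fallback
        double dx = df(xn);
        if (dx == 0.0 || std::fabs(xn - x) < tol) return xn;
        x = xn;
        // keep the sign change of f' bracketed
        if (dx < 0) { lo = x; dlo = dx; } else { hi = x; dhi = dx; }
    }
    return x;
}
```

For f(x) = (x-2)^2 the derivative 2x-4 is linear, so the first secant step lands on the minimum exactly, matching the very low iteration counts in the tables below.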

Numerical simulations[edit]

I implemented these three methods in C++ and tested them on a few functions, chosen in the hope that they could cause the algorithms some problems. The configuration of my computer is 1.6 GHz, 256 MB RAM. Each method was repeated 10,000 times and the time was measured in milliseconds. I used different constraints for the same function, because this shows that different starting values can give different results. The explanation is that an algorithm can skip some local minimum, or, due to numerical rounding, the stopping criterion can change. Here are the tables with results:

Function 1:

y=x^2 ,(-1,1)\,\!


Method                   f(x)  x             Iterations  Time
Golden search            0     -0.000173072  18          0:78
Parabolic interpolation  0     5.02293e-009  6           0:15
First derivative         0     0             2           0:00.47

Function 2:

y=Exp[-30(x)]+x^2+1000(x^2 Log[x])^5,(0,1)\,\!

Method                   f(x)       x         Iterations  Time
Golden search            0.033396   0.153591  24          0.297
Parabolic interpolation  0.033396   0.153586  19          0.234
First derivative         0.0918101  0.464192  8           1.203

y=Exp[-30(x)]+x^2+1000(x^2 Log[x])^5,(0,2)\,\!

Method                   f(x)       x         Iterations  Time
Golden search            0.033396   0.153591  26          0.344
Parabolic interpolation  0.0918101  0.464267  20          0.250
First derivative         0.0918101  0.464192  11          0.296


Function 3: y=Exp[-3(x-2)]+5(x+4x^2+  Sin[20x]),(0,5)\,\!

Method                   f(x)     x         Iterations  Time
Golden search            40.3546  1.17021   23          0.125
Parabolic interpolation  43.8838  0.886807  13          0.63
First derivative         43.8838  0.886807  13          0.125


Function 4: y= 2.5/(Exp[x^2])+Log[(x)^2+0.01],(-5,5)\,\!

Method                   f(x)      x              Iterations  Time
Golden search            -2.10517  -0.000173072   18          0.94
Parabolic interpolation  -2.10517  -2.68221e-007  5           0.31
First derivative         -2.10517  0              2           0.12

Conclusion: The parabolic and derivative methods are faster, but they are more sensitive to the behavior of the function.

Simulated annealing[edit]

Finally I add a chapter about simulated annealing. This algorithm is not specific to the univariate case, but it can also be applied in one dimension. It is one representative example from the rich family of probabilistic meta-heuristic algorithms for global optimization. It is based on an analogy with the way in which a metal cools and freezes into a minimum-energy crystalline structure. It was invented by S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi in 1983. The basic concept is that each point s of the search space is a state of some physical system with energy E(s); a temperature T controls the search. The algorithm has three parts:

  • generating distribution of the new points
  • acceptance distribution of the new points
  • annealing schedule which control the temperature


  1. Choose a new point x_{i+1}\,\!
  2. If the energy E(x_{i+1})<E(x_i)\,\!, the new point is accepted
  3. If the energy E(x_{i+1})>E(x_i)\,\!, the new point is accepted with probability

 P= \frac {\exp(-E(x_{i+1})/T)}{\exp(-E(x_{i+1})/T)+\exp(-E(x_i)/T)}= \frac {1}{1+\exp(\Delta E/T)} \approx \exp(-\Delta E/T)

  4. Terminate the process, or go to 1

The formula for generating a new point is:

  • x_{i+1}=x_i+\mathbf{Du}
  • \mathbf{u} is a vector of random numbers in the range (-1,1) - uniform distribution
  • \mathbf{D} is a diagonal matrix which defines the maximum change allowed in each variable

After a successful trial, i.e. after an accepted change in the solution, D is updated:

\mathbf{D_{i +1}} = (1-\alpha) \mathbf{D}_i + \alpha \omega \mathbf{R}

  • damping constant \alpha
  • weighting constant \omega
  • \mathbf{R} is a diagonal matrix the elements of which consist of the magnitudes of the successful changes

Annealing schedule consists of:

  • an initial temperature T_0
  • a final temperature T_f or a stopping criterion
  • a length for the Markov chains
  • a rule for decrementing the temperature - mostly linear or exponential in the number of steps
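
The three parts can be combined into a short univariate C++ sketch. All concrete choices below (initial temperature T_0 = 10, cooling factor 0.99, and a scalar step-width rule standing in for the diagonal-matrix D update) are illustrative guesses, not values from the cited literature:

```cpp
#include <cmath>
#include <random>
#include <cassert>

// Univariate simulated annealing: uniform generating distribution,
// Metropolis acceptance with probability exp(-dE/T), and an
// exponential cooling schedule as the annealing schedule.
double anneal(double (*E)(double), double x0, double lo, double hi,
              unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> step(-1.0, 1.0);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    const double T0 = 10.0;                        // initial temperature
    double x = x0, e = E(x0);                      // current state and energy
    double best = x, ebest = e;                    // best state seen so far
    for (double T = T0; T > 1e-6; T *= 0.99) {     // cool until T_f is reached
        // maximum change allowed shrinks with the temperature
        double d = 0.5 * (hi - lo) * (T / T0) + 1e-4;
        double xn = x + d * step(gen);             // generate a candidate
        if (xn < lo) xn = lo;                      // clamp to the search box
        if (xn > hi) xn = hi;
        double en = E(xn);
        // accept downhill always, uphill with probability exp(-dE/T)
        if (en < e || unit(gen) < std::exp(-(en - e) / T)) { x = xn; e = en; }
        if (e < ebest) { best = x; ebest = e; }
    }
    return best;
}
```

Because of the uphill acceptances at high temperature, the walk can escape local minima that the bracketing methods above would get stuck in.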

A nice Java applet simulating this process is on this page; you can run the whole process step by step and change a few initial values. Here are, as an example, two pictures with different initial temperatures.

simulated annealing
simulated annealing

Optimization software[edit]

There are three basic groups of software:

  • Robust mathematical tools (the standard version contains only a few algorithms; for more you have to buy extra libraries):
Matlab, Mathematica, Maple
  • Software for specific topics:
Optima, Dash, SOL etc.
  • Libraries of code

For a more detailed decision you can use the optimization software decision trees: NEOS Guide, Plato's Guide


References[edit]

  • N. I. M. Gould, S. Leyffer, An Introduction to Algorithms for Nonlinear Optimization, Frontiers in Numerical Analysis, Durham 2002
  • S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004
  • P. Lachout, Matematické programování (Mathematical Programming)
  • NEOS Optimization Guide, http://www-new.mcs.anl.gov/otc/Guide/
  • W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, Second Edition, Cambridge University Press, 1992
  • F. Busetti, Simulated Annealing Overview, http://www.geocities.com/francorbusetti/saweb.pdf
  • www.wikipedia.com, various pages