# Univariate optimization

## Inhaltsverzeichnis

# Introduction

## Definition

In mathematics, the term optimization, or mathematical programming, refers to the study of problems in which one seeks to minimize or maximize a real function by systematically choosing the values of real or integer variables from within an allowed set. Optimization problem consist of basic pieces:

- Objective function – mean a object which we want to minimize or maximize, e.g. we want to maximize profit or minimize necessary time for production
- A set of variables which affect the value of the objective function. In the manufacturing problem, the variables might include the amounts of different resources used or the time spent on each activity
- A set of constraints, which define the solution space of the problem. For example, we could not product a negative number of products. Constraints can help us to restrict the solution space and thus make the problem easier, but non-trivial constraints can make problem much harder. In border areas the function can have extreme, but does not have to have derivation equal to zero.

Mathematical description is

minimize

subject to

- : optimization variable
- : objective function
- : constraint functions

**optimal solution** has smallest value of among all vectors that satisfy the constraints

## Classification

Classification of optimization problems according search space

- Discrete , e.g. traveling salesmen problem
- Continuous
- Linear programming: is linear and is a polyhedron specified by linear equalities and inequalities
- Quadratic programming: is convex quadratic and is a polyhedron specified by linear equalities and inequalities
- Convex programming: is convex function and is a convex set
- Nonlinear programming: is nonlinear and is specified by nonlinear equalities and inequalities

There exist also another type of optimization problems, mainly from graph theory area, but they are out of topic of this thesis. The second important possibility how to classify optimization problem is according type of solution. We can be interested only in finding some extreme (local optimization), or we want to be sure that we found minimum or maximum (global optimization).

## Example

### Linear programming

As an example of optimization problem we could use plane cargo problem. The simples possibility is solve problem as a linear programming. In this case our cargo plane has three compartments for storing cargo:

Compartment | Weight capacity (tonnes) | Space capacity (cubic meters) |

Front | 10 | 6800 |
---|---|---|

Centre | 16 | 8700 |

Rear | 8 | 5300 |

The following four cargoes are available for shipment on the next flight:

Cargo | Weight (tonnes) | Volume (cubic metres/tonne) | Profit (£/tonne) |

C1 | 18 | 480 | 310 |
---|---|---|---|

C2 | 15 | 650 | 380 |

C3 | 23 | 580 | 350 |

C4 | 12 | 390 | 285 |

Assumptions

- each cargo can be split into whatever proportions/fractions we desire
- each cargo can be split between two or more compartments if we so desire
- the cargo can be packed into each compartment (for example if the cargo was spherical it would not be possible to pack a compartment to volume capacity, some free space is inevitable in sphere packing)
- all the data/numbers given are accurate

The objective is to determine how much (if any) of each cargo C1, C2, C3 and C4 should be accepted and how to distribute each among the compartments so that the total profit for the flight is maximized. The notation of this problem is following. We denote variables as:

- be the number of tonnes of cargo

(i=1,2,3,4 for C1, C2, C3 and C4 respectively)

- that is put into compartment j

(j=1 for Front, j=2 for Centre and j=3 for Rear)

- where xij >=0: i=1,2,3,4; j=1,2,3

Constraints:

1. We cannot pack more of each of the four cargoes than we have available

2. The weight capacity of each compartment must be respected

3. The volume (space) capacity of each compartment must be respected

Objective function

- The objective is to maximize total profit:

### Quadratic programming

If we add a amortization of plane, our objective function change to:

### Convex programming

A plane has to keep balance - we need add new convex constraints:

### Nonlinear programming

Adding a fuel cost change our problem to most general case:

## Aplications

Where do optimization problems arise?

- Economics:
*Consumer theory, supplier theory* - Finance:
*Optimal hedging, pricing* - Statistics:
*Data fitting, regression, pattern recognition* - Science, Engineering:
*Aerospace, product design, data mining* - Other Business decisions:
*Scheduling, production, organizational decisions* - Government:
*Military applications, fund allocation, etc.* - Other Personal decisions:
*Sports, on-field decisions, player acquisition, marketing*

# Univariate algorithms

We demand on our algorithm three basic requests. It should be fast, accurate and need small memory. Unfortunately these requirements are in opposite go often counter. Thus we have to make some compromise. Computation of and its derivations is the most time-consuming part. Therefore we try to minimize the number of them. According it, we can divide algorithms into these groups:

- Without using derivate
- With using first derivate
- With using a matrix of second derivates

In following chapter are presented three basics algorithms, which are unique for univariate case. They are intended for constrained problems.

## Golden section search method

Belong to the algorithms, which do not use derivates. It is analogy of finding a root of function by bisection method.

The root is supposed to have been bracketed in interval . One then evaluates the function in the intermediate point and obtains a new, smaller bracketing interval, either or . After a finite number of steps we get acceptably small interval which surely contain a root. The same process we need to do with minimum, but for bracketing minimum we need three points which satisfied this condition:

Now we choose a new point, suppose that is in interval . The second possibility - belong to - is similar. Now can arrive two cases, in each we are able to choose new smaller bracketing interval, which surely contain the minimum

The individual steps are illustrated on pictures. They was generated by java script from this page, you can run there whole process also with another functions.

Again, after finite number of steps we are able to find acceptably small interval with minimum. The remaining question is how to choose the new point x. Suppose that is the fraction of the way between and

Our next point is fraction beyond

Then length of new bracketing interval is

or

We want to make them equal, to keep the cutting speed of interval constant.

with

This formula gives us

The golden section search guarantees that after each new function evaluation will be the bracketing minimum interval just 0.61803 times the size of the preceding interval (if we start with triplet it golden rate, else it fast converge to it). The interesing perception is that golden rate have more applications and it is known already form antiquity, e.g. as a esthetic criterion. You can look on http://en.wikipedia.org/wiki/Golden_ratio

## Parabolic interpolation

In previous method we assume that the function can have strange behaving. If we assume that the function do not change its shape wild and look approximately like parabola, we can use for estimating new point this rule:

The better understanding how algorithm works provide picture. At the beginnig parabola go through points a,b,c. We found its minimum at point d and set a new parabola b,d,c. Now we found new minimum at point e and continuing in this process until we are not acceptably near to the minimum (the position of new point is not vary a lot). It was generated by java script from similar page as golden section method.

## Brent‘s method

More robust is an algorithm, which is able to recognize behavior of function. According it the algorithm switches between golden ratio and parabolic approach. One possibility is Brent's method. It keep in mind six values: a,b,u,v,w and x

- a,b are the borders points of interval with minimum
- x is the point with the very least function value found so far
- w is the point with the second least function value
- v is the previous value of w
- u is the point at witch is the function valuated most recently

The new point according parabolic rule is accepted if

- fall within bounding interval
- imply a movement from the best current value x is less than half the movement of the step before last

This rules ensure that the algorithm converge to some point, not just periodically skipping around.

## First derivation method

This algorithm is trying upgrading the speed of finding the minimum by using information about derivation. The basis is still same, we have bracketing triplet a,b,c and derivates help us to choose new points. The sign of derivation in the middle point b help us decide if the new point is in the interval or . Than we by secant method find new point - as the second derivate is used the value of derivation from the second best so far point from Brent's method.

There exist also other approaches, which trying using more sophisticated methods, e.g. more dimensional polynoms and values of derivations in points. But there is a danger of unexpectable behavior of high dimensional polynoms.

# Numerical simulations

I implement these three methods in C++ language and test them on few function. I hope that functions which I choose can make algorithms some problems. The configuration of my computer is 1,6 GHz, 256MB RAM. Each method was 10 000 repeated and time was measure in milliseconds. I used different constraints for same function, because it shows that different starting values can give different results. The explanation is that algorithm can skip some local minimum, or due to numerical rounding of numbers, the stopping criterion can change. Here are tables and results:

**Function 1:**

Method | f(x) | x | Number of iterations | Time |

Golden search | 0 | -0.000173072 | 18 | 0:78 |

Parabolic interpolation | 0 | 5.02293e-009 | 6 | 0:15 |

First Derivation | 0 | 0 | 2 | 0:00.47 |

**Function 2:**

Method | f(x) | x | Number of iterations | Time |

Golden search | 0.033396 | 0.153591 | 24 | 0.297 |

Parabolic interpolation | 0.033396 | 0.153586 | 19 | 0.234 |

First Derivation | 0.0918101 | 0.464192 | 8 | 1.203 |

Method | f(x) | x | Number of iterations | Time |

Golden search | 0.033396 | 0.153591 | 26 | 0.344 |

Parabolic interpolation | 0.0918101 | 0.464267 | 20 | 0.250 |

First Derivation | 0.0918101 | 0.464192 | 11 | 0.296 |

**Function 3:**

Method | f(x) | x | Number of iterations | Time |

Golden search | 40.3546 | 1.17021 | 23 | 0.125 |

Parabolic interpolation | 43.8838 | 0.886807 | 13 | 0.63 |

First Derivation | 43.8838 | 0.886807 | 13 | 0.125 |

**Function 4:**

Method | f(x) | x | Number of iterations | Time |

Golden search | -2.10517 | -0.000173072 | 18 | 0.94 |

Parabolic interpolation | -2.10517 | -2.68221e-007 | 5 | 0.31 |

First Derivation | -2.10517 | 0 | 2 | 0.12 |

**Conclusion:**
Parabolic and derivation method are faster, but are more sensitive on behavior of function.

# Simulated annealing

Finally I add a chapter about simulated annealing. This algorithm is not unique for univariate case, but it can be also applied in one dimension. It is one representative example from rich family of probabilistic meta-heuristic algorithm for the global optimization problem. It is an analogy between the way in which a metal cools and freezes into a minimum energy crystalline structure. It was invented by S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi in 1983. Basic concept is that each point s of the search space is a state of some physical system. It have its energy E(s) determined by temperature T. It has three parts:

- generating distribution of the new points
- acceptance distribution of the new points
- annealing schedule which control the temperature

**Algorithm:**

- Choose a new point
- If a energy , the new point is accepted
- If a energy , the new point is accepted with probability

- 4. Terminate the process, or go to 1

**Formula for generating new point is:**

- is a vector of random numbers in the range (-1,1) - uniform distribution
- is a diagonal matrix which defines the maximum change allowed in each variable

After a successful trial, i.e. after an accepted change in the solution, D is updated:

- damping constant
- weighting constant
- is a diagonal matrix the elements of which consist of the magnitudes of the successful changes

**Annealing schedule consists of:**

- an initial temperature
- a final temperature or a stopping criterion
- a length for the Markov chains
- a rule for decrementing the temperature - mostly linear or exponential on number of steps

A nice java applet for simulation of this process is on this page, you can run the whole process step by step and change a few initial values. Here are as example two pictures with different initial temperature.

# Optimization software

There are three basic groups of software:

- Robust mathematical tools (standart version contain only few algorithms, for more you have to buy extra libraries):

- Matlab, Mathematica, Maple

- Software for specifics topics

- Optima, Dash, SOL etc.

- Library of codes

- NAG, IMSL, MINPACK

For more detail decision you can use optimization software decision trees: NEOS Guide Platos Guide

# Bibliography

- N.I.M. Gould, S. Leyer, An introduction to algorithms for nonlinear optimization, Frontiers in Numerical Analysis (Durham 2002)
- S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press 2004
- P. Lachout, Matematické programování
- NEOS Optimization guide http://www-new.mcs.anl.gov/otc/Guide/
- B. P. Flannery, W.H. Press, S. A. Teukolsky, W.T. Vetterling, Numerical Recipes in C,The Art of Scientific Computinion, Second Edition, Cambridge University Press 1988, 1992
- F. Busetti, Simulated annealing overview, http://www.geocities.com/francorbusetti/saweb.pdf
- www.wikipedia.com, various pages