An Introduction to Evolutionary Multiobjective Optimization Algorithms Karthik Sindhya, PhD Postdoctoral Researcher Industrial Optimization Group Department of Mathematical Information Technology [email protected] http://users.jyu.fi/~kasindhy/ Overview Nature Inspired Algorithms Constraint handling Applications Nature Inspired Algorithms

Nature provide some of the efficient ways to solve problems Algorithms imitating processes in nature/inspired from nature Nature Inspired Algorithms. What type of problems? Aircraft wing design Nature Inspired Algorithms Wind turbine design BBC Performance improvement by 40%. They reduce turbulence across the surface, increasing angle of attack and decreasing drag. (Source: Popular Mechanics)

Bionic car Hexagonal plates - resulting in door paneling one-third lighter than conventional paneling, but just as strong. (Source: Popular Mechanics) Nature Inspired Algorithms Bullet train NATGEO Train's nose is designed after the beak of a kingfisher, which dives smoothly into water. (Source: Popular Mechanics)

Nature Inspired Algorithms for Optimization Optimization An act, process, or methodology of making something (as a design, system, or decision) as fully perfect, functional, or effective as possible. ( http://www.merriam-webster.com/dictionary) Nature as an optimizer Birds: Minimize drag. Humpback whale: Maximize maneuverability (enhanced lift devices to control flow over the flipper and maintain lift at high angles of attack). Boxfish: Minimize drag and maximize rigidity of exoskeleton. Kingfisher: Minimize micro-pressure waves. Consider an optimization problem of the form

Practical Optimization Problems Charecteristics! Objective and constraint functions can be nondifferentiable. Constraints nonlinear. Discrete/Discontinuous search space. Mixed variables (Integer, Real, Boolean etc.) Large number of constraints and variables. Objective functions can be multimodal. Multimodal functions have more than one optima, but can either have a single or more than one global optima. Computationally expensive objective functions and constraints. Practical Optimization Problems Charecteristics!

Decision vector Objective vector Simulation model Optimization algorithm Traditional Optimization Techniques Problems! Different methods for different types of problems. Constraint handling e.g. using panalty method is sensitive to penalty parameters. Often get stuck in local optima (lack global perspective).

Usually need knowledge of first/second order derivatives of objective functions and constraints. Nature Inspired Algorithms for Optimization Nature inspired algorithms Computational intelligence Fuzzy logic systems Neural networks

Nature Inspired Algorithms for Optimization Nature inspired algorithms Evolutionary algorithms Swarm optimization Genetic algorithm Particle swarm optimization

Differential evolution Ant colony optimization .... and many more. Evolution Humans Macintosh Nokia

Evolutionary Algorithms Offsprings created by reproduction, mutation, etc. Charles Darwin Natural selection procedure - A guided

search Individuals suited to the environment survive, reproduce and pass their genetic traits to offspring Populations adapt to their environment. Variations accumulate over time to generate new species Evolutionary Algorithms Terminologies 1. Individual - carrier of the genetic information (chromosome). It is characterized by its state in the search space, its fitness (objective function value).

2. Population - pool of individuals which allows the application of genetic operators. 3. Fitness function - The term fitness function is often used as a synonym for objective function. 4. Generation - (natural) time unit of the EA, an iteration step of an evolutionary algorithm. Evolutionary Algorithms Population Individual Crossover

Parents Mutation Offspring Evolutionary Algorithms Step 1 Step 2 t:= 0 Initialize P(t) Step 3 Step 4

Evaluate P(t) While not terminate do P(t) := variation [P(t)]; evaluate [P(t)]; P(t+1) := select [P(t) U P(t)]; t := t + 1; od Evolutionary algorithms = Selection + Crossover + Mutation Reproduced from Evolutionary Computation: Comments on the History and Current State Back et. al Evolutionary Algorithms Mean approaches

optimum Variance reduces Evolutionary Algorithms Robust scheme Random scheme Spe ciali sche zed me Efficiency

Robustness = Breadth + Efficiency Problem type (Goldberg, 1989) Evolutionary Algorithms Selection - Roulette wheel, Tournement, steady state, etc. Motivation is to preserve the best (make multiple copies) and eliminate the worst Crossover simulated binary crossover, Linear crossover, blend crossover, etc. Create new solutions by considering more than one individual

Global search for new and hopefully better solutions Mutation Polynomial mutation, random mutation, etc. Keep diversity in the population 010110 010100 (bit wise mutation) Evolutionary Algorithms Tournment selection 23 30 24 24

37 24 24 11 11 9 30 9

37 9 9 11 23 11 Tournment 2

Tournment 1 37 30 Deleted from population Evolutionary Algorithms Roulette wheel selection (proportional selection) Weaker solutions can survive. Evolutionary Algorithms Concept of exploration vs exploitation.

Exploration Search for promising solutions Crossover and mutation operators Exploitation preferring the good solutions Selection operator Excessive exploration Random search. Excessive exploitation Premature convergence. Evolutionary Algorithms Exploration Exploitation

Good evolutionary algorithm Evolutionary Algorithms Classical gradient based algorithms Convergence to an optimal solution usually depends on the

starting solution. Most algorithms tend to get stuck to a locally optimal solution. An algorithm efficient in solving one class of optimization problem may not be efficient in solving others. Algorithms cannot be easily parallelized. Evolutionary algorithms

Convergence to an optimal solution is designed to be independent of initial population. A search based algorithm. Population helps not to get stuck to locally optimal solution. Can be applied to wide class of problems without major change in the algorithm. Can be easily parallelized.

Fitness Landscapes Using traditional gradient based methods f(x) Ideal and best case Multimodal f(x) x f(x) x

f(x) Nightmare Teaser x x Fitness Landscapes Using population based algorithms f(x) Ideal and best case Multimodal

f(x) x f(x) x f(x) Nightmare Teaser x x

History of Evolutionary Algorithms GA: John Holland in 1962 (UMich) Evolutionary Strategy: Rechenberg and Schwefel in 1965 (Berlin)

Evolutionary Programming: Larry Fogel in 1965 (California) First ICGA: 1985 in Carnegie Mellon University First GA book: Goldberg (1989) First FOGA workshop: 1990 in Indiana (Theory) First Fusion: 1990s (Evolutionary Algorithms) Journals: ECJ (MIT Press), IEEE TEC, Natural Computation (Elsevier) GECCO and CEC since 1999, PPSN since 1990 About 20 major conferences each year Working cycle of a genetic algorithm Population based probabilistic search and optimization technique based on natural genetics and Darwins principle of natural selection Proposed by Prof. John Holland, University of

Michigan, Ann Arbor, USA "I have more ideas than I can ever follow up on in a lifetime, so I never worry if someone steals an idea from me. -- John Holland, 1929-2015 https://www.nytimes.com/2015/08/20/science/john-henryholland-computerized-evolution-dies-at-86.html Working cycle of a genetic algorithm Start GA starts with a population of initial

solutions generated at random. Fitness/goodness value of each solution in the population is calculated Initialize a population of solutions Gen = 0 Gen Max_gen Y N End

Assign fitness to all solutions in the population Reproduction Minimization can be converted to maximization Crossover Mutation Gen = Gen + 1 Working cycle of a genetic algorithm Selection scheme (reproduction)

Select good solutions using their fitness values Leads to mating pool consisting of good solutions probabilistically Mating pool may contain multiple copies of a particularly good solution Size of mating pool is kept equal to that of the population of solutions considered before reproduction Average fitness of the mating pool is expected to be higher than that of pre-reproduction population of solutions Schemes Ruolette wheel Tournament selection Ranking selection Working cycle of a genetic algorithm Crossover Mating pairs are selected at random from the mating pool

New solutions by crossover with a crossover probability Exchange of properties between the parents and new solutions are created Parents are good, children are expected to be good Various types of crossover: Single-point Two-point Multi-point Uniform

SBX Working cycle of a genetic algorithm Mutation Sudden change of parameter In GA, local change around the current solution If a solution gets stuck at the local minimum, helps to come out of this situation http://www.dewebsite.org/whatis/whatis_pics/mutation2.jpg Termination Maximum number of generations Desired accuracy in the solution Binary-Coded GA

Let us consider the following problem: Subject to Binary-Coded GA Step 1 Generation of a population of solutions: Initial population of solutions of size N is selected at random Solutions represented in the form of binary strings 1s and 0s 4 bit string: number of distinct substrings possible is

http://www.conservapedia.com/images/thumb/1/1c/763.jpg/ 400px-763.jpg 5 bit string: number of distinct substrings possible is Generalizing: is the obtainable accuracy ()) Binary-Coded GA The length of binary string is decided based on a desired precision in the value of the variables Let us assume 10 bits for each variable and GA string is 20 bit string. Initial population of GA strings at random is Binary-Coded GA

The parents or mating pairs are selected at random from the mating pool Check if they can participate in crossover operation given by crossover probability Choose a random crossover site between 1 and L1, where L is the length of the strign Single-point crossover: 1 2 1 2 Binary-Coded GA 2 point crossover Binary-Coded GA

Uniform crossover Binary-Coded GA More crossover points more disruption Large search space, uniform crossover is found to perform better than both the single-point and two-point crossovers Step 5: Mutation Binary-Coded GA Mutation probability () kept low value Range of , where L is the string length Genetic operators has to perform two potential roles, such as disruption and

construction Mutation disruption Crossover - construction Binary-Coded GA Tournament selection Tournament size n (say 2 or 3), small number compared to population size, N. Pick n strings from the population, at random and determine the best one in terms of fitness value Best string goes to mating pool and the n strings back to population N tournaments are to be played to make the size of mating pool equal to N. Interesting read A comparative analysis of selection schemes used in genetic algorithms

Constraint Handling Penalty parameter-less approach A feasible solution is preferred to infeasible solution When both solutions feasible, choose the solution with better function value When both solutions are infeasible, choose the solution with lower constraint violation Constraint Handling Box constraints If variable is lower/higher than lower/upper bound, set to lower/upper bound

A random value inside the bounds Limitations of Evolutionary Algorithms No guarantee of finding an optimal solution in finite time Asymptotic convergence Containing a number of parameters Sometimes the result is highly dependent on the parameters set Self-adaptive parameters are commonly used Computationally very expensive Metamodels of functions are commonly used

Applications Application 1 Tracking suspect Caldwell and Johnston, 1991 Objective function: fitness rating on a nine point scale Applications Optimization (Min/Max) of functions Airfoil optimization Evolving optimal structure

Games Evolutionary Multi-objective Optimization A Big Picture Karthik Sindhya, PhD Postdoctoral Researcher Industrial Optimization Group Department of Mathematical Information Technology [email protected] http://users.jyu.fi/~kasindhy/ Objectives The objectives of this lecture are to: 1. Discuss the transition: Single objective optimization to Multi-objective optimization

2. Review the basic terminologies and concepts in use in multi-objective optimization 3. Introduce evolutionary multi-objective optimization 4. Goals in evolutionary multi-objective optimization 5. Main Issues in evolutionary multi-objective optimization Reference Books: K. Deb. Multi-Objective Optimization using Evolutionary Algorithms. Wiley, Chichester, 2001. K. Miettinen. Nonlinear Multiobjective Optimization. Kluwer, Boston, 1999. Transition

Minimize: Minimize: Cost Cost Single objective: Maximize Performance Maximize: Performance Basic terminologies and concepts Multi-objective problem is usually of the form: Minimize/Maximize f(x) = (f1(x), f2(x),, fk(x)) subject to Multiple objectives, constraints and decision

variables gj(x) 0 hk(x) = 0 xL x xU Decision space Objective space Basic terminologies and concepts solution a dominates solution b, if a is no worse than b in

all objectives a is strictly better than b in at least one objective. 1 2 5 f2 (minimize) Concept of nondominated solutions: 3

4 2 3 2 4 5 6 f1 (minimize)

3 dominates 2 and 4 1 does not dominate 3 and 4 1 dominates 2 Basic terminologies and concepts Properties of dominance relationship Reflexive: The dominance relation is not reflexive. Since solution a does not dominate itself. Symmetric: The dominance relation is not symmetric. Solution a dominates b does not mean b dominated a.

Dominance relation is asymmetric. Dominance relation is not antisymmetric. Transitive: The dominance relation is transitive. If a dominates b and b dominates c, then a dominates c. If a does not dominate b, it does not mean b dominates a. Basic terminologies and concepts Finding Pareto-optimal/non-dominated solutions Among a set of solutions P, the non-dominated set of solutions P are those that are not dominated by any member of the set P. If the set of solutions considered is the entire feasible objective

space, P is Pareto optimal. Different approaches available. They differ in computational complexities. Naive and slow Worst time complexity is 0(MN2). Kung et al. approach O(NlogN) Basic terminologies and concepts Kung et al. approach Ascending order for minimization objective

P = {5,1,3,2,4} 1 2 5 f2 (minimize) Step 1: Sort objective 1 based on the descending order of importance. 3

4 2 3 5 2 4 5 f1 (minimize)

6 Basic terminologies and concepts P = {5,1,3,2,4} Front = {5} T = {5,1,3} {5,1} {5} B = {2,4}

{3} Front = {5} {1} Front(P) = {5} {2} Front = {2,4} {4} Basic terminologies and concepts Non-dominated sorting of population

Step 1: Set all non-dominated fronts Pj , j = 1,2, as empty sets and set non-domination level counter j = 1 Step 2: Use any one of the approaches to find the non-dominated set P of population P. Step 3: Update Pj = P and P = P\P. Step 4: If P , increment j = j + 1 and go to Step 2. Otherwise, stop and declare all non-dominated fronts Pi, i = 1,2,,j. Basic terminologies and concepts f2 (minimize) 1

Front 3 f2 (minimize) 4 5 3 f1 (minimize) Front 2 Front 1

2 f1 (minimize) Basic terminologies and concepts Pareto optimal fronts (objective space) For a K objective problem, usually Pareto front is K-1 dimensional Min-Max Max-Max Min-Min Max-Min

Basic terminologies and concepts Local and Global Pareto optimal front Local Pareto optimal front: Local dominance check. Objective space Decision space Locally Pareto optimal front Global Pareto optimal front is also local Pareto optimal front. Basic terminologies and concepts Ideal point:

Non-existent lower bound of the Pareto front. Upper bound of the Pareto front. f2 Nadir point: Objective space Znadir Min-Min Normalization of objective vectors: fnormi = (fi - ziutopia )/(zinadir - ziutopia )

Max point: A vector formed by the maximum objective function values of the entire/part of objective space. Usually used in evolutionary multi-objective optimization algorithms, as nadir point is difficult to estimate. Used as an estimate of nadir point and updated as and when new estimates are obtained. Zmaximum Zideal Zutopia

f1 Basic terminologies and concepts What are evolutionary multi-objective optimization algorithms? Evolutionary algorithms used to solve multi-objective optimization problems. EMO algorithms use a population of solutions to obtain a diverse set of solutions close to the Pareto optimal front.

Objective space Basic terminologies and concepts EMO is a population based approach Population evolves to finally converge on to the Pareto front. Multiple optimal solutions in a single run. In classical MCDM approaches Usually multiple runs necessary to obtain a set of Pareto optimal solutions. Usually problem knowledge is necessary. Goal in evolutionary multi-objective optimization

Goals in evolutionary multi-objective optimization algorithms To find a set of solutions as close as possible to the Pareto optimal front. To find a set of solutions as diverse as possible. To find a set of satisficing solutions reflecting the decision makers preferences. Satisficing: a decision-making strategy that attempts to meet criteria for adequacy, rather than to identify an optimal solution. Goal in evolutionary multi-objective optimization Objective space

Convergence Diversity Goal in evolutionary multi-objective optimization Objective space Convergence Goal in evolutionary multi-objective optimization Changes to single objective evolutionary algorithms

Fitness computation must be changed Non-dominated solutions are preferred to maintain the drive towards the Pareto optimal front (attain convergence) Emphasis to be given to less crowded or isolated solutions to maintain diversity in the population Goal in evolutionary multi-objective optimization What are less-crowded solutions ? Crowding can occur in decision space and/or objective phase. Decision space diversity sometimes are needed As in engineering design problems, all solutions would look the same.

Objective space Min-Min Decision space Main Issues in evolutionary multi-objective optimization How to maintain diversity and obtain a diverse set of Pareto optimal solutions? How to maintain non-dominated solutions? How to maintain the push towards the Pareto front ? (Achieve convergence) EMO History

1984 VEGA by Schaffer 1989 Goldberg suggestion 1993-95 - Non-Elitist methods MOGA, NSGA, NPGA 1998 Present Elitist methods NSGA-II, DPGA, SPEA, PAES etc. Evolutionary multi-objective algorithm design issues Karthik Sindhya, PhD Postdoctoral Researcher Industrial Optimization Group Department of Mathematical Information Technology [email protected]

http://users.jyu.fi/~kasindhy/ Objectives The objectives of this lecture are to: Address the design issues of evolutionary multi-objective optimization algorithms Fitness assignment Diversity preservation Elitism Explore ways to handle Constraints References K. Deb. Multi-Objective Optimization using Evolutionary Algorithms. Wiley, Chichester,

2001. E. Zitzler, M. Laumanns, S. Bleuler. A Tutorial on Evolutionary Multiobjective Optimization, in Metaheuristics for Multiobjective Optimisation, 3-38, Springer-Verlag, 2003. Algorithm design issues The approximation of the Pareto front is itself multi-objective. Convergence: Compute solutions as close as possible to Pareto front quickly. Diversity: Maximize the diversity of the Pareto solutions. It is impossible to describe

What a good approximation can be for a Pareto optimal front. Proximity to the Pareto optimal front. Fitness assignment Unlike single objective, multiple objectives exists. Fitness assignment and selection go hand in hand. Fitness assignment can be classified in to following categories: Scalarization based E.g. Weighted sum, MOEA/D Objective based VEGA

Dominance based NSGA-II Fitness assignment Scalarization based (Aggregation based): Aggregate the objective functions to form a single objective. Vary the parameters in the single objective function to generate multiple Pareto optimal solutions. Parameters weights f1(x), f2(x),, fk(x)

w1f1(x) + w2f2(x),, wkfk Or, max(wi(fi - zi )) F Fitness assignment Advantages Weighted sum Easy to understand and implement. Fitness assignment is computationally efficient. If time available is short can be used to quickly provide a Pareto optimal solution. Disadvantages - Weighted sum

Non-convex Pareto optimal fronts cannot be handled. Fitness assignment Objective based Switch between objectives in the selection phase. Every time an individual is chosen for reproduction, a different objective decides. E.g. Vector evaluated genetic algorithm (VEGA) proposed by David Schaffer. First implementation of an evolutionary multi-objective optimization algorithm. Subpopulations are created and each subpopulation is evaluated with a different objective.

Mating pool Population New population f1 Selection f2 f3 Reproduction Fitness assignment Advantages

Simple idea and easy to implement. Simple single objective genetic algorithm can be easily extended to handle multi-objective optimization problems. Has tendency to produce solutions near the individual best for every objective. An advantage when this property is desirable. Disadvantages Each solution is evaluated only with respect to one objective. In multi-objective optimization algorithm all solutions are important. Individuals may be stuck at local optima of individual objectives. Fitness assignment

Dominance based Pareto dominance based fitness ranking proposed by Goldberg in 1989. Different ways Dominance rank: Number of individuals by which an individual is dominated. E.g. MOGA, SPEA2 Dominance depth: The fitness is based on the front an individual belongs. NSGA-II Dominance count: Number of individuals dominated by an individual.

SPEA2 Fitness assignment 0 4 1 1 1 1 0

2 4 0 2 0 Dominance rank Dominance count 3 2

1 Dominance depth Diversity preservation Chance of an individual being selected Increases: Low number of solutions in its neighborhood. Decreases: High number of solutions in its neighborhood. There are at least three types: Kernel methods Nearest neighbor Histogram

Diversity preservation Kernel methods: Sum of f values, where f is a function of distance. E.g. NSGA f f f Nearest neighbor The perimeter of the cuboid formed by the nearest neighbors as the

vertices. E.g. NSGA-II i-1 i i+1 Diversity preservation Histogram Number of elements in a hyperbox. E.g. PAES Elitism Elitism is needed to preserve the promising

solutions No archive strategy Old population Offspring Old population Offspring New Archive New population New population

Archive Constraint handling Penalty function approach For every solution, calculate the overall constraint violation, OCV (sum of Constraint violations). Fm(xi) = fm(xi) + OCV Solution - (xi), fm(xi)- mth objective value for xi, , Fm(xi) Overall mth objective value for xi . OCV is added to each of the objective function values. Use constraints as additional objectives Usually used when feasible search space is very narrow. Constraint handling

Debs constraint domination strategy A solution xi constraint dominates a solution xj, if any is true: xi is feasible and xj is not. xi and xj are both infeasible, but xi has a smaller constraint violation. xi and xj are feasible and xi dominates xj. Advantages: Penalty less approach. Easy to implement and clearly distinguishes good from bad solutions. Can handle if population has only infeasible solutions. Disadvantages: Problem to maintain diversity of solutions. Slightly infeasible and near optimal solutions are not preferred over feasible solutions far from optima.

Non-dominated Sorting Genetic Algorithm (NSGA-II) Karthik Sindhya, PhD Postdoctoral Researcher Industrial Optimization Group Department of Mathematical Information Technology [email protected] http://users.jyu.fi/~kasindhy/ Objectives The objectives of this lecture is to: Understand the basic concept and working of NSGA-II Advantages and disadvantages

Reference K. Deb, S. Agarwal, A. Pratap, and T. Meyarivan. A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182197, 2002. NSGA-II Non-dominated sorting genetic algorithm II was proposed by Deb et al. in 2000. NSGA-II procedure has three features: It uses an elitist principle It emphasizes non-dominated solutions. It uses an explicit diversity preserving mechanism

NSGA-II NSGA-II Crossover & Mutation 2 1 NSGA-II Crowding distance To get an estimate of the density of solutions surrounding a particular solution.

Crowding distance assignment procedure Step 1: Set l = |F|, F is a set of solutions in a front. Set di = 0, i = 1,2,,l. Step 2: For every objective function m = 1,2,,M, sort the set in worse order of fm or find sorted indices vector: Im = sort(fm). NSGA-II Step 3: For m = 1,2,,M, assign a large distance to boundary solutions, i.e. set them to and for all other solutions j = 2 to (l-1), assign as follows: i-1 i

i+1 NSGA-II Crowded tournament selection operator A solution xi wins a tournament with another solution xj if any of the following conditions are true: If solution xi has a better rank, that is, ri < rj . If they have the same rank but solution xi has a better crowding distance than solution xj, that is, ri = rj and di > dj . Objective space NSGA-II Advantages: Explicit diversity preservation mechanism Overall complexity of NSGA-II is at most O(MN2)

Elitism does not allow an already found Pareto optimal solution to be deleted. Disadvantage: Crowded comparison can restrict the convergence. Non-dominated sorting on 2N size.