Why Is Optimization Difficult?
Thomas Weise, Michael Zapf, Raymond Chiong, and Antonio J. Nebro
Abstract This chapter aims to address some of the fundamental issues that
are often encountered in optimization problems, making them difficult to
solve. These issues include premature convergence, ruggedness, causality, de-
ceptiveness, neutrality, epistasis, robustness, overfitting, oversimplification,
multi-objectivity, dynamic fitness, the No Free Lunch Theorem, etc. We ex-
plain why these issues make optimization problems hard to solve and present
some possible countermeasures for dealing with them. By doing this, we hope
to help both practitioners and fellow researchers to create more efficient op-
timization applications and novel algorithms.
Preview
This document is a preview version and not necessarily identical with the original.
http://www.it-weise.de/
Original available at http://www.springer.com/engineering/book/978-3-642-00266-3
See citation suggestion at the end of this document
Thomas Weise · Michael Zapf
Distributed Systems Group, University of Kassel, Wilhelmshöher Allee 73, 34121 Kassel,
Germany, e-mail: weise@vs.uni-kassel.de and zapf@vs.uni-kassel.de
Raymond Chiong
School of Computing & Design, Swinburne University of Technology (Sarawak Campus),
93350 Kuching, Sarawak, Malaysia, e-mail: rchiong@swinburne.edu.my
Antonio J. Nebro
Dept. Lenguajes y Ciencias de la Computación, ETSI Informática, University of Málaga,
Campus de Teatinos, 29071 M´alaga, Spain, e-mail: antonio@lcc.uma.es
1 Introduction
Optimization, in general, is concerned with finding the best solutions for a
given problem. Its applicability in many different disciplines makes it hard
to give an exact definition. Mathematicians, for instance, are interested in
finding the maxima and minima of a real function from within an allowable
set of variables. In computing and engineering, the goal is to maximize the
performance of a system or application with minimal runtime and resources.
In the business industry, people aim to optimize the efficiency of a production
process or the quality and desirability of their current products.
All these examples show that optimization is indeed part of our everyday
life. We often try to maximize our gain by minimizing the cost we need to
bear. However, are we really able to achieve an "optimal" condition? Frankly,
whatever problems we are dealing with, it is rare that the optimization pro-
cess will produce a solution that is truly optimal. It may be optimal for one
audience or for a particular application, but definitely not in all cases.
As such, various techniques have emerged for tackling different kinds of
optimization problems. In the broadest sense, these techniques can be classi-
fied into exact and stochastic algorithms. Exact algorithms, such as branch
and bound, A⋆ search, or dynamic programming can be highly effective for
small-size problems. When the problems are large and complex, especially
if they are either NP-complete or NP-hard, i.e., have no known polynomial-
time solutions, the use of stochastic algorithms becomes mandatory. These
stochastic algorithms do not guarantee an optimal solution, but they are able
to find quasi-optimal solutions within a reasonable amount of time.
In recent years, metaheuristics, a family of stochastic techniques, have become an active research area. They can be defined as higher-level frameworks
aimed at efficiently and effectively exploring a search space [25]. The initial
work in this area was started about half a century ago (see [175, 78, 24], and
[37]). Subsequently, a lot of diverse methods have been proposed, and to-
day, this family comprises many well-known techniques such as Evolutionary
Algorithms, Tabu Search, Simulated Annealing, Ant Colony Optimization,
Particle Swarm Optimization, etc.
There are different ways of classifying and describing metaheuristic al-
gorithms. The most widely accepted classification is the distinction between nature-inspired and non-nature-inspired algorithms, i.e., whether or not the algorithm some-
how emulates a process found in nature. Evolutionary Algorithms, the most
widely used metaheuristics, belong to the nature-inspired class. Other tech-
niques with increasing popularity in this class include Ant Colony Optimiza-
tion, Particle Swarm Optimization, Artificial Immune Systems, and so on.
Scatter search, Tabu Search, and Iterated Local Search are examples of non
nature-inspired metaheuristics. Unified models of metaheuristic optimization
procedures have been proposed by Vaessens et al [220, 221], Rayward-Smith
[169], Osman [158], and Taillard et al [210].
In this chapter, our main objective is to address some fundamental issues
that make optimization problems difficult based on the nature-inspired class
of metaheuristics. Apart from the reasons of being large, complex, and dy-
namic, we present a list of problem features that are often encountered and
explain why some optimization problems are hard to solve. Some of the is-
sues that will be discussed, such as multi-modality and overfitting, concern
global optimization in general. We will also elaborate on other issues which
are often linked to Evolutionary Algorithms, e.g., epistasis and neutrality,
but can occur in virtually all metaheuristic optimization processes.
These concepts are important, as neglecting any one of them during the
design of the search space and operations or the configuration of the opti-
mization algorithms can render the entire invested effort worthless, even if
highly efficient optimization methods are applied. To the best of our knowl-
edge, to date there is not a single document in the literature comprising all
such problematic features. By giving clear definitions and comprehensive in-
troductions on them, we hope to create awareness among fellow scientists as
well as practitioners in the industry so that they could perform optimization
tasks more efficiently.
The rest of this chapter is organized as follows: In the next section, prema-
ture convergence to local minima is introduced as one of the major symptoms
of failed optimization processes. Ruggedness (Section 3), deceptiveness (Sec-
tion 4), too much neutrality (Section 5), and epistasis (Section 6), some of
which have been illustrated in Fig. 1¹, are the main causes which may lead
to this situation. Robustness, correctness, and generality instead are features
which we expect from valid solutions. They are challenged by different types
of noise discussed in Section 7 and the affinity of overfitting or overgeneral-
ization (see Section 8). Some optimization tasks become further complicated
because they involve multiple, conflicting objectives (Section 9) or dynami-
cally changing ones (Section 10). In Section 11, we give a short introduction
into the No Free Lunch Theorem, from which we can follow that no panacea,
no magic bullet can exist against all of these problematic features. We will
conclude our outline of the hardships of optimization with a summary in
Section 12.
1.1 Basic Terminology
In the following text, we will utilize a terminology commonly used in the Evolutionary Algorithms community and sketched in Fig. 2, based on the example
¹ We include in Fig. 1 different examples of fitness landscapes, which relate solution candidates (or genotypes) to their objective values. The small bubbles in Fig. 1 represent solution candidates under investigation. An arrow from one bubble to another means that the second individual is found by applying one search operation to the first one. The objective values here are subject to minimization.
[Fig. 1: Different possible properties of fitness landscapes (minimization). All panels plot objective values f(x) over x. Panels: (a) Best Case; (b) Low Total Variation; (c) Multimodal (multiple local optima); (d) Rugged (no useful gradient information); (e) Deceptive (region with misleading gradient information); (f) Neutral (neutral area); (g) Needle-In-A-Haystack (neutral area or area without much information, with an isolated optimum); (h) Nightmare.]
[Fig. 2: The involved spaces and sets in optimization. Left, the involved spaces: the search space G (bit-string genotypes), the problem space X (phenotypes (i, j) on a 4×4 grid), and the objective space Rⁿ, connected by the genotype-phenotype mapping and the objective function(s) f(x ∈ X) ∈ R. Right, the involved sets/elements: the population of genotypes, the population of phenotypes, and their objective values.]
of a simple Genetic Algorithm. The possible solutions x of an optimization
problem are elements of the problem space X . Their utility as solutions is
evaluated by a set F of objective functions f which, without loss of generality, are assumed to be subject to minimization. The search operations utilized by the optimizers to explore this space do not work directly on the solutions. Instead, they are applied to the elements (the genotypes) of the search space G (the genome), which are mapped to the solution candidates by a genotype-phenotype mapping gpm : G → X. The term individual is used for both solution candidates and genotypes.
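To make these terms concrete, the following minimal sketch (ours, not part of the original text) mirrors the toy setting of Fig. 2: four-bit genotypes in G are mapped to phenotypes (i, j) on a 4×4 grid X and evaluated by a single objective function. The names gpm and objective follow the terminology above; the concrete objective is a hypothetical example.

```python
import random

# Search space G: bit strings of length 4 (genotypes).
# Problem space X: coordinates (i, j) with i, j in {0, 1, 2, 3} (phenotypes).

def gpm(genotype):
    """Genotype-phenotype mapping: the first two bits encode i, the last two j."""
    i = 2 * genotype[0] + genotype[1]
    j = 2 * genotype[2] + genotype[3]
    return (i, j)

def objective(phenotype):
    """A single objective function f: X -> R, subject to minimization.
    Here (hypothetically) the Manhattan distance to the corner (3, 3)."""
    i, j = phenotype
    return abs(3 - i) + abs(3 - j)

# A small population: each individual is a genotype plus the phenotype it maps to.
population = [[random.randint(0, 1) for _ in range(4)] for _ in range(6)]
for g in population:
    x = gpm(g)
    print(g, "->", x, "f(x) =", objective(x))
```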
1.2 The Term "Difficult"
Before we go more into detail about what makes these landscapes difficult,
we should establish the term in the context of optimization. The degree of
difficulty of solving a certain problem with a dedicated algorithm is closely
related to its computational complexity, i.e., the amount of resources such as
time and memory required to do so. The computational complexity depends
on the number of input elements needed for applying the algorithm. This
dependency is often expressed in the form of approximate boundaries with
the Big-O family of notations introduced by Bachmann [10] and made popular
by Landau [122]. Problems can be further divided into complexity classes. One
of the most difficult complexity classes, owing to its resource requirements, is
NP, the set of all decision problems which are solvable in polynomial time by
non-deterministic Turing machines [79]. Although many attempts have been
made, no algorithm has been found which is able to solve an NP-complete [79]
problem in polynomial time on a deterministic computer. One approach to
obtaining near-optimal solutions for problems in NP in reasonable time is to
apply metaheuristic, randomized optimization procedures.
As already stated, optimization algorithms are guided by objective func-
tions. A function is difficult from a mathematical perspective in this context
if it is not continuous, not differentiable, or if it has multiple maxima and
minima. This understanding of difficulty comes very close to the intuitive
sketches in Fig. 1.
In many real world applications of metaheuristic optimization, the charac-
teristics of the objective functions are not known in advance. The problems
are usually NP or have unknown complexity. It is therefore only rarely possi-
ble to derive boundaries for the performance or the runtime of optimizers in
advance, let alone exact estimates with mathematical precision.
Most often, experience, rules of thumb, and empirical results based on
the models obtained from related research areas such as biology are the only
guides available. In this chapter we discuss many such models and rules,
providing a better understanding of when the application of a metaheuristic
is feasible and when not, as well as with indicators on how to avoid defining
problems in a way that makes them difficult.
2 Premature Convergence
2.1 Introduction
An optimization algorithm has converged if it cannot reach new solution
candidates anymore or if it keeps on producing solution candidates from a
"small"2 subset of the problem space. Global optimization algorithms will
usually converge at some point in time. One of the problems in global opti-
mization is that it is often not possible to determine whether the best solution
currently known is situated on a local or a global optimum and thus, if con-
vergence is acceptable. In other words, it is usually not clear whether the
optimization process can be stopped, whether it should concentrate on re-
fining the current optimum, or whether it should examine other parts of the
search space instead. This can, of course, only become cumbersome if there
are multiple (local) optima, i.e., the problem is multimodal as depicted in
Fig. 1.c.
A mathematical function is multimodal if it has multiple maxima or min-
ima [195, 246]. A set of objective functions (or a vector function) f is multi-
modal if it has multiple (local or global) optima – depending on the definition
of "optimum" in the context of the corresponding optimization problem.
2.2 The Problem
An optimization process has prematurely converged to a local optimum if it
is no longer able to explore other parts of the search space than the area
currently being examined and there exists another region that contains a
superior solution [192, 219]. Fig. 3 illustrates examples of premature convergence.
[Fig. 3: Premature convergence in the objective space. Panel (a): Example 1, maximization — the process converges to a local optimum instead of the global optimum. Panel (b): Example 2, minimization. Both panels plot objective values f(x) over x.]
² According to a suitable metric, such as the number of modifications or mutations which need to be applied to a given solution in order to leave this subset.
The existence of multiple global optima itself is not problematic and the
discovery of only a subset of them can still be considered as successful in many
cases (see Section 9). The occurrence of numerous local optima, however, is
more complicated.
The phenomenon of domino convergence has been brought to attention by
Rudnick [184] who studied it in the context of his BinInt problem [184, 213].
In principle, domino convergence occurs when the solution candidates have features which contribute to the total fitness to significantly different degrees.
If these features are encoded in separate genes (or building blocks) in the
genotypes, they are likely to be treated with different priorities, at least in
randomized or heuristic optimization methods.
Building blocks with a very strong positive influence on the objective val-
ues, for instance, will quickly be adopted by the optimization process (i.e.,
"converge"). During this time, the alleles of genes with a smaller contribution
are ignored. They do not come into play until the optimal alleles of the more
"important" blocks have been accumulated. Rudnick [184] called this sequen-
tial convergence phenomenon domino convergence due to its resemblance to
a row of falling domino stones [213].
In the worst case, the contributions of the less salient genes may almost
look like noise and they are not optimized at all. Such a situation is also an
instance of premature convergence, since the global optimum which would
involve optimal configurations of all blocks will not be discovered. In this
situation, restarting the optimization process will not help because it will
always turn out the same way. Example problems which are often likely to
exhibit domino convergence are the Royal Road [139] and the aforementioned
BinInt problem [184].
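The salience effect behind domino convergence can be illustrated with a BinInt-style objective function (our sketch; the exact definition in [184] may differ in details): the weight of a gene decreases exponentially with its position, so all low-order genes together cannot outweigh a single high-order gene and are effectively ignored until the high-order genes have converged.

```python
def binint(genotype):
    """BinInt-style objective (maximization in this sketch): bit i contributes
    2**(n - 1 - i), so the leftmost genes are exponentially more salient."""
    n = len(genotype)
    return sum(bit * 2 ** (n - 1 - i) for i, bit in enumerate(genotype))

# A single high-order bit outweighs all lower-order bits combined,
# which is what drives the domino effect.
print(binint([1, 0, 0, 0, 0]))  # 16
print(binint([0, 1, 1, 1, 1]))  # 15
```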
2.3 One Cause: Loss of Diversity
In biology, diversity is the variety and abundance of organisms at a given place
and time [159, 133]. Much of the beauty and efficiency of natural ecosystems
is based on a dazzling array of species interacting in manifold ways. Diversifi-
cation is also a good investment strategy utilized by investors in the economy
in order to increase their profit.
In population-based global optimization algorithms as well, maintaining a
set of diverse solution candidates is very important. Losing diversity means
approaching a state where all the solution candidates under investigation are
similar to each other. Another term for this state is convergence. Discus-
sions about how diversity can be measured have been provided by Routledge
[183], Cousins [49], Magurran [133], Morrison and De Jong [148], and Paenke
et al [159].
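A very simple diversity measure for bit-string populations is the average pairwise Hamming distance; the sketch below is our own minimal illustration, not one of the measures from the works cited above.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two genotypes differ."""
    return sum(x != y for x, y in zip(a, b))

def population_diversity(population):
    """Average pairwise Hamming distance; 0 means the population has
    converged to a single genotype."""
    pairs = list(combinations(population, 2))
    if not pairs:
        return 0.0
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)

print(population_diversity(["1010", "1110", "0010"]))  # still diverse
print(population_diversity(["1010", "1010", "1010"]))  # fully converged: 0.0
```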
Preserving diversity is directly linked with maintaining a good balance be-
tween exploitation and exploration [159] and has been studied by researchers
from many domains, such as
•Genetic Algorithms [156, 176, 177],
•Evolutionary Algorithms [28, 29, 123, 149, 200, 206],
•Genetic Programming [30, 38, 39, 40, 53, 93, 94],
•Tabu Search [81, 82], and
•Particle Swarm Optimization [238].
The operations which create new solutions from existing ones have a very
large impact on the speed of convergence and the diversity of the populations
[69, 203]. The step size in Evolution Strategy is a good example of this issue:
setting it properly is very important and leads to the "exploration versus
exploitation" problem [102] which can be observed in other areas of global
optimization as well.3
In the context of optimization, exploration means finding new points in
areas of the search space which have not been investigated before. Since
computers have only limited memory, already evaluated solution candidates
usually have to be discarded. Exploration is a metaphor for the procedure
which allows search operations to find novel and maybe better solution struc-
tures. Such operators (like mutation in Evolutionary Algorithms) have a high
chance of creating inferior solutions by destroying good building blocks but
also a small chance of finding totally new, superior traits (which, however, is
not guaranteed at all).
Exploitation, on the other hand, is the process of improving and combin-
ing the traits of the currently known solution(s), as done by the crossover
operator in Evolutionary Algorithms, for instance. Exploitation operations
often incorporate small changes into already tested individuals leading to
new, very similar solution candidates or try to merge building blocks of dif-
ferent, promising individuals. They usually have the disadvantage that other,
possibly better, solutions located in distant areas of the problem space will
not be discovered.
Almost all components of optimization strategies can either be used for in-
creasing exploitation or in favor of exploration. Unary search operations that
improve an existing solution in small steps can be built, hence being exploita-
tion operators (as is done in Memetic Algorithms, for instance). They can
also be implemented in a way that introduces much randomness into the indi-
viduals, effectively making them exploration operators. Selection operations
in Evolutionary Computation choose a set of the most promising solution
candidates which will be investigated in the next iteration of the optimizers.
They can either return a small group of best individuals (exploitation) or a
wide range of existing solution candidates (exploration).
³ More or less synonymously with exploitation and exploration, the terms intensification and diversification have been introduced by Glover [81, 82] in the context of Tabu Search.
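For bit-string genotypes, the contrast between exploitative and explorative unary operators discussed above can be made concrete as follows (a sketch of ours; the single-bit flip and the re-randomization rate are illustrative choices, not prescriptions from the text).

```python
import random

def exploit_mutation(genotype):
    """Exploitative unary operator: flip exactly one randomly chosen bit,
    producing a solution candidate very similar to its parent."""
    child = list(genotype)
    child[random.randrange(len(child))] ^= 1
    return child

def explore_mutation(genotype, rate=0.5):
    """Explorative unary operator: re-draw each bit with high probability,
    which may destroy good building blocks but can reach distant regions."""
    return [random.randint(0, 1) if random.random() < rate else bit
            for bit in genotype]

parent = [1, 0, 1, 1, 0, 0, 1, 0]
print(exploit_mutation(parent))   # differs from the parent in one position
print(explore_mutation(parent))   # may differ in many positions
```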
Optimization algorithms that favor exploitation over exploration have
higher convergence speed but run the risk of not finding the optimal solution
and may get stuck at a local optimum. Then again, algorithms which per-
form excessive exploration may never improve their solution candidates well
enough to find the global optimum or it may take them very long to discover
it "by accident". A good example for this dilemma is the Simulated Anneal-
ing algorithm [117]. It is often modified to a form called simulated quenching
which focuses on exploitation but loses the guaranteed convergence to the
optimum [110]. Generally, optimization algorithms should employ at least
one search operation of explorative character and at least one which is able
to exploit good solutions further. There exists a vast body of research on the
trade-off between exploration and exploitation that optimization algorithms
have to face [7, 57, 66, 70, 103, 152].
2.4 Countermeasures
As we have seen, global optimization algorithms are optimization methods
for finding the best possible solution(s) of an optimization problem instead
of prematurely converging to a local optimum. Still, there is no general ap-
proach to ensure their success. The probability that an optimization process
prematurely converges depends on the characteristics of the problem to be
solved and the parameter settings and features of the optimization algorithms
applied [215].
A very crude, yet sometimes effective, measure is restarting the optimization process at randomly chosen points in time. One example of this method is GRASPs, the Greedy Randomized Adaptive Search Procedures [71, 72],
which continuously restart the process of creating an initial solution and re-
fining it with local search. Still, such approaches are likely to fail in domino
convergence situations.
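The restart idea itself is easy to sketch. The hill climber with random restarts below is a deliberately crude illustration and not the GRASP procedure; the toy objective function and all parameter values are arbitrary choices of ours.

```python
import math
import random

def restart_hill_climber(objective, random_solution, mutate,
                         restarts=10, steps=500):
    """Restart strategy (a sketch): run a simple hill climber from a fresh
    random starting point several times and keep the best result over all
    runs. The objective is subject to minimization."""
    best, best_f = None, float("inf")
    for _ in range(restarts):
        x = random_solution()
        fx = objective(x)
        for _ in range(steps):
            y = mutate(x)
            fy = objective(y)
            if fy <= fx:                 # accept improvements (and ties)
                x, fx = y, fy
        if fx < best_f:                  # remember the best run so far
            best, best_f = x, fx
    return best, best_f

# A multimodal toy function with many local minima around the global one at 0.
f = lambda v: v[0] ** 2 + 10 * abs(math.sin(3 * v[0]))
print(restart_hill_climber(f,
                           random_solution=lambda: [random.uniform(-10, 10)],
                           mutate=lambda v: [v[0] + random.gauss(0, 0.1)]))
```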
In order to extend the duration of the evolution in Evolutionary Algo-
rithms, many methods have been devised for steering the search away from
areas which have already been frequently sampled. This can be achieved by
integrating density metrics into the fitness assignment process. The most
popular of such approaches are sharing and niching based on the Euclidean
distance of the solution candidates in objective space [55, 85, 104, 138]. Using
low selection pressure furthermore decreases the chance of premature conver-
gence but also decreases the speed with which good solutions are exploited.
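A minimal sketch of fitness sharing for a single objective, adapted for minimization (our simplification of the niching approaches cited above): the raw objective value of each individual is penalized according to how many other individuals lie within a radius sigma of it in objective space, so crowded regions become less attractive for selection.

```python
def shared_fitness(objectives, sigma=1.0, alpha=1.0):
    """Degrade the raw objective values (minimization) of individuals that
    are crowded together in objective space. sigma and alpha are assumed,
    illustrative niching parameters."""
    def sh(distance):
        # Triangular sharing kernel: 1 at distance 0, 0 at distance >= sigma.
        return 1.0 - (distance / sigma) ** alpha if distance < sigma else 0.0

    shared = []
    for fi in objectives:
        niche_count = sum(sh(abs(fi - fj)) for fj in objectives)
        shared.append(fi * niche_count)  # crowded -> larger (worse) value
    return shared

raw = [1.00, 1.05, 1.10, 5.00]           # three crowded individuals, one isolated
print(shared_fitness(raw, sigma=0.5))
```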
Another approach against premature convergence is to introduce the ca-
pability of self-adaptation, allowing the optimization algorithm to change its
strategies or to modify its parameters depending on its current state. Such
behaviors, however, are often implemented not in order to prevent prema-
ture convergence but to speed up the optimization process (which may lead
to premature convergence to local optima) [185, 186, 187].
3 Ruggedness and Weak Causality
3.1 The Problem: Ruggedness
Optimization algorithms generally depend on some form of gradient in the
objective or fitness space. The objective functions should be continuous and
exhibit low total variation⁴, so the optimizer can descend the gradient easily.
If the objective functions are unsteady or fluctuating, i.e., going up and down,
it becomes more complicated for the optimization process to find the right
directions to proceed to. The more rugged a function gets, the harder it
becomes to optimize it. From a simplified point of view, ruggedness is multimodality plus steep ascents and descents in the fitness landscape. Examples
of rugged landscapes are Kauffman's fitness landscape [113, 115], the p-Spin
model [6], Bergman and Feldman's jagged fitness landscape [19], and the
sketch in Fig. 1.d.
3.2 One Cause: Weak Causality
During an optimization process, new points in the search space are created
by the search operations. Generally we can assume that the genotypes which
are the input of the search operations correspond to phenotypes which have
previously been selected. Usually, the better or the more promising an indi-
vidual is, the higher are its chances of being selected for further investigation.
Reversing this statement suggests that individuals which are passed to the
search operations are likely to have a good fitness. Since the fitness of a solu-
tion candidate depends on its properties, it can be assumed that the features
of these individuals are not so bad either. It should thus be possible for the
optimizer to introduce slight changes to their properties in order to find out
whether they can be improved any further⁵. Normally, such modifications
should also lead to small changes in the objective values and, hence, in the
fitness of the solution candidate.
Definition 1 (Strong Causality). Strong causality (locality) means that
small changes in the properties of an object also lead to small changes in its
behavior [170, 171, 180].
This principle (proposed by Rechenberg [170, 171]) should not only hold
for the search spaces and operations designed for optimization, but should also apply to natural genomes. The offspring resulting from sexual reproduction
of two fish, for instance, has a different genotype than its parents. Yet, it is
⁴ http://en.wikipedia.org/wiki/Total_variation [accessed 2008-04-23]
⁵ We have already mentioned this under the subject of exploitation.
far more probable that these variations manifest in a unique color pattern of
the scales, for example, instead of leading to a totally different creature.
Apart from this straightforward, informal explanation here, causality has
been investigated thoroughly in different fields of optimization, such as Evolu-
tion Strategy [170, 65], structure evolution [129, 130], Genetic Programming
[65, 107, 179, 180], genotype-phenotype mappings [193], search operators [65],
and Evolutionary Algorithms in general [65, 182, 207].
In fitness landscapes with weak (low) causality, small changes in the so-
lution candidates often lead to large changes in the objective values, i.e.,
ruggedness. It then becomes harder to decide which region of the problem
space to explore and the optimizer cannot find reliable gradient information
to follow. A small modification of a very bad solution candidate may then
lead to a new local optimum and the best solution candidate currently known
may be surrounded by points that are inferior to all other tested individuals.
The lower the causality of an optimization problem, the more rugged its
fitness landscape is, which leads to a degradation of the performance of the
optimizer [120]. This does not necessarily mean that it is impossible to find
good solutions, but it may take very long to do so.
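Causality can also be probed empirically. The sketch below (our illustration with arbitrary sample counts and toy objective functions) estimates the average change of the objective value caused by a single-bit mutation; large average jumps hint at a rugged landscape with weak causality.

```python
import random

def average_fitness_jump(objective, genotype_length, samples=1000):
    """Rough estimate of causality: the mean absolute change of the objective
    value caused by flipping one random bit of random genotypes."""
    total = 0.0
    for _ in range(samples):
        g = [random.randint(0, 1) for _ in range(genotype_length)]
        neighbor = list(g)
        neighbor[random.randrange(genotype_length)] ^= 1
        total += abs(objective(neighbor) - objective(g))
    return total / samples

smooth = lambda g: sum(g)                 # strong causality: jumps of size 1
rugged = lambda g: hash(tuple(g)) % 100   # weak causality: large, erratic jumps
print(average_fitness_jump(smooth, 20))
print(average_fitness_jump(rugged, 20))
```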
3.3 Countermeasures
To our knowledge, no viable method which can directly mitigate the effects of
rugged fitness landscapes exists. In population-based approaches, using large
population sizes and applying methods to increase the diversity can decrease
the influence of ruggedness, but only up to a certain degree. Utilizing the
Baldwin effect [13, 100, 101, 233] or Lamarckian evolution [54, 233], i.e.,
incorporating a local search into the optimization process, may further help
to smooth out the fitness landscape [89].
Weak causality is often a home-made problem: it results from the choice
of the solution representation and search operations. Thus, in order to apply
Evolutionary Algorithms in an efficient manner, it is necessary to find repre-
sentations which allow for iterative modifications with bounded influence on
the objective values.
4 Deceptiveness
4.1 Introduction
Especially annoying fitness landscapes show deceptiveness (or deceptivity).
The gradient of deceptive objective functions leads the optimizer away from
the optima, as illustrated in Fig. 1.e.
The term deceptiveness is mainly used in the Genetic Algorithm commu-
nity in the context of the Schema Theorem. Schemas describe certain areas
(hyperplanes) in the search space. If an optimization algorithm has discov-
ered an area with a better average fitness compared to other regions, it will
focus on exploring this region based on the assumption that highly fit areas
are likely to contain the true optimum. Objective functions where this is not
the case are called deceptive [20, 84, 127]. Examples for deceptiveness are the
ND fitness landscapes [17], trap functions [1, 59, 112] like the one illustrated
in Fig. 4, and the fully deceptive problems given by Goldberg et al [86, 60].
[Fig. 4: Ackley's "Trap" function [1, 112], plotted as f(x) over the unitation u(x): a local optimum with a large basin of attraction and a global optimum with a small basin of attraction.]
4.2 The Problem
If the information accumulated by an optimizer actually guides it away from
the optimum, search algorithms will perform worse than a random walk or
an exhaustive enumeration method. This issue has been known for a long
time [228, 140, 141, 212] and has been subsumed under the No Free Lunch
Theorem which we will discuss in Section 11.
4.3 Countermeasures
Solving deceptive optimization tasks perfectly involves sampling many indi-
viduals with very bad features and low fitness. This contradicts the basic ideas
of metaheuristics and thus, there are no efficient countermeasures against de-
ceptivity. Using large population sizes, maintaining a very high diversity, and
utilizing linkage learning (see Section 6.3.2) are, maybe, the only approaches
which can provide at least a small chance of finding good solutions.
5 Neutrality and Redundancy
5.1 The Problem: Neutrality
We consider the outcome of the application of a search operation to an el-
ement of the search space as neutral if it yields no change in the objective
values [15, 172]. It is challenging for optimization algorithms if the best solu-
tion candidate currently known is situated on a plane of the fitness landscape,
i.e., all adjacent solution candidates have the same objective values. As illus-
trated in Fig. 1.f, an optimizer then cannot find any gradient information and
thus, no direction in which to proceed in a systematic manner. From its point
of view, each search operation will yield identical individuals. Furthermore,
optimization algorithms usually maintain a list of the best individuals found,
which will then overflow eventually or require pruning.
The degree of neutrality ν is defined as the fraction of neutral results among all possible products of the search operations Op applied to a specific genotype [15]. We can generalize this measure to regions G̃ in the search space G by averaging over all their elements. Regions where ν is close to one are considered as neutral.
\forall g_1 \in G:\quad \nu(g_1) = \frac{\bigl|\{\, g_2 \mid P(g_2 = Op(g_1)) > 0 \,\wedge\, f(gpm(g_2)) = f(gpm(g_1)) \,\}\bigr|}{\bigl|\{\, g_2 \mid P(g_2 = Op(g_1)) > 0 \,\}\bigr|} \qquad (1)

\forall \tilde{G} \subseteq G:\quad \nu(\tilde{G}) = \frac{1}{|\tilde{G}|} \sum_{g \in \tilde{G}} \nu(g) \qquad (2)
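For a concrete instance of Equation 1, assume bit-string genotypes and a single-bit-flip mutation as the only search operation Op; all possible offspring can then be enumerated and ν computed exactly. The gpm and f used in the example are hypothetical placeholders.

```python
def degree_of_neutrality(genotype, gpm, f):
    """Fraction of single-bit-flip offspring whose objective value equals that
    of the parent (Equation 1 for a deterministic bit-flip operator Op)."""
    parent_value = f(gpm(genotype))
    neutral = 0
    for i in range(len(genotype)):
        child = list(genotype)
        child[i] ^= 1                          # one possible result of Op
        if f(gpm(child)) == parent_value:
            neutral += 1
    return neutral / len(genotype)

# Example: a redundant mapping that only looks at the first half of the bits,
# so flipping any bit in the second half is a neutral operation.
gpm = lambda g: tuple(g[:len(g) // 2])
f = lambda x: sum(x)
print(degree_of_neutrality([1, 0, 1, 1, 0, 0], gpm, f))  # 0.5
```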
5.2 Evolvability
Another metaphor in global optimization borrowed from biological systems
is evolvability [52]. Wagner [225, 226] points out that this word has two uses
in biology: According to Kirschner and Gerhart [118], a biological system is
evolvable if it is able to generate heritable, selectable phenotypic variations.
Such properties can then be evolved and changed by natural selection. In its
second sense, a system is evolvable if it can acquire new characteristics via
genetic change that help the organism(s) to survive and to reproduce. The-
ories about how the ability of generating adaptive variants has evolved have
been proposed by Riedl [174], Altenberg [3], Wagner and Altenberg [227],
and Bonner [26], amongst others. The idea of evolvability can be adopted for
global optimization as follows:
Definition 2 (Evolvability). The evolvability of an optimization process in its current state defines how likely it is that the search operations will lead to solution candidates with new (and, eventually, better) objective values.
The direct probability of success [170, 22], i.e., the chance that search op-
erators produce offspring fitter than their parents, is also sometimes referred
to as evolvability in the context of Evolutionary Algorithms [2, 5].
5.3 Neutrality: Problematic and Beneficial
The link between evolvability and neutrality has been discussed by many
researchers. The evolvability of neutral parts of a fitness landscape depends
on the optimization algorithm used. It is especially low for Hill Climbing
and similar approaches, since the search operations cannot directly provide
improvements or even changes. The optimization process then degenerates
to a random walk, as illustrated in Fig. 1.f. The work of Beaudoin et al [17]
on the ND fitness landscapes shows that neutrality may "destroy" useful
information such as correlation.
Researchers in molecular evolution, on the other hand, found indications
that the majority of mutations have no selective influence [77, 106] and that
the transformation from genotypes to phenotypes is a many-to-one mapping.
Wagner [226] states that neutrality in natural genomes is beneficial if it con-
cerns only a subset of the properties peculiar to the offspring of a solution
candidate while allowing meaningful modifications of the others. Toussaint
and Igel [214] even go as far as declaring it a necessity for self-adaptation.
The theory of punctuated equilibria in biology introduced by Eldredge and
Gould [67, 68] states that species experience long periods of evolutionary
inactivity which are interrupted by sudden, localized, and rapid phenotypic
evolutions [47, 134, 12]. It is assumed that the populations explore neutral
layers during the time of stasis until, suddenly, a relevant change in a genotype
leads to a better adapted phenotype [224] which then reproduces quickly.
The key to differentiating between "good" and "bad" neutrality is its de-
gree ν in relation to the number of possible solutions maintained by the
optimization algorithms. Smith et al [204] have used illustrative examples
similar to Fig. 5 showing that a certain amount of neutral reproductions can
foster the progress of optimization. In Fig. 5.a, basically the same scenario
of premature convergence as in Fig. 3.a is depicted. The optimizer is drawn
to a local optimum from which it cannot escape anymore. Fig. 5.b shows
that a little shot of neutrality could form a bridge to the global optimum.
The optimizer now has a chance to escape the smaller peak if it is able to
find and follow that bridge, i.e., the evolvability of the system has increased.
If this bridge gets wider, as sketched in Fig. 5.c, the chance of finding the
global optimum increases as well. Of course, if the bridge gets too wide, the
optimization process may end up in a scenario like in Fig. 1.f where it cannot
find any direction. Furthermore, in this scenario we expect the neutral bridge
to lead to somewhere useful, which is not necessarily the case in reality.
[Fig. 5: Possible positive influence of neutrality. Panel (a): premature convergence — the process is trapped at a local optimum and cannot reach the global optimum. Panel (b): a small neutral bridge connects the local optimum to the global one. Panel (c): a wide neutral bridge.]
Examples for neutrality in fitness landscapes are the ND family [17], the
NKp [15] and NKq [155] models, and the Royal Road [139]. Another common
instance of neutrality is bloat in Genetic Programming [131].
5.4 Redundancy: Problematic and Beneficial
Redundancy in the context of global optimization is a feature of the genotype-
phenotype mapping and means that multiple genotypes map to the same
phenotype, i.e., the genotype-phenotype mapping is not injective. The role
of redundancy in the genome is as controversial as that of neutrality [230].
There exist many accounts of its positive influence on the optimization pro-
cess. Shackleton et al [194, 197], for instance, tried to mimic desirable evolu-
tionary properties of RNA folding [106]. They developed redundant genotype-
phenotype mappings using voting (both, via uniform redundancy and via a
non-trivial approach), Turing machine-like binary instructions, Cellular au-
tomata, and random Boolean networks [114]. Except for the trivial voting
mechanism based on uniform redundancy, the mappings induced neutral net-
works which proved beneficial for exploring the problem space. Especially the
last approach provided particularly good results [194, 197]. Possibly converse
effects like epistasis (see Section 6) arising from the new genotype-phenotype
mappings have not been considered in this study.
Redundancy can have a strong impact on the explorability of the prob-
lem space. When utilizing a one-to-one mapping, the translation of a slightly
modified genotype will always result in a different phenotype. If there ex-
ists a many-to-one mapping between genotypes and phenotypes, the search
operations can create offspring genotypes different from the parent which
still translate to the same phenotype. The optimizer may now walk along a
path through this neutral network. If many genotypes along this path can be
modified to different offspring, many new solution candidates can be reached
[197]. The experiments of Shipman et al [198, 196] additionally indicate that
neutrality in the genotype-phenotype mapping can have positive effects.
Yet, Rothlauf [182] and Shackleton et al [194] show that simple uniform
redundancy is not necessarily beneficial for the optimization process and
may even slow it down. There is no use in introducing encodings which, for
instance, represent each phenotypic bit with two bits in the genotype where
00 and 01 map to 0 and 10 and 11 map to 1.
5.5 Summary
Different from ruggedness, which is always bad for optimization algorithms, neutrality has aspects that may further as well as hinder the process of finding good solutions. Generally, we can state that degrees of neutrality ν very
close to 1 degenerate optimization processes to random walks. Some forms
of neutral networks [14, 15, 27, 105, 208, 222, 223, 237] accompanied by low
(nonzero) values of ν can improve the evolvability and hence, increase the
chance of finding good solutions.
Adverse forms of neutrality are often caused by bad design of the search
space or genotype-phenotype mapping. Uniform redundancy in the genome
should be avoided where possible and the amount of neutrality in the search
space should generally be limited.
6 Epistasis
6.1 Introduction
In biology, epistasis is defined as a form of interaction between different genes
[163]. The term was coined by Bateson [16] and originally meant that one
gene suppresses the phenotypical expression of another gene. In the context
of statistical genetics, epistasis was initially called "epistacy" by Fisher [74].
According to Lush [132], the interaction between genes is epistatic if the ef-
fect on the fitness of altering one gene depends on the allelic state of other
genes. This understanding of epistasis comes very close to another biological
expression: Pleiotropy , which means that a single gene influences multiple
phenotypic traits [239]. In global optimization, such fine-grained distinctions
are usually not made and the two terms are often used more or less synony-
mously.
Definition 3 (Epistasis). In optimization, epistasis is the dependency of the contribution of one gene to the value of the objective functions on the allelic state of other genes [4, 51, 153].
We speak of minimal epistasis when every gene is independent of every
other gene. Then, the optimization process equals finding the best value for
each gene and can most efficiently be carried out by a simple greedy search
[51]. A problem is maximally epistatic when no proper subset of genes is
independent of any other gene [205, 153]. Examples of problems with a high
degree of epistasis are Kauffman's fitness landscape [113, 115], the p-Spin
model [6], and the tunable model of Weise et al [232].
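The two extremes can be illustrated with toy objective functions over bit strings (our sketch, not one of the cited benchmark models): in the first, every gene contributes independently, so a greedy gene-by-gene search suffices; in the second, the contribution of each gene depends on the allelic state of its neighbor.

```python
def f_separable(x):
    """Minimally epistatic: every gene contributes +1 on its own."""
    return sum(x)

def f_epistatic(x):
    """Epistatic: gene i only contributes if it differs from gene i+1 (on a
    ring), so no gene can be optimized in isolation."""
    n = len(x)
    return sum(1 for i in range(n) if x[i] != x[(i + 1) % n])

print(f_separable([1, 1, 1, 1]), f_epistatic([1, 1, 1, 1]))   # 4 0
print(f_separable([1, 0, 1, 0]), f_epistatic([1, 0, 1, 0]))   # 2 4
```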
6.2 The Problem
As sketched in Fig. 6, epistasis has a strong influence on many of the pre-
viously discussed problematic features. If one gene can "turn off" or affect
the expression of many other genes, a modification of this gene will lead to
a large change in the features of the phenotype. Hence, the causality will be
weakened and ruggedness ensues in the fitness landscape. On the other hand,
subsequent changes to the "deactivated" genes may have no influence on the
phenotype at all, which would then increase the degree of neutrality in the
search space. Epistasis is mainly an aspect of the way in which we define the
genome G and the genotype-phenotype mapping gpm. It should be avoided
where possible.
Generally, epistasis and conflicting objectives in multi-objective optimiza-
tion should be distinguished from each other. Epistasis as well as pleiotropy
is a property of the influence of the elements (the genes) of the genotypes
[Fig. 6: The influence of epistasis on the fitness landscape — high epistasis causes weak causality, which in turn causes ruggedness and multi-modality, as well as neutrality.]
on the phenotypes. Objective functions can conflict without the involvement
of any of these phenomena. We can, for example, define two objective func-
tions f1(x) = x and f2(x) = −x, which clearly conflict regardless of
whether they are subject to maximization or minimization. Nevertheless, if
the solution candidates x as well as the genotypes are simple real numbers
and the genotype-phenotype mapping is simply an identity mapping, neither
epistatic nor pleiotropic effects can occur.
Naudts and Verschoren [154] have shown for the special case of length-
two binary string genomes that deceptiveness does not occur in situations
with low epistasis and also that objective functions with high epistasis are
not necessarily deceptive. Another discussion about different shapes of fitness
landscapes under the influence of epistasis is given by Beerenwinkel et al [18].
6.3 Countermeasures
6.3.1 General
We have shown that epistasis is a root cause for multiple problematic fea-
tures of optimization tasks. General countermeasures against epistasis can be
divided into two groups. The symptoms of epistasis can be mitigated with
the same methods which increase the chance of finding good solutions in the
presence of ruggedness or neutrality – using larger populations and favor-
ing explorative search operations. Epistasis itself is a feature which results
from the choice of the search space structure, the search operations, and the
genotype-phenotype mapping. Avoiding epistatic effects should be a major
concern during their design. This can lead to a great improvement in the
quality of the solutions produced by the optimization process [231]. General
advice for good search space design is given in [84, 166, 178] and [229].
6.3.2 Linkage Learning
According to Winter et al [240], linkage is "the tendency for alleles of different
genes to be passed together from one generation to the next" in genetics. This
usually indicates that these genes are closely located in the same chromosome.
In the context of Evolutionary Algorithms, this notion is not useful since
identifying spatially close elements inside the genotypes is trivial. Instead,
we are interested in alleles of different genes which have a joint effect on the
fitness [150, 151].
Identifying these linked genes, i.e., learning their epistatic interaction, is
very helpful for the optimization process. Such knowledge can be used to pro-
tect building blocks from being destroyed by the search operations. Finding
approaches for linkage learning has become an especially popular discipline
in the area of Evolutionary Algorithms with binary [99, 150, 46] and real
[63] genomes. Two important methods from this area are the messy Genetic
Algorithm (mGA) by Goldberg et al [86] and the Bayesian Optimization
Algorithm (BOA) [162, 41]. Module acquisition [8] may be considered as a
similar effort in the area of Genetic Programming.
Let us take the mGA as an illustrative example for this family of ap-
proaches. By explicitly allowing the search operations to rearrange the genes
in the genotypes, epistatically linked genes may get located closer to each other over time. As sketched in Fig. 7, the tighter the building blocks are packed, the less likely they are to be destroyed by crossover operations which
usually split parent genotypes at randomly chosen points. Hence, the opti-
mization process can strengthen the causality in the search space.
[Fig. 7: Two linked genes and their destruction probability under single-point crossover: placed far apart, the pair is destroyed in 6 out of 9 cases by crossover; after rearranging them next to each other, in only 1 out of 9 cases.]
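The destruction probabilities in Fig. 7 follow directly from the position of the cut point. Assuming a genotype of length 10 and a cut point chosen uniformly among the 9 positions between adjacent genes, two linked genes are separated whenever the cut falls between them:

```python
def disruption_probability(i, j, n):
    """Probability that single-point crossover (cut point uniform over the
    n - 1 gaps between adjacent genes) separates the genes at positions i and j."""
    return abs(i - j) / (n - 1)

# Genotype of length 10: genes six positions apart vs. rearranged to be adjacent.
print(disruption_probability(1, 7, 10))  # 6/9, destroyed in 6 out of 9 cases
print(disruption_probability(1, 2, 10))  # 1/9, destroyed in 1 out of 9 cases
```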
7 Noise and Robustness
7.1 Introduction – Noise
In the context of optimization, three types of noise can be distinguished. The
first form is noise in the training data used as the basis for learning (i). In many
applications of machine learning or optimization where a model m for a given
system is to be learned, data samples including the input of the system and its
measured response are used for training. Some typical examples of situations
where training data is the basis for the objective function evaluation are
•the usage of global optimization for building classifiers (for example for
predicting buying behavior using data gathered in a customer survey for
training),
•the usage of simulations for determining the objective values in Genetic
Programming (here, the simulated scenarios correspond to training cases),
and
•the fitting of mathematical functions to (x, y )-data samples (with artificial
neural networks or symbolic regression, for instance).
Since no measurement device is 100% accurate and there are always random
errors, noise is present in such optimization problems.
Besides inexactnesses and fluctuations in the input data of the optimiza-
tion process, perturbations are also likely to occur during the application of
its results. This category subsumes the other two types of noise: perturbations
that may arise from inaccuracies in (ii) the process of realizing the solutions
and (iii) environmentally induced perturbations during the applications of
the products.
This issue can be illustrated using the process of developing the perfect
tire for a car as an example. As input for the optimizer, all sorts of material
coefficients and geometric constants measured from all known types of wheels
and rubber could be available. Since these constants have been measured or
calculated from measurements, they include a certain degree of noise and
imprecision (i).
The result of the optimization process will be the best tire construction
plan discovered during its course and it will likely incorporate different ma-
terials and structures. We would hope that the tires created according to
the plan will not fall apart if, accidentally, an extra 0.0001% of a specific rub-
ber component is used (ii) . During the optimization process, the behavior of
many construction plans will be simulated in order to find out about their
utility. When actually manufactured, the tires should not behave unexpect-
edly when used in scenarios different from those simulated (iii) and should
instead be applicable in all driving scenarios likely to occur.
The effects of noise in optimization have been studied by various re-
searchers; Miller and Goldberg [136, 137], Lee and Wong [125], and Gurin
and Rastrigin [92] are some of them. Many global optimization algorithms
and theoretical results have been proposed which can deal with noise. Some
of them are, for instance, specialized
•Genetic Algorithms [75, 119, 188, 189, 217, 218],
•Evolution Strategies [11, 21, 96], and
•Particle Swarm Optimization [97, 161] approaches.
7.2 The Problem: Need for Robustness
The goal of global optimization is to find the global optima of the objective
functions. While this is fully true from a theoretical point of view, it may
not suffice in practice. Optimization problems are normally used to find good
parameters or designs for components or plans to be put into action by human
beings or machines. As we have already pointed out, there will always be noise
and perturbations in practical realizations of the results of optimization.
Definition 4 (Robustness). A system in engineering or biology is robust if
it is able to function properly in the face of genetic or environmental pertur-
bations [225].
Therefore, a local optimum (or even a non-optimal element) for which
slight deviations only lead to gentle performance degradation is usually
favored over a global optimum located in a highly rugged area of the fitness
landscape [31]. In other words, local optima in regions of the fitness landscape
with strong causality are sometimes better than global optima with weak
causality. Of course, the level of this acceptability is application-dependent.
Fig. 8 illustrates the issue of local optima which are robust vs. global optima
which are not. More examples from the real world are:
•When optimizing the control parameters of an airplane or a nuclear power
plant, the global optimum is certainly not used if a slight perturbation can
have hazardous effects on the system [218].
•Wiesmann et al [234, 235] bring up the topic of manufacturing tolerances
in multilayer optical coatings. It is no use finding optimal configurations if they only perform optimally when manufactured to a precision which is
either impossible or too hard to achieve on a constant basis.
•The optimization of the decision process on which roads should be pre-
cautionary salted for areas with marginal winter climate is an example
of the need for dynamic robustness. The global optimum of this problem
is likely to depend on the daily (or even current) weather forecast and
may therefore be constantly changing. Handa et al [98] point out that it is
practically infeasible to let road workers follow a constantly changing plan
and circumvent this problem by incorporating multiple road temperature
settings in the objective function evaluation.
•Tsutsui et al [218, 217] found a nice analogy in nature: The phenotypic
characteristics of an individual are described by its genetic code. Dur-
ing the interpretation of this code, perturbations like abnormal tempera-
ture, nutritional imbalances, injuries, illnesses and so on may occur. If the
phenotypic features emerging under these influences have low fitness, the
organism cannot survive and procreate. Thus, even a species with good
genetic material will die out if its phenotypic features become too sensi-
tive to perturbations. Species robust against them, on the other hand, will
survive and evolve.
[Fig. 8: A robust local optimum vs. an "unstable" global optimum, plotted as f(x) over X: the global optimum sits in a highly rugged region, while the robust local optimum tolerates slight deviations.]
7.3 Countermeasures
For the special case where the problem space corresponds to the real vectors
(X ⊆ ℝⁿ), several approaches for dealing with the problem of robustness have been developed. Inspired by Taguchi methods⁶ [209], possible disturbances are represented by a vector δ = (δ₁, δ₂, .., δₙ)ᵀ, δᵢ ∈ ℝ, in the method of Greiner [87, 88]. If the distribution and influence of the δᵢ are known, the objective function f(x) : x ∈ X can be rewritten as f̃(x, δ) [235]. In the special case where δ is normally distributed, this can be simplified to f̃((x₁ + δ₁, x₂ + δ₂, .., xₙ + δₙ)ᵀ). It would then make sense to sample the probability distribution of δ a number of t times and to use the mean values of f̃(x, δ) for each objective function evaluation during the optimization
process. In cases where the optimal value y of the objective function f is
known, Equation 3 can be minimized. This approach is also used in the work
of Wiesmann et al [234, 235] and basically turns the optimization algorithm
into something like a maximum likelihood estimator.
f'(x) = \frac{1}{t} \sum_{i=1}^{t} \bigl( y - \tilde{f}(x, \delta_i) \bigr)^2 \qquad (3)
This method corresponds to using multiple, different training scenarios
during the objective function evaluation in situations where X ⊈ ℝⁿ. By
adding random noise and artificial perturbations to the training cases, the
chance of obtaining robust solutions which are stable when applied or realized
under noisy conditions can be increased.
⁶ http://en.wikipedia.org/wiki/Taguchi_methods [accessed 2008-07-19]
8 Overfitting and Oversimplification
In all scenarios where optimizers evaluate some of the objective values of the
solution candidates by using training data, two additional phenomena with
negative influence can be observed: overfitting and oversimplification.
8.1 Overfitting
8.1.1 The Problem
Definition 5 (Overfitting). Overfitting is the emergence of an overly com-
plicated model (solution candidate) in an optimization process resulting from
the effort to provide the best results for as much of the available training data
as possible [64, 80, 190, 202].
A model (solution candidate) m ∈ X created with a finite set of training data is considered to be overfitted if a less complicated, alternative model m′ ∈ X exists which has a smaller error for the set of all possible (maybe
even infinitely many), available, or (theoretically) producible data samples.
This model m′ may, however, have a larger error in the training data.
The phenomenon of overfitting is best known and can often be encountered
in the field of artificial neural networks or in curve fitting [124, 128, 181, 191,
211]. The latter means that we have a set A of n training data samples (xᵢ, yᵢ) and want to find a function f that represents these samples as well as possible, i.e., f(xᵢ) = yᵢ ∀(xᵢ, yᵢ) ∈ A.
There exists exactly one polynomial of degree at most n − 1 that fits such training data, i.e., goes through all of its points. Hence, when only polyno-
mial regression is performed, there is exactly one perfectly fitting function of
minimal degree. Nevertheless, there will also be an infinite number of poly-
nomials with a higher degree than n− 1 that also match the sample data
perfectly. Such results would be considered as overfitted.
In Fig. 9, we have sketched this problem. The function f1 (x ) = x shown in
Fig. 9.b has been sampled three times, as sketched in Fig. 9.a. There exists
no polynomial of degree two or less other than f1 that fits these samples. Optimizers, however, could also find overfitted polynomials of a higher degree such as f2 which also match the data, as shown in Fig. 9.c. Here, f2 plays the role of the overly complicated model m which will perform as well as the simpler model m′ when tested with the training sets only, but will fail
to deliver good results for all other input data.
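The effect is easy to reproduce numerically. The following NumPy sketch (sample sizes, noise level, and polynomial degrees are arbitrary choices of ours) fits a low-degree and a high-degree polynomial to noisy samples of a quadratic process and compares their errors on unseen inputs.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Noisy samples of an underlying quadratic process (cf. Fig. 10).
x_train = np.linspace(-1.0, 1.0, 20)
y_train = x_train ** 2 + rng.normal(0.0, 0.05, x_train.size)
x_test = np.linspace(-0.95, 0.95, 50)   # unseen inputs
y_test = x_test ** 2                    # the true underlying law

for degree in (2, 12):
    model = Polynomial.fit(x_train, y_train, degree)
    train_mse = np.mean((model(x_train) - y_train) ** 2)
    test_mse = np.mean((model(x_test) - y_test) ** 2)
    # The high-degree model usually matches the noisy training data more
    # closely but tends to generalize worse -- a symptom of overfitting.
    print(f"degree {degree:2d}: train MSE {train_mse:.5f}, test MSE {test_mse:.5f}")
```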
A very common cause for overfitting is noise in the sample data. As we
have already pointed out, there exists no measurement device for physical
processes which delivers perfect results without error. Surveys that represent
the opinions of people on a certain topic or randomized simulations will ex-
[Fig. 9: Overfitting due to complexity (all panels plot y over x). Panel (a): three sample points of f1. Panel (b): the simple model m′ ≡ f1(x) = x. Panel (c): an overfitted, higher-degree model m ≡ f2(x) passing through the same three points.]
hibit variations from the true interdependencies of the observed entities, too.
Hence, data samples based on measurements will always contain some noise.
In Fig. 10 we have sketched how such noise may lead to overfitted re-
sults. Fig. 10.a illustrates a simple physical process obeying some quadratic
equation. This process has been measured using some technical equipment
and the 100 noisy samples depicted in Fig. 10.b have been obtained. Fig. 10.c
shows a function resulting from an optimization that fits the data perfectly.
It could, for instance, be a polynomial of degree 99 that goes right through
all the points and thus, has an error of zero. Although being a perfect match
to the measurements, this complicated model does not accurately represent
the physical law that produced the sample data and will not deliver precise
results for new, different inputs.
[Fig. 10: Fitting noise (all panels plot y over x). Panel (a): the original physical process m′. Panel (b): the measurement/training data. Panel (c): the overfitted result m.]
From the examples we can see that the major problem that results from
overfitted solutions is the loss of generality.
Definition 6 (Generality). A solution of an optimization process is general if it is not only valid for the sample inputs a₁, a₂, ..., aₙ which were used for training during the optimization process, but also for different inputs a ≠ aᵢ ∀i : 0 < i ≤ n, if such inputs a exist.
8.1.2 Countermeasures
There exist multiple techniques that can be utilized in order to prevent over-
fitting to a certain degree. It is most efficient to apply several of them
together in order to achieve the best results.
A very simple approach is to restrict the problem space X in a way that
only solutions up to a given maximum complexity can be found. In terms
of function fitting, this could mean limiting the maximum degree of the
polynomials to be tested. Furthermore, the functional objective functions
which solely concentrate on the error of the solution candidates should be
augmented by penalty terms and non-functional objective functions putting
pressure in the direction of small and simple models [64, 116].
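As a rough illustration of this idea (a sketch of ours, not from the chapter; the names and the penalty weight are illustrative assumptions), the following Python snippet fits polynomials of increasing degree and augments the fit error with a term that grows with the degree, so overly complex models receive worse objective values:

# A rough sketch of augmenting the fit error with a complexity penalty.
# penalized_error() and the penalty weight are illustrative assumptions.
import numpy as np

def penalized_error(coeffs, xs, ys, penalty_weight=0.1):
    """Mean squared fit error plus a term that grows with the polynomial degree."""
    mse = np.mean((np.polyval(coeffs, xs) - ys) ** 2)
    degree = len(coeffs) - 1          # non-functional criterion: model complexity
    return mse + penalty_weight * degree

xs = np.linspace(0.0, 4.0, 5)         # five samples of f1(x) = x
ys = xs.copy()
for degree in range(1, 5):
    coeffs = np.polyfit(xs, ys, degree)
    print(degree, round(penalized_error(coeffs, xs, ys), 4))

Because all polynomials fit these noiseless samples equally well, the penalty term alone lets the simplest (degree-one) model win.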
Large sets of sample data, although slowing down the optimization pro-
cess, may improve the generalization capabilities of the derived solutions. If
arbitrarily many training datasets or training scenarios can be generated,
there are two approaches which work against overfitting:
1. The first method is to use a new set of (randomized) scenarios for each eval-
uation of a solution candidate. The resulting objective values may differ
largely even if the same individual is evaluated twice in a row, introducing
incoherence and ruggedness into the fitness landscape.
2. At the beginning of each iteration of the optimizer, a new set of (random-
ized) scenarios is generated which is used for all individual evaluations
during that iteration. This method leads to objective values which can be
compared without bias.
In both cases it is helpful to use more than one training sample or scenario per
evaluation and to set the resulting objective value to the average (or, better,
the median) of the outcomes, as sketched below. Otherwise, the fluctuations
of the objective values between the iterations will be very large, making it
hard for the optimizers to follow a stable gradient for multiple steps.
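A minimal sketch of this aggregation (ours; evaluate() and make_random_scenario() are hypothetical placeholders for the problem-specific evaluation and scenario generation) could look like this:

# A sketch of aggregating the objective over several randomized scenarios.
# evaluate() and make_random_scenario() are hypothetical placeholders.
import statistics

def robust_objective(candidate, evaluate, make_random_scenario, n_scenarios=10):
    """Evaluate the candidate on several random scenarios and return the median."""
    values = [evaluate(candidate, make_random_scenario())
              for _ in range(n_scenarios)]
    return statistics.median(values)  # the median is less sensitive to outliers than the mean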
Another simple method to prevent overfitting is to limit the runtime of the
optimizers [190]. It is commonly assumed that learning processes normally
first find relatively general solutions which subsequently begin to overfit be-
cause the noise "is learned", too.
For the same reason, some algorithms allow the rate at which the solution
candidates are modified to be decreased over time. Such a decay of the learning
rate makes overfitting less likely.
If only one finite set of data samples is available for training/optimization,
it is common practice to separate it into a set of training data At and a set
of test cases Ac . During the optimization process, only the training data is
used. The resulting solutions are tested with the test cases afterwards. If their
behavior is significantly worse when applied to Ac than when applied to At ,
they are probably overfitted.
The same approach can be used to detect when the optimization process
should be stopped. The best known solution candidates can be checked with
the test cases in each iteration without influencing their objective values
which solely depend on the training data. If their performance on the test
cases begins to decrease, there are no benefits in letting the optimization
process continue any further.
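A minimal sketch of this holdout idea (ours; step() and error_on() are hypothetical placeholders for one optimizer iteration and an error measure) could look as follows:

# A minimal sketch of detecting overfitting with a holdout set: optimize on the
# training data A_t only, monitor the error on the test cases A_c, and stop once
# the test error no longer improves. step() and error_on() are placeholders.
def optimize_with_holdout(step, error_on, A_t, A_c, max_iterations=1000, patience=20):
    best, best_test_error, stale, candidate = None, float("inf"), 0, None
    for _ in range(max_iterations):
        candidate = step(candidate, A_t)       # only the training data drives the search
        test_error = error_on(candidate, A_c)  # the test cases are never optimized on
        if test_error < best_test_error:
            best, best_test_error, stale = candidate, test_error, 0
        else:
            stale += 1
        if stale >= patience:                  # no improvement on A_c: stop early
            break
    return best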
8.2 Oversimplification
8.2.1 The Problem
Oversimplification (also called overgeneralization) is the opposite of overfit-
ting. Whereas overfitting denotes the emergence of overly-complicated so-
lution candidates, oversimplified solutions are not complicated enough. Al-
though they represent the training samples used during the optimization pro-
cess seemingly well, they are rough overgeneralizations which fail to provide
good results for cases not part of the training.
A common cause for oversimplification is sketched in Fig. 11: The training
sets only represent a fraction of the set of possible inputs. As this is normally
the case, one should always be aware that such an incomplete coverage may
fail to represent some of the dependencies and characteristics of the data,
which then may lead to oversimplified solutions. Another possible reason
is that ruggedness, deceptiveness, too much neutrality, or high epistasis in
the fitness landscape may lead to premature convergence and prevent the
optimizer from surpassing a certain quality of the solution candidates. It then
cannot completely adapt them even if the training data perfectly represents
the sampled process. A third cause is that a problem space which does not
include the correct solution was chosen.
Fig. 11.a shows a cubic function. Since it is a polynomial of degree three,
four sample points are needed for its unique identification. Perhaps because
this was not known, only three samples have been provided in Fig. 11.b. By
doing so, some vital characteristics of the function are lost. Fig. 11.c depicts
a quadratic function – the polynomial of the lowest degree that fits exactly
to these samples. Although it is a perfect match, this function does not touch
any other point on the original cubic curve and behaves totally differently
in the lower parameter area.
However, even if we had included point P in our training data, it would
still be possible that the optimization process would yield Fig. 11.c as a
result. Having training data that correctly represents the sampled system
does not mean that the optimizer is able to find a correct solution with
perfect fitness – the other, previously discussed problematic phenomena can
prevent it from doing so. Furthermore, if it was not known that the system
which was to be modeled by the optimization process can best be represented
by a polynomial of the third degree, one could have limited the problem space
X to polynomials of degree two and less. Then, the result would likely again
be something like Fig. 11.c, regardless of how many training samples are used.
Fig. 11: Oversimplification. (Fig. 11.a: the "real system" and the points describing it, including point P; Fig. 11.b: the sampled training data; Fig. 11.c: the oversimplified result.)
8.2.2 Countermeasures
In order to counter oversimplification, its causes have to be mitigated. Gen-
erally, it is not possible to have training scenarios which cover the complete
input space of the evolved programs. By using multiple scenarios for each
individual evaluation, the chance of missing important aspects is decreased.
These scenarios can be replaced with new, randomly created ones in each
generation, which will decrease this chance even more. The problem space,
i.e., the representation of the solution candidates, should further be chosen
in a way which allows constructing a correct solution to the problem defined.
Then again, relaxing too many constraints on the solution structure increases
the risk of overfitting and thus, careful balancing is recommended.
9 Multi-Objective Optimization
9.1 Introduction
Many optimization problems in the real world have k , possibly contradictory
objectives fi which must be optimized simultaneously. Furthermore, the so-
lutions must satisfy m inequality constraints g and p equality constraints h.
A solution candidate x is feasible if and only if gi(x) ≥ 0 ∀i = 1, 2, .., m and
hi(x) = 0 ∀i = 1, 2, .., p holds. A multi-objective optimization problem (MOP)
can then be formally defined as follows:
Definition 7 (MOP). Find a solution candidate x⋆ in X which minimizes
(or maximizes) the vector function f(x⋆) = (f1(x⋆), f2(x⋆), .., fk(x⋆))T and is
feasible (i.e., satisfies the m inequality constraints gi(x⋆) ≥ 0 ∀i = 1, 2, .., m and
the p equality constraints hi(x⋆) = 0 ∀i = 1, 2, .., p).
As in single-objective optimization, nature-inspired algorithms are popu-
lar techniques to solve such problems. The fact that there are two or more
objective functions implies additional difficulties. Due to the contradictory
feature of the functions in a MOP and the fact that there exists no total
order in Rn for n > 1, the notions of "better than" and "optimum" have to
be redefined. When comparing any two solutions x1 and x2 , solution x1 can
have a better value in objective fi , i.e., fi (x1 ) < fi ( x2 ), while solution x2 can
have a better value in objective fj . The concepts commonly used here are
Pareto dominance and Pareto optimality.
Definition 8 (Pareto Dominance). In the context of multi-objective
global optimization, a solution candidate x1 is said to dominate another so-
lution candidate x2 (denoted by x1 ≼ x2) if and only if f(x1) is partially less
than f(x2), i.e., ∀i ∈ {1, .., k} : fi(x1) ≤ fi(x2) ∧ ∃j ∈ {1, .., k} : fj(x1) < fj(x2).
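Definition 8 translates almost literally into code; the following small sketch (ours, assuming minimization and objective vectors of equal length) checks the dominance relation:

# A direct transcription of Definition 8 for minimization: x1 dominates x2 iff
# it is no worse in every objective and strictly better in at least one.
def dominates(f_x1, f_x2):
    return (all(a <= b for a, b in zip(f_x1, f_x2))
            and any(a < b for a, b in zip(f_x1, f_x2)))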
Fig. 12: Some examples for the dominance relation (candidates plotted in the f1–f2 objective space; the non-dominated candidates form the Pareto front).
The dominance notion allows us to assume that if solution x1 dominates
solution x2 , then x1 is preferable to x2 . If both solutions are non-dominated
(such as candidate ① and ② in Fig. 12), some additional criteria have to be
used to choose one of them.
Definition 9 (Pareto Optimality). A feasible point x⋆ ∈ X is Pareto-
optimal if and only if there is no feasible x̂ ∈ X with x̂ ≼ x⋆.
This definition states that x⋆ is Pareto-optimal if there is no other feasible
solution x̂ which would improve some criterion without causing a simul-
taneous worsening in at least one other criterion. The solution to a MOP,
considering Pareto optimality, is the set of feasible, non-dominated solutions
which is known as Pareto-optimal set:
Definition 10 (Pareto-Optimal Set). For a given MOP f(x), the Pareto-
optimal set is defined as P⋆ = {x⋆ ∈ X | ¬∃x ∈ X : x ≼ x⋆}.
When the solutions in the Pareto-optimal set are plotted in the objective
space (as sketched in Fig. 12), they are collectively known as the Pareto front:
Definition 11 (Pareto Front). For a given MOP f(x) and its Pareto-
optimal set P⋆, the Pareto front is defined as PF⋆ = {f(x) | x ∈ P⋆}.
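For a finite set of evaluated candidates, an approximation of the Pareto-optimal set can be filtered out directly from these definitions. A small sketch (ours, reusing the dominates() helper sketched after Definition 8):

# A sketch of extracting the non-dominated candidates from a finite population,
# reusing the dominates() helper from above.
def non_dominated(population):
    """population: list of (candidate, objective_vector) pairs."""
    return [(cand, f_cand) for cand, f_cand in population
            if not any(dominates(f_other, f_cand)
                       for _, f_other in population)]

population = [("A", (1.0, 5.0)), ("B", (2.0, 3.0)),
              ("C", (3.0, 4.0)), ("D", (4.0, 1.0))]
print([c for c, _ in non_dominated(population)])   # ['A', 'B', 'D']; C is dominated by B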
Obtaining the Pareto front of a MOP is the main goal of multi-objective
optimization. In a real scenario, the solutions in the Pareto front are sent
to an expert in the MOP, the decision maker, who will be responsible for
choosing the best tradeoff solution among all of them. Fig. 13 depicts the
Pareto front of a bi-objective MOP. In a real problem example, f1 could
represent the time required by a car to cover a given distance, while f2 could
be the fuel consumption.
Fig. 13: Example of the Pareto front of a bi-objective MOP (plotted in the f1–f2 objective space).
The Pareto front of a MOP can contain a large (possibly infinite) number
of points. Usually, the goal of optimization is to obtain a fixed-size set of
solutions called the Pareto front approximation set. Population-based algo-
rithms, such as Genetic Algorithms, are very popular for solving MOPs because
they can provide an approximation set in a single run.
Given that the goal is to find a Pareto front approximation set, two is-
sues arise. First, the optimization process should converge to the true Pareto
front and return solutions as close to it as possible. Second, they should be
uniformly spread along this front.
Fig. 14: Pareto front approximation sets (true Pareto front vs. front returned by the optimizer). Fig. 14.a: bad convergence and good spread; Fig. 14.b: good convergence and bad spread; Fig. 14.c: good convergence and spread.
Let us examine the three fronts included in Fig. 14. The first picture
(Fig. 14.a) shows an approximation set having a very good spread7 of solu-
tions, but the points are far away from the true Pareto front. Such results
are not attractive because they do not provide Pareto-optimal solutions. The
second example (Fig. 14.b) contains a set of solutions which are very close to
the true Pareto front but cover it only partially, so the decision maker could
lose important trade-off solutions. Finally, the front depicted in Fig. 14.c has
the two desirable properties of good convergence and spread.
7 In MO optimization, this property is usually called diversity. In order to avoid confusion
with the (related) diversity property from Section 2.3, we use the term spread here instead.
9.2 The Problem
Features such as multi-modality, deceptiveness, or epistasis found in single-
objective optimization also affect MOPs, making them more difficult to solve.
However, there are some characteristics that are particular to MOPs. Here
we comment on two of them: geometry and dimensionality.
The Pareto front in Fig. 13 has a convex geometry, but there are other
different shapes as well. In Fig. 15 we show some examples, including non-
convex (concave), disconnected, linear, and non-uniformly distributed Pareto
fronts. Besides Pareto optimization, there is a wide variety of other concepts
for defining what optima are in the presence of multiple objective functions
[45]. The simplest approach is maybe to use a weighted sum of all objective
values and set v(x) = f1(x) + f2(x) + .. + fk(x). Then the optima would be the
element(s) x⋆ with ¬∃x ∈ X : v(x) < v(x⋆). However, an optimization process driven
by such a linear aggregating function will not find portions of Pareto fronts
with non-convex geometry as shown by Das and Dennis [50].
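To make the aggregation idea concrete, a weighted-sum scalarization can be sketched as follows (ours; the weights are illustrative, and, as just noted, non-convex parts of the Pareto front cannot be reached this way):

# A sketch of a linear aggregating (weighted sum) function over k objectives.
# The weights are illustrative assumptions.
def weighted_sum(f_vector, weights):
    return sum(w * f for w, f in zip(weights, f_vector))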
Fig. 15: Examples of Pareto fronts. Fig. 15.a: non-convex (concave); Fig. 15.b: disconnected; Fig. 15.c: linear; Fig. 15.d: non-uniformly distributed.
Many studies in the literature consider mainly bi-objective MOPs. As a
consequence, many algorithms are designed to deal with that kind of prob-
lem. However, MOPs having a higher number of objective functions are
common in practice, leading to the so-called many-objective optimization
[165], which is currently a hot research topic. Most of the optimization al-
gorithms applied today utilize the Pareto dominance relation. When the di-
mension of the MOPs increases, the majority of solution candidates are non-
dominated. As a consequence, traditional nature-inspired algorithms have to
be redesigned.
9.3 Countermeasures
In order to obtain an accurate approximation to the true Pareto front, many
nature-inspired multi-objective algorithms apply a fitness assignment scheme
based on the concept of Pareto dominance, as commented before. For exam-
ple, NSGA-II [61, 62], the most well-known multi-objective technique, assigns
to each solution a rank depending on the number of solutions dominating it.
Thus, solutions with rank 1 are non-dominated, solutions with rank 2 are
dominated by one solution, and so on. Other algorithms, such as SPEA2
[247, 248] introduce the concept of strength, which is similar to the ranking
but also considers the number of dominated solutions.
While the use of Pareto-based ranking methods allows the techniques to
search in the direction of finding approximations with good convergence, addi-
tional strategies are needed to promote spread. The most commonly adopted
approach is to include a kind of density estimator in order to select those
solutions which are in the less crowded regions of the objective space. Thus,
NSGA-II employs the crowding distance [61] and SPEA2 the distance to the
k-nearest neighbor [62].
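A sketch of such a density estimator, in the spirit of NSGA-II's crowding distance (our simplified transcription, assuming each front member is given as an objective vector, not the exact NSGA-II procedure), is shown below:

# A simplified sketch of a crowding-distance style density estimator (in the
# spirit of NSGA-II): larger values mean less crowded neighborhoods.
def crowding_distance(front):
    """front: non-empty list of objective vectors (tuples of equal length)."""
    n, k = len(front), len(front[0])
    dist = [0.0] * n
    for m in range(k):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")   # keep boundary solutions
        if hi == lo:
            continue
        for j in range(1, n - 1):
            dist[order[j]] += (front[order[j + 1]][m]
                               - front[order[j - 1]][m]) / (hi - lo)
    return dist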
9.4 Constraint Handling
How the constraints mentioned in Definition 7 are handled is a whole research
area in itself with roots in single-objective optimization. Maybe one of the
most popular approaches for dealing with constraints goes back to Courant [48],
who introduced the idea of penalty functions [73, 44, 201] in 1943: Consider,
for instance, the term f′(x) = f(x) + v·[h(x)]² where f is the original objec-
tive function, h is an equality constraint, and v > 0. If f′ is minimized, an
infeasible individual will always have a worse fitness than a feasible one with
the same objective values.
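As a small sketch (ours), this static penalty can be written as a wrapper around the original objective function:

# A sketch of the static penalty term f'(x) = f(x) + v*[h(x)]^2 from the text;
# the value of v is an illustrative constant.
def penalized(f, h, v=1000.0):
    """Return f'(x) that penalizes violations of the equality constraint h(x) = 0."""
    return lambda x: f(x) + v * h(x) ** 2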
Besides such static penalty functions, dynamic terms incorporating the
generation counter [111, 157] or adaptive approaches utilizing additional
population statistics [95, 199] have been proposed. Rigorous discussions on
penalty functions have been contributed by Fiacco and McCormick [73] and
Smith and Coit [201].
During the last fifteen years, many approaches have been developed
which incorporate constraint handling and multi-objectivity. Instead of using
penalty terms, Pareto ranking can also be extended by additionally com-
paring individuals according to their feasibility, for instance. Examples for
this approach are the Method of Inequalities (MOI) Zakian [245] as used
by Pohlheim [164] and the Goal Attainment method defined in [76]. Deb
[56, 58] even suggested to simply turn constraints into objective functions in
his MOEA version of Goal Programming.
10 Dynamically Changing Fitness Landscape
It should also be mentioned that there exist problems with dynamically
changing fitness landscapes [33, 32, 36, 147, 173]. The task of an optimization
algorithm is, then, to provide solution candidates with momentarily optimal
objective values for each point in time. Here we have the problem that an
optimum in iteration t will possibly not be an optimum in iteration t + 1
anymore.
The moving peaks benchmarks by Branke [33, 32] and Morrison and De
Jong [147] are good examples for dynamically changing fitness landscapes.
Such problems with dynamic characteristics can, for example, be tackled with
special forms [244] of
• Evolutionary Algorithms [9, 34, 35, 145, 146, 216, 236],
• Genetic Algorithms [83, 119, 142, 143, 144],
• Particle Swarm Optimization [23, 42, 43, 126, 160],
• Differential Evolution [135, 243], and
• Ant Colony Optimization [90, 91].
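As a toy sketch (ours; this is not Branke's actual moving peaks benchmark), a time-dependent objective might look as follows, with the optimum drifting as the iteration counter t advances:

# A toy sketch of a dynamically changing fitness landscape: a single peak whose
# optimum position drifts with the iteration counter t.
import math

def dynamic_fitness(x, t, drift=0.05):
    optimum = math.sin(drift * t)      # the location of the peak moves over time
    return -(x - optimum) ** 2         # maximal (value 0) only at the current optimum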
11 The No Free Lunch Theorem
By now, we know the most important problems that can be encountered when
applying an optimization algorithm to a given problem. Furthermore, we
have seen that it is arguable what an optimum actually is if multiple criteria
are optimized at once. The fact that there is most likely no optimization
method that can outperform all others on all problems can, thus, easily be
accepted. Instead, there exist a variety of optimization methods specialized
in solving different types of problems. There are also algorithms which deliver
good results for many different problem classes, but may be outperformed by
highly specialized methods in each of them.
These facts have been formalized by Wolpert and Macready [241, 242]
in their No Free Lunch Theorems (NFL) for search and optimization algo-
rithms. Wolpert and Macready [242] focus on single-objective optimization
and prove that the sum of the values of any performance measure (such as the
objective value of the best solution candidate discovered until a time step m)
over all possible objective functions f is always identical for all optimization
algorithms.
From this theorem, it immediately follows that, in order to outperform
the optimization method a1 on one optimization problem, the algorithm
a2 will necessarily perform worse on another. Fig. 16 visualizes this issue.
The higher the value of the performance measure illustrated there, the faster
the corresponding problem will be solved. The figure shows that general op-
timization approaches (like Evolutionary Algorithms) can solve a variety of
problem classes with reasonable performance. Hill Climbing approaches, for
instance, will be much faster than Evolutionary Algorithms if the objective
functions are continuous and monotone, that is, on a smaller set of optimization
tasks. Greedy search methods will perform fast on all problems with matroid
structure. Evolutionary Algorithms will most often still be able to solve these
problems; it just takes them longer to do so. The performance of Hill Climb-
ing and greedy approaches degenerates in other classes of optimization tasks
as a trade-off for their high utility in their "area of expertise".
Fig. 16: A visualization of the No Free Lunch Theorem (a very crude sketch of performance over all possible optimization problems for a random walk or exhaustive enumeration, a general optimization algorithm such as an EA, and two specialized optimization algorithms, e.g., a hill climber and a depth-first search).
One interpretation of the No Free Lunch Theorem is that it is impossi-
ble for any optimization algorithm to outperform random walks or exhaus-
tive enumerations on all possible problems. For every problem where a given
method leads to good results, we can construct a problem where the same
method has exactly the opposite effect (see Section 4). As a matter of fact,
doing so is even a common practice to find weaknesses of optimization algo-
rithms and to compare them with each other.
Another interpretation is that every useful optimization algorithm utilizes
some form of problem-specific knowledge. Radcliffe [167] states that without
such knowledge, search algorithms cannot exceed the performance of simple
enumerations. Incorporating knowledge starts with relying on simple assump-
tions like "if x is a good solution candidate, than we can expect other good
solution candidates in its vicinity", i.e., strong causality. The more (correct)
problem specific knowledge is integrated (correctly) into the algorithm struc-
ture, the better will the algorithm perform. On the other hand, knowledge
correct for one class of problems is, quite possibly, misleading for another
class. In reality, we use optimizers to solve a given set of problems and are
not interested in their performance when (wrongly) applied to other classes.
Today, there exists a wide range of work on No Free Lunch Theorems
for many different aspects of machine learning. The website http://www.
no-free-lunch.org/ (accessed 2008-03-28) gives a good overview of them.
Further summaries and extensions have been provided by Köppen et al [121]
and Igel and Toussaint [108, 109]. Radcliffe and Surry [168] discuss the NFL in the context of
Evolutionary Algorithms and the representations used as search spaces. The
No Free Lunch Theorem is furthermore closely related to the Ugly Duckling
Theorem proposed by Watanabe [228] for classification and pattern recogni-
tion.
12 Concluding Remarks
The subject of this introductory chapter was the question about what makes
optimization problems hard, especially for metaheuristic approaches. We have
discussed numerous different phenomena which can affect the optimization
process and lead to disappointing results. If an optimization process has con-
verged prematurely, it has been trapped in a non-optimal region of the search
space from which it cannot "escape" anymore (Section 2). Ruggedness (Sec-
tion 3) and deceptiveness (Section 4) in the fitness landscape, often caused
by epistatic effects (Section 6), can misguide the search into such a region.
Neutrality and redundancy (Section 5) can either slow down optimization,
because the application of the search information does not lead to a gain in
information, or contribute positively by creating neutral networks from which
the search space can be explored and local optima can be escaped. Even in
the presence of noise, the derived solutions should be robust (Section 7).
Also, they should neither be too general (oversimpli-
fication, Section 8.2) nor too specifically aligned only to the training data
(overfitting, Section 8.1). Furthermore, many practical problems are multi-
objective, i.e., involve the optimization of more than one criterion at once
(Section 9), or concern objectives which may change over time (Section 10).
In the previous section, we discussed the No Free Lunch Theorem and
argued that it is not possible to develop the one optimization algorithm, the
problem-solving machine which can provide us with near-optimal solutions
in short time for every possible optimization task. This must sound very
depressing for everybody new to this subject.
Fig. 17: The puzzle of optimization algorithms (Evolutionary Algorithms; GA, GP, ES, DE, EP, ...; Memetic Algorithms; ACO; PSO; Extremal Optimization; Simulated Annealing; EDA; Tabu Search; Branch & Bound; Dynamic Programming; A* Search; IDDFS; Hill Climbing; Downhill Simplex; LCS; RFD; Random Optimization).
Actually, quite the opposite is the case, at least from the point of view of a
researcher. The No Free Lunch Theorem means that there will always be new
ideas, new approaches which will lead to better optimization algorithms to
solve a given problem. Instead of being doomed to obsolescence, it is far more
likely that most of the currently known optimization methods have at least
one niche, one area where they are excellent. It also means that it is very likely
that the "puzzle of optimization algorithms" will never be completed. There
will always be a chance that an inspiring moment, an observation in nature,
for instance, may lead to the invention of a new optimization algorithm which
performs better in some problem areas than all currently known ones.
Acknowledgements We gratefully acknowledge comments on early drafts of this chap-
ter by Peter J Bentley and Patrick Siarry. The last author acknowledges support from
the "Consejer´ıa de Innovaci´on, Ciencia y Empresa", Junta de Andaluc´ıa under contract
P07-TIC-03044 DIRICOM project, http://diricom.lcc.uma.es.
References
1. Ackley DH (1987) A connectionist machine for genetic hillclimbing, The Springer
International Series in Engineering and Computer Science, vol 28. Kluwer Academic
Publishers
2. Altenberg L (1994) The schema theorem and price's theorem. In: Foundations of
Genetic Algorithms 3, pp 23–49
3. Altenberg L (1995) Genome growth and the evolution of the genotype-phenotype
map. In: Evolution and Biocomputation – Computational Models of Evolution,
Springer-Verlag, pp 205–259
4. Altenberg L (1996) Nk fitness landscapes. In: Handbook of Evolutionary Computa-
tion, Oxford University Press, chap B2.7.2
5. Altenberg L (1997) Fitness distance correlation analysis: An instructive counterexam-
ple. In: Proceedings of the International Conference on Genetic Algorithms, ICGA,
pp 57–64
6. Amitrano C, Peliti L, Saber M (1989) Population dynamics in a spin-glass
model of chemical evolution. Journal of Molecular Evolution 29(6):513–525,
doi:10.1007/BF02602923
7. Amor HB, Rettinger A (2005) Intelligent exploration for genetic algorithms: Us-
ing self-organizing maps in evolutionary computation. In: Genetic and Evolutionary
Computation Conference, GECCO, pp 1531–1538, doi:10.1145/1068009.1068250
8. Angeline PJ, Pollack J (1993) Evolutionary module acquisition. In: The Second An-
nual Conference on Evolutionary Programming, Evolutionary Programming Society,
pp 154–163
9. Arag´on VS, Esquivel SC (2004) An evolutionary algorithm to track changes of op-
timum value locations in dynamic environments. Journal of Computer Science &
Technology (JCS&T) 4(3):127–133, invited paper
10. Bachmann PGH (1894) Die Analytische Zahlentheorie / Dargestellt von Paul Bach-
mann, Zahlentheorie: Versuch einer Gesamtdarstellung dieser Wissenschaft in ihren
Haupttheilen, vol Zweiter Theil. B. G. Teubner, Leipzig, Germany
11. Bäck T, Hammel U (1994) Evolution strategies applied to perturbed objective func-
tions. In: Proceedings of the IEEE Congress on Evolutionary Computation, CEC,
vol 1, pp 40–45, doi:10.1109/ICEC.1994.350045
12. Bak P, Sneppen K (1993) Punctuated equilibrium and criticality in a simple model of
evolution. Physical Review Letters 71:4083–4086, doi:10.1103/PhysRevLett.71.4083
13. Baldwin JM (1896) A new factor in evolution. The American Naturalist 30:441–451
14. Barnett L (1997) Tangled webs: Evolutionary dynamics on fitness landscapes with
neutrality. Master's thesis, School of Cognitive Science, University of East Sussex,
Brighton, UK
15. Barnett L (1998) Ruggedness and neutrality – the nkp family of fitness landscapes.
In: Artificial Life VI: Proceedings of the sixth international conference on Artificial
life, pp 18–27
16. Bateson W (1909) Mendel's Principles of Heredity. Cambridge University Press
17. Beaudoin W, Verel S, Collard P, Escazut C (2006) Deceptiveness and neutrality the nd
family of fitness landscapes. In: Genetic and Evolutionary Computation Conference,
GECCO, pp 507–514, doi:10.1145/1143997.1144091
18. Beerenwinkel N, Pachter L, Sturmfels B (2006) Epistasis and shapes of fitness land-
scapes. Eprint arXiv:q-bio/0603034 (Quantitative Biology, Populations and Evolu-
tion). http://arxiv.org/abs/q-bio.PE/0603034 [accessed 2007-08-05]
19. Bergman A, Feldman MW (1992) Recombination dynamics and the fitness landscape.
Physica D: Nonlinear Phenomena 56:57–67, doi:10.1016/0167-2789(92)90050-W
20. Bethke AD (1980) Genetic algorithms as function optimizers. PhD thesis, University
of Michigan, Ann Arbor, MI, USA
21. Beyer HG (1993) Toward a theory of evolution strategies: Some asymptotical results
from the (1, + λ )-theory. Evolutionary Computation 1(2 (Summer)):165–188
22. Beyer HG (1994) Toward a theory of evolution strategies: The (µ, λ )-theory. Evolu-
tionary Computation 2(4):381–407
23. Blackwell T (2007) Particle swarm optimization in dynamic environments. In: Evo-
lutionary Computation in Dynamic and Uncertain Environments, Springer, chap 2,
pp 29–52
24. Bledsoe WW, Browning I (1959) Pattern recognition and reading by machine. In: Pro-
ceedings of the Eastern Joint Computer Conference (EJCC) – Papers and Discussions
Presented at the Joint IRE - AIEE - ACM Computer Conference, pp 225–232
25. Blum C, Roli A (2003) Metaheuristics in combinatorial optimization: Overview and
conceptual comparison. ACM Computing Surveys 35(3):268–308
26. Bonner JT (1974) On Development: The Biology of Form, new ed edn. Common-
wealth Fund Publications, Harvard University Press
27. Bornberg-Bauer E, Chan HS (1999) Modeling evolutionary landscapes: Mutational
stability, topology, and superfunnels in sequence space. Proceedings of the Na-
tional Academy of Science of the United States of America (PNAS) – Biophysics
96(19):10,689–10,694
28. Bosman PAN, Thierens D (2002) Multi-objective optimization with diversity preserv-
ing mixture-based iterated density estimation evolutionary algorithms. International
Journal Approximate Reasoning 31(3):259–289, doi:10.1016/S0888-613X(02)00090-7
29. Bosman PAN, Thierens D (2002) A thorough documentation of obtained results
on real-valued continuous and combinatorial multi-objective optimization problems
using diversity preserving mixture-based iterated density estimation evolutionary al-
gorithms. Tech. Rep. UU-CS-2002-052, Institute of Information and Computing Sci-
ences, Utrecht University, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
30. Brameier MF, Banzhaf W (2002) Explicit control of diversity and effective varia-
tion distance in linear genetic programming. In: EuroGP'02: Proceedings of the 5th
European Conference on Genetic Programming, pp 37–49
31. Branke J (1998) Creating robust solutions by means of evolutionary algorithms. In:
Proceedings of the International Conference on Parallel Problem Solving from Nature,
PPSN, pp 119–128, doi:10.1007/BFb0056855
32. Branke J (1999) Memory enhanced evolutionary algorithms for changing optimization
problems. In: Proceedings of the IEEE Congress on Evolutionary Computation, CEC,
vol 3, pp 1875–1882, doi:10.1109/CEC.1999.785502
33. Branke J (1999) The moving peaks benchmark. Tech. rep., Institute AIFB, Univer-
sity of Karlsruhe, Germany, http://www.aifb.uni-karlsruhe.de/~jbr/MovPeaks/
[accessed 2007-08-19]. Presented in [32]
34. Branke J (2000) Evolutionary optimization in dynamic environments. PhD thesis,
Universität Karlsruhe (TH), Fakultät für Wirtschaftswissenschaften
35. Branke J (2001) Evolutionary Optimization in Dynamic Environments, Genetic Al-
gorithms and Evolutionary Computation, vol 3. Kluwer Academic Publishers
36. Branke J, Salihoğlu E, Uyar Ş (2005) Towards an analysis of dynamic environments.
In: Genetic and Evolutionary Computation Conference, GECCO, pp 1433–1440
37. Bremermann HJ (1962) Optimization through evolution and recombination. Self-
Organizing systems pp 93–10
38. Burke EK, Gustafson SM, Kendall G (2002) Survey and analysis of diversity mea-
sures in genetic programming. In: Genetic and Evolutionary Computation Confer-
ence, GECCO, pp 716–723
39. Burke EK, Gustafson SM, Kendall G, Krasnogor N (2003) Is increasing diversity in
genetic programming beneficial? an analysis of the effects on fitness. In: Proceedings
of the IEEE Congress on Evolutionary Computation, CEC, pp 1398–1405
40. Burke EK, Gustafson SM, Kendall G (2004) Diversity in genetic programming: An
analysis of measures and correlation with fitness. IEEE Transactions on Evolutionary
Computation 8(1):47–62, doi:10.1109/TEVC.2003.819263
41. Cantú-Paz E, Pelikan M, Goldberg DE (2000) Linkage problem, distribu-
tion estimation, and bayesian networks. Evolutionary Computation 8(3):311–340,
doi:10.1162/106365600750078808
42. Carlisle AJ (2002) Applying the particle swarm optimizer to non-stationary environ-
ments. PhD thesis, Graduate Faculty of Auburn University
43. Carlisle AJ, Dozier GV (2002) Tracking changing extrema with adaptive particle
swarm optimizer. In: Proceedings of the 5th Biannual World Automation Congress,
WAC 2002, vol 13, pp 265–270, doi:10.1109/WAC.2002.1049555
44. Carroll CW (1959) An operations research approach to the economic optimization of a
kraft pulping process. PhD thesis, Institute of Paper Chemistry, Appleton, Wisconsin,
USA
45. Coello Coello CA, Lamont GB, van Veldhuizen DA (1st ed: 2002, 2nd ed:
2007) Evolutionary Algorithms for Solving Multi-Objective Problems, Genetic
and Evolutionary Computation, vol 5. Kluwer Academic Publishers / Springer,
doi:10.1007/978-0-387-36797-2
46. Chen Yp (2006) Extending the Scalability of Linkage Learning Genetic Algo-
rithms – Theory & Practice, Studies in Fuzziness and Soft Computing, vol 190.
Springer, doi:10.1007/b102053
47. Cohoon JP, Hegde SU, Martin WN, Richards D (1987) Punctuated equilibria: a
parallel genetic algorithm. In: Proceedings of the Second International Conference on
Genetic algorithms and their Application, pp 148–154
48. Courant R (1943) Variational methods for the solution of problems of equilib-
rium and vibrations. Bulletin of the American Mathematical Society 49(1):1–23,
doi:10.1090/S0002-9904-1943-07818-4
49. Cousins SH (1991) Species diversity measurement: Choosing the right index. Trends
in Ecology and Evolution (TREE) 6(6):190–192, doi:10.1016/0169-5347(91)90212-G
50. Das I, Dennis JE (1997) A closer look at drawbacks of minimizing weighted sums of
objectives for pareto set generation in multicriteria optimization problems. Structural
optimization 14(1):63–69, doi:10.1007/BF01197559
51. Davidor Y (1990) Epistasis variance: A viewpoint on GA-hardness. In: Proceedings
of the First Workshop on Foundations of Genetic Algorithms, pp 23–35
52. Dawkins R (1987) The evolution of evolvability. In: ALIFE – Artificial Life: Proceed-
ings of the Interdisciplinary Workshop on the Synthesis and Simulation of Living
Systems, pp 201–220
53. de Jong ED, Watson RA, Pollack JB (2001) Reducing bloat and promoting diversity
using multi-objective methods. In: Genetic and Evolutionary Computation Confer-
ence, GECCO, pp 11–18
54. de Lamarck JBPAdC (1809) Philosophie zoologique – ou Exposition des con-
sidérations relatives à l'histoire naturelle des Animaux. Dentu / G. Baillière, Paris,
France / Harvard University
55. Deb K (1989) Genetic algorithms in multimodal function optimization. Master's the-
sis, The Clearinghouse for Genetic algorithms, University of Alabama, Tuscaloosa,
tCGA Report No. 89002
56. Deb K (1999) Solving goal programming problems using multi-objective genetic algo-
rithms. In: Proceedings of the IEEE Congress on Evolutionary Computation, CEC,
pp 77–84, doi:10.1109/CEC.1999.781910
57. Deb K (2001) Genetic algorithms for optimization. KanGAL Report 2001002, Kanpur
Genetic Algorithms Laboratory (KanGAL), Kanpur, PIN 208 016, India
58. Deb K (2001) Nonlinear goal programming using multi-objective genetic algorithms.
Journal of the Operational Research Society 52(3):291–302
59. Deb K, Goldberg DE (1993) Analyzing deception in trap functions. In: Foundations
of Genetic Algorithms 2, pp 93–108
60. Deb K, Goldberg DE (1994) Sufficient conditions for deceptive and easy binary func-
tions. Annals of Mathematics and Artificial Intelligence 10(4):385–408
61. Deb K, Agrawal S, Pratab A, Meyarivan T (2000) A fast elitist non-dominated sorting
genetic algorithm for multi-objective optimization: NSGA-II. In: Proceedings of the
International Conference on Parallel Problem Solving from Nature, PPSN, pp 849–
858, also: KanGAL Report No. 200001
62. Deb K, Pratab A, Agrawal S, Meyarivan T (2002) A fast and elitist multiobjec-
tive genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation
6(2):182–197, doi:10.1109/4235.996017
63. Deb K, Sinha A, Kukkonen S (2006) Multi-objective test problems, linkages, and
evolutionary methodologies. In: Genetic and Evolutionary Computation Conference,
GECCO, ACM, New York, NY, USA, pp 1141–1148
64. Dietterich T (1995) Overfitting and undercomputing in machine learning. ACM Com-
puting Surveys (CSUR) 27(3):326–327, doi:10.1145/212094.212114
65. Droste S, Wiesmann D (1998) On representation and genetic operators in evolution-
ary algorithms. Tech. Rep. CI–41/98, Fachbereich Informatik, Universität Dortmund
66. Eiben ÁE, Schippers CA (1998) On evolutionary exploration and exploitation. Fun-
damenta Informaticae 35(1-4):35–50
67. Eldredge N, Gould SJ (1972) Punctuated equilibria: an alternative to phyletic grad-
ualism. In: Schopf TJM (ed) Models in Paleobiology, Freeman, Cooper & Co / Dou-
bleDay, chap 5, pp 82–115
68. Eldredge N, Gould SJ (1977) Punctuated equilibria: The tempo and mode of evolution
reconsidered. Paleobiology 3(2):115–151
69. Eshelman LJ, Schaffer JD (1991) Preventing premature convergence in genetic al-
gorithms by preventing incest. In: Proceedings of the International Conference on
Genetic Algorithms, ICGA, pp 115–122
70. Eshelman LJ, Caruana RA, Schaffer JD (1989) Biases in the crossover landscape. In:
Proceedings of the third international conference on Genetic algorithms, pp 10–19
71. Feo TA, Resende MGC (1995) Greedy randomized adaptive search procedures. Jour-
nal of Global Optimization 6(2):109–133, doi:10.1007/BF01096763
72. Festa P, Resende MG (2004) An annotated bibliography of grasp. AT&T Labs Re-
search Technical Report TD-5WYSEW, AT&T Labs
73. Fiacco AV, McCormick GP (1968) Nonlinear Programming: Sequential Uncon-
strained Minimization Techniques. John Wiley & Sons Inc.
74. Fisher SRA (1918) The correlations between relatives on the supposition of mendelian
inheritance. Philosophical Transactions of the Royal Society of Edinburgh 52:399–433
75. Fitzpatrick JM, Grefenstette JJ (1988) Genetic algorithms in noisy environments.
Machine Learning 3(2–3):101–120, doi:10.1007/BF00113893
76. Fonseca CM, Fleming PJ (1993) Genetic algorithms for multiobjective optimization:
Formulation, discussion and generalization. In: Proceedings of the 5th International
Conference on Genetic Algorithms, pp 416–423
77. Forst CV, Reidys C, Weber J (1995) Evolutionary dynamics and optimization: Neu-
tral networks as model-landscapes for RNA secondary-structure folding-landscapes.
In: European Conference on Artificial Life, pp 128–147
78. Friedberg RM (1958) A learning machine: Part i. IBM Journal of Research and De-
velopment 2:2–13
79. Garey MR, Johnson DS (1979) Computers and Intractability: A Guide to the Theory
of NP-Completeness. Series of Books in the Mathematical Sciences, W. H. Freeman
& Co., New York, USA
80. Geman S, Bienenstock E, Doursat R (1992) Neural networks and the bias/variance
dilemma. Neural Computation 4(1):1–58
81. Glover F (1990) Tabu search – part ii. Operations Research Society of America
(ORSA) Journal on Computing 2(1):190–206
82. Glover F, Taillard ÉD, de Werra D (1993) A user's guide to tabu search. Annals of
Operations Research 41(1):3–28, doi:10.1007/BF02078647
83. Gobb HG, Grefenstette JJ (1993) Genetic algorithms for tracking changing environ-
ments. In: Proceedings of the International Conference on Genetic Algorithms, ICGA,
pp 523–529
84. Goldberg DE (1989) Genetic Algorithms in Search, Optimization, and Machine
Learning. Addison-Wesley Longman Publishing Co.
85. Goldberg DE, Richardson J (1987) Genetic algorithms with sharing for multimodal
function optimization. In: Proceedings of the Second International Conference on
Genetic algorithms and their Application, pp 41–49
86. Goldberg DE, Deb K, Korb B (1989) Messy genetic algorithms: motivation, analysis,
and first results. Complex Systems 3:493–530
87. Greiner H (1994) Robust filter design by stochastic optimization. Proceed-
ings of SPIE (The International Society for Optical Engineering) 2253:150–161,
doi:10.1117/12.192107
88. Greiner H (1996) Robust optical coating design with evolutionary strategies. Applied
Optics 35:5477–5483
89. Gruau F, Whitley LD (1993) Adding learning to the cellular development of neural
networks: Evolution and the Baldwin effect. Evolutionary Computation 1(3):213–233,
doi:10.1162/evco.1993.1.3.213
90. Guntsch M, Middendorf M (2001) Pheromone modification strategies for ant algo-
rithms applied to dynamic tsp. In: Proceedings of the EvoWorkshops on Applications
of Evolutionary Computing, pp 213–222
91. Guntsch M, Middendorf M, Schmeck H (2001) An ant colony optimization approach
to dynamic TSP. In: Genetic and Evolutionary Computation Conference, GECCO,
pp 860–867
92. Gurin LS, Rastrigin LA (1965) Convergence of the random search method in the
presence of noise. Automation and Remote Control 26:1505–1511
93. Gustafson SM (2004) An analysis of diversity in genetic programming. PhD thesis,
University of Nottingham, School of Computer Science & IT
94. Gustafson SM, Ekárt A, Burke EK, Kendall G (2004) Problem difficulty and code
growth in genetic programming. Genetic Programming and Evolvable Machines
5(3):271–290, doi:10.1023/B:GENP.0000030194.98244.e3
95. Hadj-Alouane AB, Bean JC (1992) A genetic algorithm for the multiple-choice integer
program. Tech. Rep. 92-50, Department of Industrial and Operations Engineering,
The University of Michigan, Ann Arbour, MI 48109-2117, USA
96. Hammel U, Bäck T (1994) Evolution strategies on noisy functions: How to improve
convergence properties. In: Proceedings of the International Conference on Parallel
Problem Solving from Nature, PPSN, pp 159–168
97. Han L, He X (2007) A novel opposition-based particle swarm optimization for noisy
problems. In: ICNC'07: Proceedings of the Third International Conference on Natural
Computation, vol 3, pp 624–629, doi:10.1109/ICNC.2007.119
98. Handa H, Lin D, Chapman L, Yao X (2006) Robust solution of salting route op-
timisation using evolutionary algorithms. In: Proceedings of the IEEE Congress on
Evolutionary Computation, CEC, pp 3098–3105, doi:10.1109/CEC.2006.1688701
99. Harik GR (1997) Learning gene linkage to efficiently solve problems of bounded dif-
ficulty using genetic algorithms. PhD thesis, University of Michigan, Ann Arbor
100. Hinton GE, Nowlan SJ (1987) How learning can guide evolution. Complex Systems
1:495–502
101. Hinton GE, Nowlan SJ (1996) How learning can guide evolution. In: Adaptive in-
dividuals in evolving populations: models and algorithms, Addison-Wesley Longman
Publishing Co., Inc., pp 447–454
102. Holland JH (1975) Adaptation in Natural and Artificial Systems: An Introductory
Analysis with Applications to Biology, Control, and Artificial Intelligence. The Uni-
versity of Michigan Press, Ann Arbor, reprinted by MIT Press, April 1992, NetLi-
brary, Inc
103. Holland JH (1992) Genetic algorithms. Scientific American 267(1):44–50
104. Horn J, Nafpliotis N, Goldberg DE (1994) A niched pareto genetic algorithm for
multiobjective optimization. In: Proceedings of the First IEEE Conference on Evo-
lutionary Computation, vol 1, pp 82–87, doi:10.1109/ICEC.1994.350037
105. Huynen MA (1996) Exploring phenotype space through neutral evolution. Journal of
Molecular Evolution 43(3):165–169, doi:10.1007/PL00006074
106. Huynen MA, Stadler PF, Fontana W (1996) Smoothness within ruggedness: The role
of neutrality in adaptation. Proceedings of the National Academy of Science, USA
93:397–401
107. Igel C (1998) Causality of hierarchical variable length representations. In: Proceedings
of the 1998 IEEE World Congress on Computational Intelligence, pp 324–329
108. Igel C, Toussaint M (2003) On classes of functions for which no free lunch results hold.
Information Processing Letters 86(6):317–321, doi:10.1016/S0020-0190(03)00222-9
109. Igel C, Toussaint M (2003) Recent results on no-free-lunch theorems for opti-
mization. ArXiv EPrint arXiv:cs/0303032 (Computer Science, Neural and Evo-
lutionary Computing) http://www.citebase.org/abstract?id=oai:arXiv.org:cs/
0303032 [accessed 2008-03-28]
110. Ingber L (1996) Adaptive simulated annealing (asa): Lessons learned. Control and
Cybernetics 25(1):33–54
111. Joines JA, Houck CR (1994) On the use of non-stationary penalty functions
to solve nonlinear constrained optimization problems with ga's. In: Proceed-
ings of the First IEEE Conference on Evolutionary Computation, pp 579–584,
doi:10.1109/ICEC.1994.349995
112. Jones T (1995) Evolutionary algorithms, fitness landscapes and search. PhD thesis,
The University of New Mexico
113. Kauffman SA (1988) Adaptation on rugged fitness landscapes. In: Stein DL (ed)
Lectures in the Sciences of Complexity: The Proceedings of the 1988 Complex Systems
Summer School, Addison Wesley Publishing Company, Santa Fe Institute Studies in
the Sciences of Complexity, vol Lecture I, pp 527–618
114. Kauffman SA (1993) The Origins of Order: Self-Organization and Selection in Evo-
lution. Oxford University Press
115. Kauffman SA, Levin SA (1987) Towards a general theory of adaptive walks on rugged
landscapes. Journal of Theoretical Biology 128(1):11–45
116. Kearns MJ, Mansour Y, Ng AY, Ron D (1995) An experimental and theoret-
ical comparison of model selection methods. In: COLT'95: Proceedings of the
eighth annual conference on Computational learning theory, ACM Press, pp 21–30,
doi:10.1145/225298.225301
117. Kirkpatrick S, Gelatt, Jr CD, Vecchi MP (1983) Optimization by simulated annealing.
Science 220(4598):671–680
118. Kirschner M, Gerhart J (1998) Evolvability. Proceedings of the National Academy of
Science of the USA (PNAS) 95(15):8420–8427
119. Kita H, Sano Y (2003) Genetic algorithms for optimization of noisy fitness functions
and adaptation to changing environments. In: 2003 Joint Workshop of Hayashibara
Foundation and 2003 Workshop on Statistical Mechanical Approach to Probabilistic
Information Processing (SMAPIP)
120. Kolarov K (1997) Landscape ruggedness in evolutionary algorithms. In: Proceedings
of the IEEE Conference on Evolutionary Computation, pp 19–24
121. Köppen M, Wolpert DH, Macready WG (2001) Remarks on a recent paper on the "no
free lunch" theorems. IEEE Transactions on Evolutionary Computation 5(3):295–296,
doi:10.1109/4235.930318
122. Landau E (1909) Handbuch der Lehre von der Verteilung der Primzahlen. B. G.
Teubner, Leipzig, Germany, reprinted by Chelsea, New York, 1953
123. Laumanns M, Thiele L, Deb K, Zitzler E (2001) On the convergence and diversity-
preservation properties of multi-objective evolutionary algorithms. Tech. Rep. 108,
Computer Engineering and Networks Laboratory (TIK), Department of Electrical
Engineering, Swiss Federal Institute of Technology (ETH) Zurich and Kanpur Genetic
Algorithms Laboratory (KanGAL), Department of Mechanical Engineering, Indian
Institute of Technology Kanpur
124. Lawrence S, Giles CL (2000) Overfitting and neural networks: Conjugate gradient
and backpropagation. In: Proceedings of the IEEE-INNS-ENNS International Joint
Conference on Neural Networks (IJCNN'00), IEEE Computer Society, vol 1, pp 1114–
1119
125. Lee JYB, Wong PC (1995) The effect of function noise on gp efficiency. In: Progress
in Evolutionary Computation, pp 1–16, doi:10.1007/3-540-60154-6 43
126. Li X, Branke J, Blackwell T (2006) Particle swarm with speciation and adap-
tation in a dynamic environment. In: Genetic and Evolutionary Computation Con-
ference, GECCO, pp 51–58, doi:10.1145/1143997.1144005
127. Liepins GE, Vose MD (1991) Deceptiveness and genetic algorithm dynamics. In:
Proceedings of the First Workshop on Foundations of Genetic Algorithms (FOGA),
pp 36–50
128. Ling CX (1995) Overfitting and generalization in learning discrete patterns. Neuro-
computing 8(3):341–347
129. Lohmann R (1992) Structure evolution and neural systems. In: Dynamic, Genetic,
and Chaotic Programming: The Sixth-Generation, Wiley-Interscience, pp 395–411
130. Lohmann R (1993) Structure evolution and incomplete induction. Biological Cyber-
netics 69(4):319–326, doi:10.1007/BF00203128
131. Luke S, Panait L (2006) A comparison of bloat control methods for genetic program-
ming. Evolutionary Computation 14(3):309–344
132. Lush JL (1935) Progeny test and individual performance as indicators of an animal's
breeding value. Journal of Dairy Science 18(1):1–19
133. Magurran AE (2005) Biological diversity. Current Biology Magazine 15:R116–R118,
doi:10.1016/j.cub.2005.02.006
134. Martin WN, Lienig J, Cohoon JP (1997) Island (migration) models: Evolutionary
algorithms based on punctuated equilibria. In: Handbook of Evolutionary Computa-
tion, Oxford University Press, chap 6.3
135. Mendes R, Mohais AS (2005) Dynde: a differential evolution for dynamic optimization
problems. In: Proceedings of the IEEE Congress on Evolutionary Computation, CEC,
vol 3, pp 2808–2815
136. Miller BL, Goldberg DE (1995) Genetic algorithms, tournament selection, and the
effects of noise. IlliGAL Report 95006, Illinois Genetic Algorithms Laboratory, De-
partment of General Engineering, University of Illinois
137. Miller BL, Goldberg DE (1996) Genetic algorithms, selection schemes,
and the varying effects of noise. Evolutionary Computation 4(2):113–131,
doi:10.1162/evco.1996.4.2.113
138. Miller BL, Shaw MJ (1995) Genetic algorithms with dynamic niche sharing for mul-
timodal function optimization. IlliGAL Report 95010, Department of General Engi-
neering, University of Illinois at Urbana-Champaign
139. Mitchell M, Forrest S, Holland JH (1991) The royal road for genetic algorithms:
Fitness landscapes and GA performance. In: Towards a Practice of Autonomous
Systems: Proceedings of the First European Conference on Artificial Life, pp 245–
254
140. Mitchell TM (1981) Generalization as search. In: Webber BL, Nilsson NJ (eds) Read-
ings in Artificial Intelligence, 2nd edn, Tioga Pub. Co. Press / Morgan Kaufmann
Publishers / Elsevier Science & Technology Books, pp 517–542
141. Mitchell TM (1982) Generalization as search. Artificial Intelligence 18(2):203–226,
doi:10.1016/0004-3702(82)90040-6
142. Mori N, Kita H, Nishikawa Y (1996) Adaptation to a changing environment by
means of the thermodynamical genetic algorithm. In: Proceedings of the Interna-
tional Conference on Parallel Problem Solving from Nature, PPSN, pp 513–522,
doi:10.1007/3-540-61723-X 1015
143. Mori N, Imanishi S, Kita H, Nishikawa Y (1997) Adaptation to changing environments
by means of the memory based thermodynamical genetic algorithm. In: Proceedings
of the International Conference on Genetic Algorithms, ICGA, pp 299–306
144. Mori N, Kita H, Nishikawa Y (1998) Adaptation to a changing environment by means
of the feedback thermodynamical genetic algorithm. In: Proceedings of the Inter-
national Conference on Parallel Problem Solving from Nature, PPSN, pp 149–158,
doi:10.1007/BFb0056858
145. Morrison RW (2002) Designing evolutionary algorithms for dynamic environments.
PhD thesis, George Mason University, USA
146. Morrison RW (2004) Designing Evolutionary Algorithms for Dynamic Environments,
Natural Computing, vol 24(1). Springer
147. Morrison RW, De Jong KA (1999) A test problem generator for non-stationary en-
vironments. In: Proceedings of the IEEE Congress on Evolutionary Computation,
CEC, vol 3, pp 2047–2053, doi:10.1109/CEC.1999.785526
148. Morrison RW, De Jong KA (2001) Measurement of population diversity. In: Selected
Papers of the 5th International Conference on Artificial Evolution, Evolution Artifi-
cielle, EA 2001, pp 1047–1074, doi:10.1007/3-540-46033-0 3
149. Mostaghim S (2004) Multi-objective evolutionary algorithms: Data structures, con-
vergence, and diversity. PhD thesis, Fakultät für Elektrotechnik, Informatik und
Mathematik, Universität Paderborn, Deutschland (Germany)
150. Munetomo M, Goldberg DE (1999) Linkage identification by non-monotonicity
detection for overlapping functions. Evolutionary Computation 7(4):377–398,
doi:10.1162/evco.1999.7.4.377
151. Munetomo M, Goldberg DE (1999) Linkage identification by non-monotonicity de-
tection for overlapping functions. IlliGAL Report 99005, Illinois Genetic Algorithms
Laboratory (IlliGAL), University of Illinois at Urbana-Champaign
152. Muttil N, Liong SY (2004) Superior exploration–exploitation balance in shuffled com-
plex evolution. Journal of Hydraulic Engineering 130(12):1202–1205
153. Naudts B, Verschoren A (1996) Epistasis on finite and infinite spaces. In: Proceedings
of the 8th International Conference on Systems Research, Informatics and Cybernet-
ics, pp 19–23
154. Naudts B, Verschoren A (1999) Epistasis and deceptivity. Bulletin of the Belgian
Mathematical Society 6(1):147–154
155. Newman MEJ, Engelhardt R (1998) Effect of neutral selection on the evolution of
molecular species. Proceedings of the Royal Society of London B (Biological Sciences)
256(1403):1333–1338, doi:10.1098/rspb.1998.0438
156. Oei CK, Goldberg DE, Chang SJ (1991) Tournament selection, niching, and the
preservation of diversity. IlliGAl Report 91011, Illinois Genetic Algorithms Labora-
tory (IlliGAL), Department of Computer Science, Department of General Engineer-
ing, University of Illinois at Urbana-Champaign
157. Olsen AL (1994) Penalty functions and the knapsack problem. In: Proceedings
of the First IEEE Conference on Evolutionary Computation, vol 2, pp 554–558,
doi:10.1109/ICEC.1994.350000
158. Osman IH (1995) An introduction to metaheuristics. In: Lawrence M, Wilsdon C (eds)
Operational Research Tutorial Papers, Stockton Press, Hampshire, UK, pp 92–122,
publication of the Operational Research Society, Birmingham, UK
159. Paenke I, Branke J, Jin Y (2007) On the influence of phenotype plasticity on genotype
diversity. In: First IEEE Symposium on Foundations of Computational Intelligence
(FOCI'07), pp 33–40, doi:10.1109/FOCI.2007.372144
160. Pan G, Dou Q, Liu X (2006) Performance of two improved particle swarm optimiza-
tion in dynamic optimization environments. In: ISDA'06: Proceedings of the Sixth
International Conference on Intelligent Systems Design and Applications (ISDA'06),
IEEE Computer Society, vol 2, pp 1024–1028
161. Pan H, Wang L, Liu B (2006) Particle swarm optimization for function optimiza-
tion in noisy environment. Applied Mathematics and Computation 181(2):908–919,
doi:10.1016/j.amc.2006.01.066
162. Pelikan M, Goldberg DE, Cantú-Paz E (1999) BOA: The Bayesian optimization algo-
rithm. In: Genetic and Evolutionary Computation Conference, GECCO, pp 525–532
163. Phillips PC (1998) The language of gene interaction. Genetics 149(3):1167–1171
164. Pohlheim H (2005) Geatbx introduction – evolutionary algorithms: Overview, meth-
ods and operators. Tech. rep., http://www.GEATbx.com [accessed 2007-07-03] , documen-
tation for GEATbx version 3.7
165. Purshouse RC (2003) On the evolutionary optimisation of many objectives. PhD
thesis, Department of Automatic Control and Systems Engineering, The University
of Sheffield
166. Radcliffe NJ (1992) Non-linear genetic representations. In: Proceedings of the Inter-
national Conference on Parallel Problem Solving from Nature, PPSN, Elsevier, pp
259–268
167. Radcliffe NJ (1994) The algebra of genetic algorithms. Annals of Mathematics and
Artificial Intelligence 10(4), doi:10.1007/BF01531276
168. Radcliffe NJ, Surry PD (1995) Fundamental limitations on search algorithms: Evolu-
tionary computing in perspective. In: van Leeuwen J (ed) Computer Science Today
– Recent Trends and Developments, Springer-Verlag, Lecture Notes in Computer
Science (LNCS), vol 1000, pp 275–291, doi:10.1007/BFb0015249
169. Rayward-Smith VJ (1994) A unified approach to tabu search, simulated annealing
and genetic algorithms. In: Rayward-Smith VJ (ed) Applications of Modern Heuris-
tic Methods – Proceedings of the UNICOM Seminar on Adaptive Computing and
Information Processing, Alfred Waller Ltd / Nelson Thornes Ltd / Unicom Seminars
Ltd, Henley-on-Thames, UK, vol I, pp 55–78
170. Rechenberg I (1973) Evolutionsstrategie: Optimierung technischer Systeme nach
Prinzipien der biologischen Evolution. Frommann-Holzboog Verlag, Stuttgart
171. Rechenberg I (1994) Evolutionsstrategie '94, Werkstatt Bionik und Evolutionstech-
nik, vol 1. Frommann Holzboog
172. Reidys CM, Stadler PF (2001) Neutrality in fitness landscapes. Applied Mathematics
and Computation 117(2–3):321–350, doi:10.1016/S0096-3003(99)00166-6
173. Richter H (2004) Behavior of evolutionary algorithms in chaotically changing fit-
ness landscapes. In: Proceedings of the International Conference on Parallel Problem
Solving from Nature, PPSN, pp 111–120
174. Riedl RJ (1977) A systems-analytical approach to macroevolutionary phenomena.
Quarterly Review of Biology pp 351–370
175. Robbins H, Monro S (1951) A stochastic approximation method. Annals of Mathe-
matical Statistics 22(3):400–407, doi:10.1214/aoms/1177729586
176. Ronald S (1995) Preventing diversity loss in a routing genetic algorithm with hash
tagging. Complexity International 2, http://www.complexity.org.au/ci/vol02/sr_
hash/ [accessed 2008-12-07]
177. Ronald S (1996) Genetic algorithms and permutation-encoded problems: diversity
preservation and a study of multimodality. PhD thesis, University of South Australia,
Department of Computer and Information Science
178. Ronald S (1997) Robust encodings in genetic algorithms: A survey of encoding issues.
In: Proceedings of the IEEE Congress on Evolutionary Computation, CEC, pp 43–48,
doi:10.1109/ICEC.1997.592265
179. Rosca JP (1995) An analysis of hierarchical genetic programming. Tech. Rep. TR566,
The University of Rochester, Computer Science Department
180. Rosca JP, Ballard DH (1995) Causality in genetic programming. In: Proceedings of
the International Conference on Genetic Algorithms, ICGA, pp 256–263
181. Rosin PL, Fierens F (1995) Improving neural network generalisation. In: Proceedings
of the International Geoscience and Remote Sensing Symposium, "Quantitative Re-
mote Sensing for Science and Applications", IGARSS'95, IEEE, vol 2, pp 1255–1257,
doi:10.1109/IGARSS.1995.521718
182. Rothlauf F (2002 (1st ed.), 2006 (2nd ed.)) Representations for Genetic and Evolu-
tionary Algorithms, 2nd edn. Physica-Verlag
183. Routledge RD (1979) Diversity indices: Which ones are admissible? Journal of The-
oretical Biology 76:503–515, doi:10.1016/0022-5193(79)90015-8
184. Rudnick WM (1992) Genetic algorithms and fitness variance with an application to
the automated design of artificial neural networks. PhD thesis, Oregon Graduate
Institute of Science & Technology
185. Rudolph G (1999) Self-adaptation and global convergence: A counter-example. In:
Proceedings of the IEEE Congress on Evolutionary Computation, CEC, vol 1, pp
646–651
186. Rudolph G (2001) Self-adaptive mutations may lead to premature convergence. IEEE
Transactions on Evolutionary Computation 5(4):410–414
187. Rudolph G (2001) Self-adaptive mutations may lead to premature convergence. Tech.
Rep. CI–73/99, Fachbereich Informatik, Universität Dortmund
188. Sano Y, Kita H (2000) Optimization of noisy fitness functions by means of
genetic algorithms using history of search. In: Proceedings of the Interna-
tional Conference on Parallel Problem Solving from Nature, PPSN, pp 571–580,
doi:10.1007/3-540-45356-3_56
189. Sano Y, Kita H (2002) Optimization of noisy fitness functions by means of genetic
algorithms using history of search with test of estimation. In: Proceedings of the
IEEE Congress on Evolutionary Computation, CEC, pp 360–365
190. Sarle W (2007) What is overfitting and how can I avoid it? Usenet FAQs:
comp.ai.neural-nets FAQ, part 3: Generalization
191. Sarle WS (1995) Stopped training and other remedies for overfitting. In: Proceedings
of the 27th Symposium on the Interface: Computing Science and Statistics, pp 352–
360
192. Schaffer JD, Eshelman LJ, Offutt D (1990) Spurious correlations and premature con-
vergence in genetic algorithms. In: Proceedings of the First Workshop on Foundations
of Genetic Algorithms (FOGA), pp 102–112
193. Sendhoff B, Kreutz M, von Seelen W (1997) A condition for the genotype-phenotype
mapping: Causality. In: Proceedings of the International Conference on Genetic Al-
gorithms, ICGA, pp 73–80
194. Shackleton M, Shipman R, Ebner M (2000) An investigation of redundant genotype-
phenotype mappings and their role in evolutionary search. In: Proceedings of the
IEEE Congress on Evolutionary Computation, CEC, pp 493–500
195. Shekel J (1971) Test functions for multimodal search techniques. In: Proceedings of
the Fifth Annual Princeton Conference on Information Science and Systems, Prince-
ton University Press, pp 354–359
196. Shipman R (1999) Genetic redundancy: Desirable or problematic for evolutionary
adaptation? In: Proceedings of the 4th International Conference on Artificial Neural
Nets and Genetic Algorithms, pp 1–11
197. Shipman R, Shackleton M, Ebner M, Watson R (2000) Neutral search spaces for
artificial evolution: a lesson from life. In: Bedau M, McCaskill JS, Packard NH,
Rasmussen S (eds) Artificial Life VII: Proceedings of the Seventh International
Conference on Artificial Life, The MIT Press, Bradford Books, Complex Adaptive
Systems
198. Shipman R, Shackleton M, Harvey I (2000) The use of neutral genotype-phenotype
mappings for improved evolutionary search. BT Technology Journal 18(4):103–111
199. Siedlecki WW, Sklansky J (1989) Constrained genetic optimization via dynamic
reward-penalty balancing and its use in pattern recognition. In: Proceedings of the
Third International Conference on Genetic Algorithms, pp 141–150
200. Singh G, Deb K (2006) Comparison of multi-modal optimization algorithms based
on evolutionary algorithms. In: Genetic and Evolutionary Computation Conference,
GECCO, pp 1305–1312
201. Smith AE, Coit DW (1997) Penalty functions. In: Handbook of Evolutionary Com-
putation, Oxford University Press in cooperation with the Institute of Physics Pub-
lishing, chap C 5.2
202. Smith M (1993/1996) Neural Networks for Statistical Modeling. John Wiley & Sons,
Inc. / International Thomson Computer Press
203. Smith SSF (2004) Using multiple genetic operators to reduce premature convergence
in genetic assembly planning. Computers in Industry 54(1):35–49
204. Smith T, Husbands P, Layzell P, O'Shea M (2002) Fitness landscapes and evolvability.
Evolutionary Computation 10(1):1–34, doi:10.1162/106365602317301754
205. Spatz BM, Rawlins GJE (eds) (1990) Proceedings of the First Workshop on Founda-
tions of Genetic Algorithms, Morgan Kaufmann Publishers, Inc.
206. Spieth C, Streichert F, Speer N, Zell A (2004) Utilizing an island model for EA to
preserve solution diversity for inferring gene regulatory networks. In: Proceedings of
the IEEE Congress on Evolutionary Computation, CEC, vol 1, pp 146–151
207. Stagge P, Igel C (2000) Structure optimization and isomorphisms. In: Theoretical
Aspects of Evolutionary Computing, Springer, pp 409–422
208. Stewart T (2001) Extrema selection: accelerated evolution on neutral networks.
In: Proceedings of the IEEE Congress on Evolutionary Computation, CEC, vol 1,
doi:10.1109/CEC.2001.934366
209. Taguchi G (1986) Introduction to Quality Engineering: Designing Quality into Prod-
ucts and Processes. Asian Productivity Organization / American Supplier Institute
Inc. / Quality Resources / Productivity Press Inc., translation of Sekkeisha no tame
no hinshitsu kanri
210. Taillard ÉD, Gambardella LM, Gendreau M, Potvin JY (2001) Adaptive memory
programming: A unified view of metaheuristics. European Journal of Operational
Research 135(1):1–16, doi:10.1016/S0377-2217(00)00268-X
211. Tetko IV, Livingstone DJ, Luik AI (1995) Neural network studies. 1. Comparison of
overfitting and overtraining. Journal of Chemical Information and Computer Sciences
35(5):826–833
212. Thierens D (1999) On the scalability of simple genetic algorithms. Tech. Rep. UU-
CS-1999-48, Department of Information and Computing Sciences, Utrecht University
213. Thierens D, Goldberg DE, Pereira ÂG (1998) Domino convergence, drift, and the
temporal-salience structure of problems. In: Proceedings of the IEEE Congress on
Evolutionary Computation, CEC, pp 535–540, doi:10.1109/ICEC.1998.700085
214. Toussaint M, Igel C (2002) Neutrality: A necessity for self-adaptation. In: Proceedings
of the IEEE Congress on Evolutionary Computation, CEC, pp 1354–1359
215. Trelea IC (2003) The particle swarm optimization algorithm: convergence anal-
ysis and parameter selection. Information Processing Letters 85(6):317–325,
doi:10.1016/S0020-0190(02)00447-7
216. Trojanowski K (1994) Evolutionary algorithms with redundant genetic material for
non-stationary environments. PhD thesis, Instytut Podstaw Informatyki PAN, Insti-
tute of Computer Science, Warsaw, University of Technology, Poland
217. Tsutsui S, Ghosh A (1997) Genetic algorithms with a robust solution
searching scheme. IEEE Transactions on Evolutionary Computation 1:201–208,
doi:10.1109/4235.661550
218. Tsutsui S, Ghosh A, Fujimoto Y (1996) A robust solution searching scheme in genetic
search. In: Proceedings of the International Conference on Parallel Problem Solving
from Nature, PPSN, pp 543–552
219. Ursem RK (2003) Models for evolutionary algorithms and their applications in system
identification and control optimization. PhD thesis, Department of Computer Science,
University of Aarhus, Denmark
220. Vaessens RJM, Aarts EHL, Lenstra JK (1992) A local search template. In: Proceed-
ings of the International Conference on Parallel Problem Solving from Nature, PPSN,
pp 67–76
221. Vaessens RJM, Aarts EHL, Lenstra JK (1998) A local search template. Computers
and Operations Research 25(11):969–979, doi:10.1016/S0305-0548(97)00093-2
222. van Nimwegen E, Crutchfield JP (2001) Optimizing epochal evolutionary
search: Population-size dependent theory. Machine Learning 45(1):77–114,
doi:10.1023/A:1012497308906
223. van Nimwegen E, Crutchfield JP, Huynen M (1999) Neutral evolution of mutational
robustness. Proceedings of the National Academy of Sciences of the United States of
America (PNAS) – Evolution 96(17):9716–9720
224. van Nimwegen E, Crutchfield JP, Mitchell M (1999) Statistical dynamics of
the royal road genetic algorithm. Theoretical Computer Science 229(1–2):41–102,
doi:10.1016/S0304-3975(99)00119-X
225. Wagner A (2005) Robustness and Evolvability in Living Systems. Princeton Studies
in Complexity, Princeton University Press
226. Wagner A (2005) Robustness, evolvability, and neutrality. FEBS Lett 579(8):1772–
1778
227. Wagner GP, Altenberg L (1996) Complex adaptations and the evolution of evolvabil-
ity. Evolution 50(3):967–976
228. Watanabe S (1969) Knowing and Guessing: A Quantitative Study of Inference and
Information. John Wiley & Sons
229. Weicker K (2002) Evolutionäre Algorithmen. Leitfäden der Informatik, B. G. Teubner
GmbH
230. Weicker K, Weicker N (2000) Burden and benefits of redundancy. In: Sixth Workshop
on Foundations of Genetic Algorithms (FOGA), Morgan Kaufmann, pp 313–333
231. Weise T, Zapf M, Geihs K (2007) Rule-based Genetic Programming. In: Proceed-
ings of BIONETICS 2007, 2nd International Conference on Bio-Inspired Models of
Network, Information, and Computing Systems
232. Weise T, Niemczyk S, Skubch H, Reichle R, Geihs K (2008) A tunable model for
multi-objective, epistatic, rugged, and neutral fitness landscapes. In: Genetic and
Evolutionary Computation Conference, GECCO, pp 795–802
233. Whitley LD, Gordon VS, Mathias KE (1994) Lamarckian evolution, the Baldwin
effect and function optimization. In: Proceedings of the International Conference on
Parallel Problem Solving from Nature, PPSN, pp 6–15
234. Wiesmann D, Hammel U, Bäck T (1998) Robust design of multilayer optical coatings
by means of evolutionary algorithms. IEEE Transactions on Evolutionary Computa-
tion 2:162–167, doi:10.1109/4235.738986
235. Wiesmann D, Hammel U, Bäck T (1998) Robust design of multilayer optical coatings
by means of evolutionary strategies. Sonderforschungsbereich (sfb) 531, Universität
Dortmund
236. Wilke CO (1999) Evolutionary dynamics in time-dependent environments. PhD the-
sis, Fakultät für Physik und Astronomie, Ruhr-Universität Bochum
237. Wilke CO (2001) Adaptive evolution on neutral networks. Bulletin of Mathematical
Biology 63(4):715–730
238. Wilke DN, Kok S, Groenwold AA (2007) Comparison of linear and classical velocity
update rules in particle swarm optimization: notes on diversity. International Journal
for Numerical Methods in Engineering 70(8):962–984
239. Williams GC (1957) Pleiotropy, natural selection, and the evolution of senescence.
Evolution 11(4):398–411, doi:10.2307/2406060
240. Winter PC, Hickey GI, Fletcher HL (1st: 1998, 2nd ed: 2002, 3rd ed: 2006) Instant
Notes in Genetics. Springer, New York / BIOS Scientific Publishers / Taylor & Francis
Ltd.
241. Wolpert DH, Macready WG (1995) No free lunch theorems for search. Tech. Rep.
SFI-TR-95-02-010, The Santa Fe Institute
242. Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE
Transactions on Evolutionary Computation 1(1):67–82, doi:10.1109/4235.585893
243. Wu N (2006) Differential evolution for optimisation in dynamic environments. Tech.
rep., School of Computer Science and Information Technology, RMIT University
244. Yang S, Ong YS, Jin Y (eds) (2007) Evolutionary Computation in Dynamic and Un-
certain Environments, Studies in Computational Intelligence, vol 51(XXIII). Springer
245. Zakian V (1979) New formulation for the method of inequalities. Proceedings of the
Institution of Electrical Engineers 126(6):579–584
246. Žilinskas A (1978) Algorithm AS 133: Optimization of one-dimensional multimodal
functions. Applied Statistics 27(3):367–375, doi:10.2307/2347182
247. Zitzler E, Laumanns M, Thiele L (2001) SPEA2: Improving the Strength Pareto
Evolutionary Algorithm. Tech. Rep. 103, Computer Engineering and Networks Lab-
oratory (TIK), Swiss Federal Institute of Technology (ETH) Zurich
248. Zitzler E, Laumanns M, Thiele L (2001) SPEA2: Improving the strength Pareto evo-
lutionary algorithm for multiobjective optimization. In: Evolutionary Methods for
Design, Optimisation and Control with Application to Industrial Problems. Proceed-
ings of the EUROGEN2001 Conference, pp 95–100
@incollection{WZCN2009WIOD,
author = {Thomas Weise and Michael Zapf and Raymond Chiong
and Antonio Jes{\'u}s {Nebro Urbaneja}},
title = {Why Is Optimization Difficult?},
chapter = {1},
booktitle = {Nature-Inspired Algorithms for Optimisation},
publisher = {Springer},
year = {2009},
pages = {1--50},
ISBN = {978-3-642-00266-3},
month = apr # {~30},
issn = {1860-949X (Print) 1860-9503 (Online)},
series = {Studies in Computational Intelligence},
volume = {193},
editor = {Raymond Chiong},
url = {http://www.it-weise.de/documents/files/WZCN2009WIOD.pdf
         and http://www.springer.com/engineering/book/978-3-642-00266-3},
}