Evolutionary algorithms - continuous and combinatorial optimization

Last time, we talked about a simple genetic algorithm and its variations with respect to different genetic operators and selection methods. Today, we will look mainly at other ways to encode individuals. We will also show some other applications of evolutionary algorithms.

Integer coding

The simplest extension of binary coding is integer encoding, that is, encoding of individuals using integers. Such encoding is useful, for example, if the goal is to solve the problem of partitioning a set of $N$ numbers into $k$ subsets with the same sum. The individual is then represented as a vector of $N$ numbers between 0 and $k-1$. The number at position $i$ then expresses to which subset the $i$-th number from the input set belongs. There are more uses, however; graph coloring and other problems can be coded similarly.
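A minimal sketch of this encoding in Python (the fitness here, the negative spread between subset sums, is just one reasonable choice, and the function names are my own):

```python
import random

def random_individual(n, k):
    """An individual: for each of the n input numbers, the index (0..k-1) of its subset."""
    return [random.randrange(k) for _ in range(n)]

def partition_fitness(individual, numbers, k):
    """Negative spread between subset sums; 0 means all subsets sum to the same value."""
    sums = [0] * k
    for value, subset in zip(numbers, individual):
        sums[subset] += value
    return -(max(sums) - min(sums))
```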

For integer encoding, we can use simple generalizations of operators for binary encoding. We don’t even need to change the crossover implementations - they still work the same, because the individual is still a vector. Several options can be chosen for mutation. For example, we can change the number at a given position to any other number, or, if it makes sense for the problem, increase (or decrease) the number by 1, or some other constant.
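Both mutation variants might look like this (the per-position probability `p` and the step size are illustrative parameters, not prescribed by anything above):

```python
import random

def reset_mutation(individual, k, p=0.1):
    """With probability p per position, replace the value by a random one from 0..k-1."""
    return [random.randrange(k) if random.random() < p else v for v in individual]

def step_mutation(individual, k, p=0.1, step=1):
    """With probability p per position, increase or decrease the value by `step`,
    clamped to the valid range 0..k-1."""
    return [min(k - 1, max(0, v + random.choice((-step, step))))
            if random.random() < p else v
            for v in individual]
```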

Permutation encoding

A special case of integer coding is permutation coding. In this case, an individual is also encoded as a vector of numbers from 0 to $k-1$, but we additionally require that each number appear exactly once in the individual. Such encoding is useful in solving many combinatorial problems. A typical example is the well-known traveling salesman problem, i.e., finding the shortest cycle in a complete graph that passes through all vertices. The permutation then specifies the order in which to visit the vertices.
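For the traveling salesman problem, the fitness of a permutation is simply the length of the tour it describes; a sketch assuming Euclidean distances between 2D city coordinates (one common variant):

```python
import math

def tour_length(perm, coords):
    """Length of the closed tour visiting the cities in the order given by perm.

    coords is a list of (x, y) positions; the tour is a cycle, so the last
    city is adjacent to the first (hence the modulo).
    """
    n = len(perm)
    return sum(math.dist(coords[perm[i]], coords[perm[(i + 1) % n]])
               for i in range(n))
```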

But permutation encoding is also used in other problems, often as input to a heuristic that completes the solution. For example, when solving the bin-packing problem (packing objects of given sizes between 0 and 1 into unit-capacity bins so that as few bins as possible are used), a first-fit heuristic can be used: each object is put into the first bin it fits into, and if no such bin exists, a new one is opened. This heuristic needs an order in which to try placing the objects - and that order is exactly what the permutation encodes.
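A sketch of the first-fit decoder just described (the function name and the return value are my own choices):

```python
def first_fit(perm, sizes, capacity=1.0):
    """Decode a permutation into a bin-packing solution using first fit.

    Objects are tried in the order given by the permutation; each goes into
    the first bin with enough remaining space, or opens a new bin.
    Returns the bins as lists of object indices; fewer bins = better.
    """
    bins, remaining = [], []
    for obj in perm:
        size = sizes[obj]
        for i, space in enumerate(remaining):
            if size <= space:
                bins[i].append(obj)
                remaining[i] -= size
                break
        else:  # no existing bin fits, open a new one
            bins.append([obj])
            remaining.append(capacity - size)
    return bins
```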

The biggest complication with permutation encoding is the design of operators. Simple operators for integer encoding will very often produce children that are not permutations. Coming up with a mutation that always returns valid individuals is easy, and we have several options: a mutation that swaps the values at two different positions in the individual, one that moves a part of the individual to a different location, or one that reverses a part of the individual. Which mutation is suitable for which problem depends on what exactly the encoding expresses.
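The three mutations mentioned above might be sketched as follows (the moving variant is shown here, for simplicity, with a single element rather than a whole segment):

```python
import random

def swap_mutation(perm):
    """Swap the values at two different random positions."""
    p = perm[:]
    i, j = random.sample(range(len(p)), 2)
    p[i], p[j] = p[j], p[i]
    return p

def insert_mutation(perm):
    """Remove a random element and reinsert it at another random position."""
    p = perm[:]
    v = p.pop(random.randrange(len(p)))
    p.insert(random.randrange(len(p) + 1), v)
    return p

def reverse_mutation(perm):
    """Reverse a random contiguous segment (for TSP this replaces just two edges)."""
    p = perm[:]
    i, j = sorted(random.sample(range(len(p)), 2))
    p[i:j + 1] = reversed(p[i:j + 1])
    return p
```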

The bigger problem is how to create a crossover. The typical approach is to do something like a 2-point crossover and then repair the result so that it is a valid permutation again. For example, Order Crossover (OX) swaps the middle segments of both parents and fills the remaining positions of each child with the missing values in the order in which they appear in the other parent, starting just to the right of the second crossover point (a code sketch follows the example below).

12|345|678      ..|186|...      45|186|723 (skipping numbers 186)
            ->              ->  
34|186|527      ..|345|...      86|345|271 (skipping numbers 345)
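A sketch of OX producing one child (the second child is obtained by swapping the roles of the parents); with the same crossover points, it reproduces the example above:

```python
import random

def order_crossover(p1, p2):
    """OX: the child keeps p2's middle segment; the remaining positions are
    filled with p1's values in the order they appear in p1, starting just
    after the second crossover point and skipping values already used."""
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p2[a:b + 1]          # middle taken from the other parent
    used = set(child[a:b + 1])
    fill = [p1[(b + 1 + i) % n] for i in range(n)]
    fill = [v for v in fill if v not in used]
    for offset, v in enumerate(fill):     # continue to the right, wrapping around
        child[(b + 1 + offset) % n] = v
    return child
```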

Another popular example of crossover is Partially Mapped Crossover (PMX). Here, again, the "middle" parts of the individuals are swapped first. Then, positions outside the middle are filled with the values from the same positions of the original parent, provided those values are not yet in the child. The rest is completed using the mapping defined by the swapped middle parts, which says which value is to be replaced by which value from the other parent. For example, below we want to fill the first position of the first child. Originally there was a value of 1, but that is already in the swapped part. In the middle, 1 was exchanged for 3, which is not yet in the child - so instead of 1, we put 3 in that position. The other positions are filled similarly. Occasionally, the replacement chain may be longer, because the mapped value is also already in the child; in that case, we apply exactly the same step to the new value (a code sketch follows the example below).

12|345|678      .2|186|.7.      32|186|574 (1->3, 6->5, 8->4)
            ->              ->  
34|186|527      ..|345|.27      18|345|627 (3->1, 4->8, 5->6) 
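PMX for one child might look like this; the chain-following loop handles the occasional longer replacement chains mentioned above:

```python
import random

def pmx(p1, p2):
    """PMX: the child keeps p2's middle segment; outside it, p1's values are
    kept where possible, otherwise they are followed through the mapping
    defined by the swapped segment until an unused value is found."""
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p2[a:b + 1]
    segment = set(child[a:b + 1])
    # mapping: value in p2's segment -> value at the same position in p1
    mapping = {p2[i]: p1[i] for i in range(a, b + 1)}
    for i in list(range(a)) + list(range(b + 1, n)):
        v = p1[i]
        while v in segment:   # follow the replacement chain
            v = mapping[v]
        child[i] = v
    return child
```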

Especially for the travelling salesman problem, so-called Edge Recombination (ER) is often used. The goal of this crossover is to combine edges from both parent solutions into one offspring. For each vertex (value in an individual), we build a list of all vertices adjacent to it in either parent (remember that the first vertex is adjacent to the last one - we are looking for cycles). We then start with a vertex that has the shortest list of neighbours. From its neighbours, the one with the shortest neighbour list is chosen and put next in the offspring. Every vertex that is used is crossed off the neighbour lists of all other vertices. The same procedure continues with the neighbours of the current vertex, whereby if multiple vertices have the same (smallest) number of neighbours, one of them is chosen at random, and if the current vertex has no unused neighbours left, a random unused vertex is taken instead.
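A sketch of edge recombination following the steps above:

```python
import random

def edge_recombination(p1, p2):
    """ER: build a child reusing edges from both parents.

    Each vertex gets the set of its neighbours in either parent (tours are
    cycles, so the first and last vertices are adjacent). We repeatedly move
    to the current vertex's neighbour with the fewest remaining neighbours,
    breaking ties at random; at a dead end we jump to a random unused vertex.
    """
    n = len(p1)
    adj = {v: set() for v in p1}
    for parent in (p1, p2):
        for i, v in enumerate(parent):
            adj[v].add(parent[i - 1])          # i - 1 == -1 wraps to the last vertex
            adj[v].add(parent[(i + 1) % n])

    # start from a vertex with the shortest neighbour list (ties broken at random)
    current = min(adj, key=lambda v: (len(adj[v]), random.random()))
    child = [current]
    while len(child) < n:
        for s in adj.values():                 # cross the used vertex off all lists
            s.discard(current)
        candidates = adj.pop(current)          # its still-unused neighbours
        if candidates:
            current = min(candidates, key=lambda v: (len(adj[v]), random.random()))
        else:                                  # dead end: random unused vertex
            current = random.choice(list(adj))
        child.append(current)
    return child
```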

Continuous optimization - real number coding

A very common area where evolutionary algorithms are used is the optimization of functions $\mathbb{R}^n \to \mathbb{R}$. Using such functions, it is possible to encode many practical problems, ranging from tuning the parameters of various processes, to machine learning problems, to finding the weights of neural networks - for example in reinforcement learning.

For continuous optimization, individuals are encoded as vectors of numbers (typically of type float or double). In this case, it is relatively easy to devise genetic operators. Mutations for continuous encodings come in two types - biased and unbiased. Unbiased mutations simply generate a new number from the range of the given variable, while biased mutations take the value already at that position and modify it. A typical example is Gaussian mutation, which adds to the given number a value drawn from a normal distribution with mean 0 and some suitable variance.
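Both mutation types, sketched for an individual represented as a list of floats (`sigma`, `p`, and the bounds are illustrative values, not recommendations):

```python
import random

def unbiased_mutation(x, low, high, p=0.2):
    """Unbiased: with probability p, resample the coordinate uniformly from its range."""
    return [random.uniform(low, high) if random.random() < p else v for v in x]

def gaussian_mutation(x, sigma=0.1, p=0.2):
    """Biased: with probability p, add N(0, sigma^2) noise to the current value."""
    return [v + random.gauss(0.0, sigma) if random.random() < p else v for v in x]
```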

There are also several options for crossovers - we can again use $n$-point crossovers, which we already know, or we can use so-called arithmetic crossovers, where the offspring is obtained as a weighted average of the parents.
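An arithmetic crossover might be sketched as follows (here with a single random weight shared by all coordinates; per-coordinate weights are another common variant):

```python
import random

def arithmetic_crossover(x, y):
    """Offspring as a weighted average of the parents."""
    w = random.random()
    return [w * a + (1.0 - w) * b for a, b in zip(x, y)]
```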

The problem with the above operators is that they mutate/cross each coordinate of the vector independently. This works well for separable functions, i.e. those that can be optimized one coordinate at a time: fix all variables except one, optimize over that one, and you have one coordinate of the optimum. Moreover, these operators make equally large changes in all directions and therefore do not work well for ill-conditioned functions. Imagine, for example, a function whose contour lines are ellipses - a well-conditioned function has both axes of the ellipse of almost the same length, while an ill-conditioned function has a large ratio between the axis lengths. If the axes are parallel to the axes of the coordinate system, we have a separable function; if they are not, the function is non-separable.

There are a relatively large number of evolutionary algorithms that address the problems of non-separability and ill-conditioning. We will show only the so-called differential evolution (DE). In differential evolution, a mutation is performed that adds the difference of two randomly chosen individuals from the population to a given individual. This makes the mutation invariant to arbitrary rotations and scaling of the search space, so non-separable or ill-conditioned functions do not cause problems. In addition to this mutation, DE performs uniform crossover with another random individual. Selection then compares each offspring to its parent and keeps only the better of the two in the population.
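A minimal sketch of the classic DE/rand/1/bin scheme for minimization (the population size, F, and CR below are conventional illustrative values, not tuned settings):

```python
import random

def differential_evolution(f, bounds, pop_size=30, F=0.8, CR=0.9, generations=200):
    """Minimize f over a box given by bounds = [(low, high), ...].

    For each individual x, a mutant a + F * (b - c) is built from three other
    random individuals, uniformly crossed with x (rate CR, with one coordinate
    always taken from the mutant), and the better of x and the trial survives.
    """
    n = len(bounds)
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = random.sample([x for j, x in enumerate(pop) if j != i], 3)
            j_rand = random.randrange(n)       # at least one coordinate from the mutant
            trial = [a[k] + F * (b[k] - c[k])
                     if (k == j_rand or random.random() < CR)
                     else pop[i][k]
                     for k in range(n)]
            f_trial = f(trial)
            if f_trial <= fit[i]:              # offspring replaces parent only if not worse
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

# Example: minimize the 5-dimensional sphere function
best_x, best_f = differential_evolution(lambda x: sum(v * v for v in x),
                                        bounds=[(-5.0, 5.0)] * 5)
```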