Genetic algorithm (GA)
Genetic algorithms are search and optimization methods inspired by an analogy to biological evolution, natural selection, and genetics. The basic components used in most genetic algorithms are:
- a population of individuals, each individual representing a possible solution to the problem at hand;
- a fitness function used for optimization;
- a selection operator used to choose which individuals in the current population will reproduce;
- a crossover operator used to exchange sub-sequences of two individuals to create an offspring for a new generation;
- a mutation operator used to create random mutations in the offspring.

In this genetic algorithm, the main function, called 'evolution', implements the main loop of the algorithm, as sketched below.
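A minimal sketch of that outer loop is given here; the parameter names and helper callables are illustrative, and the actual 'evolution' function additionally handles sub-populations, elitism, culling, and the stopping criteria described below.

```python
def evolution(pop_size, n_generations, fitness_fn, random_individual, select, crossover):
    # Create the initial population from randomly generated individuals.
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(n_generations):
        # Score every individual in the current generation.
        scores = [fitness_fn(ind) for ind in population]
        # Build the next generation by repeatedly selecting parents and producing offspring.
        new_population = []
        while len(new_population) < pop_size:
            parent_a = select(population, scores)
            parent_b = select(population, scores)
            new_population.append(crossover(parent_a, parent_b))
        population = new_population
    # Return the fittest individual of the final generation.
    scores = [fitness_fn(ind) for ind in population]
    return population[scores.index(max(scores))]
```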
The process begins by creating an initial population filled with randomly generated individuals. The criteria for generating these individuals can be specified in the parameters file and the size of the initial population in the genetic algorithm settings file. There is also a possibility of dividing the initial population into sub-populations.
After creating the initial population, the fitness scores must be calculated for each individual. The function for how the score is calculated can be specified in the global settings file.
Based on the fitness scores of the individuals for the current generation, a new population will be generated by selecting parents and generating offspring until the desired population size is reached. For selecting parents, various selection methods can be used. After selection, various crossover methods can be used to create an offspring. This process is repeated multiple times to create a population of a suitable size. Optionally, elitism and culling methods can be used to augment this process.
For the newly generated population, the fitness scores will be calculated again and the process repeats itself. This cycle will continue until one of the stopping criteria is reached.
Currently, two stopping criteria are in use:
- maximum number of iterations, specified in the settings
- improvement threshold, also specified in the settings
The improvement threshold criterion is reached when the improvement of the fitness of the current generation compared to the previous one is equal to or lower than the specified threshold for at least two consecutive generations. This criterion could possibly be replaced with a required compactness criterion, which is reached when the difference between the members of the current population is calculated to be smaller than a specified threshold.
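A small sketch of how these two criteria could be checked, assuming a hypothetical should_stop helper that receives the list of best fitness scores per generation (newest last):

```python
def should_stop(history, max_iterations, improvement_threshold):
    # Criterion 1: maximum number of iterations reached.
    if len(history) >= max_iterations:
        return True
    # Criterion 2: improvement at most the threshold for two consecutive generations.
    if len(history) >= 3:
        recent_gains = [history[-1] - history[-2], history[-2] - history[-3]]
        if all(gain <= improvement_threshold for gain in recent_gains):
            return True
    return False
```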
The number of sub-populations can be specified in the settings file. This number must be an integer. If it is set to 1, then the population is not divided into sub-populations, and evolution takes place in one unified pool of individuals. If it is larger than 1, then all individuals will be randomly divided into the specified number of sub-populations.
In each of these groups, evolution will take place separately, with no possibility of crossover between members of different sub-populations. After the evolution process for each group ends according to the specified stopping criteria, the sub-populations will be merged, and another run of the evolution process will take place.
The goal of using sub-populations is to avoid having the algorithm converge to a local maximum. The idea is for each group to find one local maximum, and after merging, hopefully to find the global maximum.
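A sketch of this split-evolve-merge scheme, where run_evolution is a hypothetical helper that evolves one pool until a stopping criterion is met and returns the resulting individuals:

```python
import random

def evolve_with_subpopulations(individuals, n_subpops, run_evolution):
    if n_subpops <= 1:
        # A single unified pool: just run the normal evolution process.
        return run_evolution(individuals)
    # Randomly divide the individuals into n_subpops groups.
    random.shuffle(individuals)
    groups = [individuals[i::n_subpops] for i in range(n_subpops)]
    # Evolve each sub-population in isolation (no crossover between groups).
    evolved_groups = [run_evolution(group) for group in groups]
    # Merge the evolved groups and run one more round of evolution on the combined pool.
    merged = [individual for group in evolved_groups for individual in group]
    return run_evolution(merged)
```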
The goal of a selection function is to choose two parents from the current population based on their fitness scores. There are several selection methods available in the ga_selection file. Currently, the tournament method is used.
The tournament selection method works by holding two "tournaments". For each tournament, a number of members from the population are randomly chosen to participate. This number can be specified with the t_size argument. Currently, the default value of 2 is used.
In the tournament, there is some probability that the participant with the highest fitness score will "win". This probability is specified with the t_prob argument. Currently, the default value of 0.8 is used. If the fittest participant does not win, then it is removed from the tournament, and the process repeats. If there is only one participant left, then it will win. The winner of the tournament is selected to be a parent.
Two tournaments are held to choose two parents.
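A sketch consistent with this description (the t_size and t_prob arguments follow the text; the function name itself is illustrative):

```python
import random

def tournament_select(population, scores, t_size=2, t_prob=0.8):
    # Randomly choose t_size participants as (index, score) pairs.
    participants = random.sample(list(enumerate(scores)), t_size)
    while len(participants) > 1:
        # The fittest remaining participant wins with probability t_prob...
        fittest = max(participants, key=lambda p: p[1])
        if random.random() < t_prob:
            return population[fittest[0]]
        # ...otherwise it is removed and the process repeats.
        participants.remove(fittest)
    # If only one participant is left, it wins.
    return population[participants[0][0]]

# Two tournaments give two parents:
# parent_a, parent_b = tournament_select(pop, scores), tournament_select(pop, scores)
```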
The roulette wheel selection works by assigning a probability of selection to each individual in the population. This probability depends on the fitness of the individuals and is calculated as follows:
p_i = f_i / Σ_j f_j,
where p_i denotes the selection probability of individual i and f_i denotes its fitness score.
Two parents are selected based on the calculated probabilities.
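A minimal sketch of this fitness-proportional selection, under the assumption that all fitness scores are non-negative:

```python
import random

def roulette_select(population, scores, k=2):
    # Probability of selection is proportional to fitness: p_i = f_i / sum of all fitness scores.
    total = sum(scores)
    probabilities = [f / total for f in scores]
    # Draw k parents according to these probabilities.
    return random.choices(population, weights=probabilities, k=k)
```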
The rank selection method (a rank-based variant of the roulette wheel) works by assigning a probability of selection to each individual in the population based on its rank rather than its raw fitness. First, individuals are sorted by their fitness scores, and each individual is assigned a rank. If there are n individuals, then the best individual gets the rank n and the worst gets the rank 1. Then, based on this ranking, a probability is calculated as follows:
p_i = r_i / [n × (n - 1)],
where p_i denotes the selection probability of individual i and r_i denotes its rank.
Two parents are selected based on the calculated probabilities.
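A minimal sketch of rank-based selection; since random.choices normalizes its weights, the ranks can be used directly as selection weights:

```python
import random

def rank_select(population, scores, k=2):
    n = len(population)
    # Sort indices by fitness: the worst individual gets rank 1, the best gets rank n.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    # Selection probability grows with rank, independent of the raw fitness values.
    return random.choices(population, weights=ranks, k=k)
```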
The goal of a crossover function is to create a new individual (an offspring) based on two parent individuals. There are several crossover methods available in the ga_crossover file. Currently, single-point crossover is used.
During crossover, there is also a possibility of mutation; the mutation probability is specified in the genetic algorithm settings file. Mutation can be disabled by setting this probability to 0.
The k-point crossover method works by first encoding all values of a parent into binary code and then randomly picking k crossover points in that code. The bits between the points are swapped between the parents to create an offspring. The default value of k is 1, meaning single-point crossover is used.
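A sketch of the standard k-point scheme, operating on parents that are already encoded as equal-length bit strings and alternating the source parent after each crossover point:

```python
import random

def k_point_crossover(bits_a, bits_b, k=1):
    # Randomly pick k distinct crossover points inside the bit string.
    points = sorted(random.sample(range(1, len(bits_a)), k))
    child = []
    source, other = bits_a, bits_b
    previous = 0
    for point in points + [len(bits_a)]:
        # Copy the current segment from one parent, then switch to the other parent.
        child.append(source[previous:point])
        source, other = other, source
        previous = point
    return "".join(child)

# With k=1: k_point_crossover("00000000", "11111111") -> e.g. "00011111"
```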
The uniform crossover method works by first encoding all values of a parent into binary code and then creating an offspring by choosing each bit from either parent with equal probability.
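A corresponding sketch for uniform crossover on already-encoded bit strings:

```python
import random

def uniform_crossover(bits_a, bits_b):
    # Each bit of the offspring is taken from either parent with probability 0.5.
    return "".join(a if random.random() < 0.5 else b for a, b in zip(bits_a, bits_b))
```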
The group crossover method works by grouping together values in an individual that are said to be correlated to each other. The offspring is created by choosing each group from either parent with equal probability.
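A sketch under the assumption that the correlated groups are available as lists of indices into the individual; this representation is hypothetical, and indices not covered by any group default to the first parent here:

```python
import random

def group_crossover(parent_a, parent_b, groups):
    # groups is a list of index lists; each group is inherited as a whole from one parent.
    child = list(parent_a)
    for group in groups:
        source = parent_a if random.random() < 0.5 else parent_b
        for index in group:
            child[index] = source[index]
    return child
```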
For all crossover methods, there is a possibility of mutation.
In the case of k-point or uniform crossover, the probability specified in the settings is the probability for a single bit to mutate. A mutating bit is simply flipped in value.
In the case of group crossover, the probability specified in the settings is the probability for the entire group to mutate. The values belonging to the mutating group are shifted by some percentage. The entire group always mutates together. If the group is positively correlated, then the values are shifted in the same direction. If the group is negatively correlated, then the values are shifted in opposite directions.
The mutation probability value should be chosen according to the crossover method used.
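Two illustrative mutation sketches: a per-bit flip for the binary crossover methods, and a heavily simplified group mutation in which the shift size and the signs argument (per-value correlation direction) are assumptions, not the project's actual scheme:

```python
import random

def mutate_bits(bits, p_bit):
    # Flip each bit independently with probability p_bit (used with k-point and uniform crossover).
    return "".join(("1" if b == "0" else "0") if random.random() < p_bit else b for b in bits)

def mutate_group(values, group, signs, p_group, shift=0.05):
    # With probability p_group, shift every value of the group together; signs[index] is +1 or -1
    # so that negatively correlated values move in the opposite direction (shift size is illustrative).
    if random.random() < p_group:
        for index in group:
            values[index] *= 1 + signs[index] * shift
    return values
```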
Elitism allows preserving the best-performing individuals from the previous generation when creating a new generation. It works by selecting the individuals with the highest fitness scores and adding them automatically to the next generation, without putting them through the selection, crossover, and mutation processes.
The usage of elitism can be customized through the genetic algorithm settings file, where either the number of individuals or the percentage of the population to be preserved can be specified. To specify an absolute number of individuals to preserve, an integer value must be given. To specify a percentage of the population to preserve, a floating-point value between 0 and 1 must be given. Elitism can be disabled by setting the value to 0.
The goal of using elitism is to prevent the loss of well-performing individuals.
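A sketch of how the elitism setting described above could be applied, assuming the offspring list already has full population size so that the overall size stays constant:

```python
def apply_elitism(population, scores, offspring, elitism):
    # Interpret the setting: an int is an absolute count, a float in (0, 1) a share of the population.
    n_elite = elitism if isinstance(elitism, int) else int(round(elitism * len(population)))
    if n_elite <= 0:
        return offspring
    # Copy the n_elite fittest individuals unchanged into the next generation.
    best = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)[:n_elite]
    elite = [population[i] for i in best]
    return elite + offspring[:len(population) - n_elite]
```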
Culling allows removing the worst-performing individuals from the current generation prior to creating a new generation. It works by selecting the individuals with the lowest fitness scores and replacing them with newly generated random individuals.
The usage of culling can be customized through the genetic algorithm settings file, where either the number of individuals or the percentage of the population to be removed can be specified. To specify an absolute number of individuals to remove, an integer value must be given. To specify a percentage of the population to remove, a floating-point value between 0 and 1 must be given. Culling can be disabled by setting the value to 0.
The goal of using culling is to add diversity to the population on each iteration. However, using culling slows down the algorithm because for each replaced individual, a new fitness score must be calculated.
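A sketch of culling with the same setting convention; random_individual and fitness_fn are hypothetical helpers for generating and scoring a replacement:

```python
def apply_culling(population, scores, culling, random_individual, fitness_fn):
    # Interpret the setting the same way as for elitism: int = count, float in (0, 1) = share.
    n_cull = culling if isinstance(culling, int) else int(round(culling * len(population)))
    if n_cull <= 0:
        return population, scores
    # Replace the n_cull worst individuals with fresh random ones.
    worst = sorted(range(len(population)), key=lambda i: scores[i])[:n_cull]
    for i in worst:
        population[i] = random_individual()
        scores[i] = fitness_fn(population[i])  # each replacement costs one extra fitness evaluation
    return population, scores
```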
The binary encoding used in the k-point and uniform crossover methods works slightly differently for integer and floating-point values.
For integers, the value is simply converted into a 32-bit binary number in a string format.
For floats, the value is first multiplied by ten raised to a chosen power. This power is called the accuracy, as it denotes how many decimal places will be preserved after decoding. The default value is 5. After multiplication, the value is rounded and converted to an integer. Then the same method as for integer values can be used.
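A sketch of this encoding; treating the 32-bit number as two's complement is one plausible reading of the description, so that negative values also fit:

```python
def encode_int(value, bits=32):
    # 32-bit binary string (two's complement, so negative values are representable).
    return format(value & (2 ** bits - 1), "0{}b".format(bits))

def decode_int(code, bits=32):
    value = int(code, 2)
    return value - 2 ** bits if value >= 2 ** (bits - 1) else value

def encode_float(value, accuracy=5, bits=32):
    # Scale by 10**accuracy, round to an integer, then reuse the integer encoding.
    return encode_int(int(round(value * 10 ** accuracy)), bits)

def decode_float(code, accuracy=5, bits=32):
    return decode_int(code, bits) / 10 ** accuracy

# encode_float(3.14159) -> "00000000000001001100101100101111"; decoding it gives back 3.14159
```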