Abstract

Introduction

The game
The physics

Method

The game simulator
The GP system

Results

Conclusions and Discussion

Future work

Summary

References

Appendix

SaFaRi (tm)

- Co-evolution of flocks of pursuers and evaders

Johan Kristensson 780217-4974
Oscar Lindberg 771128-4815
 


Move the mouse into the applet to run it.
Press the left mouse button to restart.

Abstract

We have co-evolved strategies for flocks of pursuers and flocks of evaders. The flocks consist of identical members, i.e. all members of the same flock use the same strategy when making decisions. To evolve the strategies we used genetic programming, with vectors as the primary data type.

The results were not extraordinary, but some strategies were evolved, and some of them show flock behaviour, although not advanced. We attribute this to the time-consuming simulation runs, which made fine-tuning of the parameters nearly impossible.

Introduction

The goal of the project was to evolve good strategies: the pursuing flock should be good at catching evaders, and the evading flock should be good at evading pursuers. The strategies were expressed as programs that take the state of the game as input and output instructions on how to move. All members of a flock use copies of the same program. They cannot communicate directly, but our theory was that co-operative behaviour would eventually appear because every member of a flock uses the same program to decide what to do.

The game

The game is a variation of the classical pursuer-and-evader game, with the alteration that we have flocks of pursuers and flocks of evaders. The flock is the central object, and the rules are designed so that the individuals care more for the well-being of the flock than for themselves. Self-sacrifice can at times be beneficial for the flock.

Evaders are caught when touched by a pursuer. Put your mouse pointer in the applet below to see how the game can look when there is only one pursuer.

The game continues until a given period of time has elapsed or all evaders have been caught.

The physics

The world is populated with agents that interact by touching. Each agent has a position, a radius, a speed and a maximum acceleration. The agents are also subject to a force of friction, which implies a maximum speed. An agent receives information as vectors pointing to the objects surrounding it. There are also a number of different obstacles:

  • Boulders
  • Vertical walls
  • Arena
  • Box, horizontal and vertical walls
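One way the maximum speed can arise from the friction mentioned above is a per-timestep update in which a capped acceleration is applied and the speed is then damped. The text does not specify the exact motion model, so this particular update rule and the class name are our assumption, shown only to illustrate how friction implies a terminal speed:

```java
// Sketch of a possible motion rule: acceleration is capped, and a
// speed-proportional friction gives a terminal (maximum) speed.
// This update rule is our assumption, not the SaFaRi source.
public class MotionSketch {
    // One 1-D timestep: apply capped acceleration, then friction damping.
    public static double step(double v, double accel, double maxAccel, double friction) {
        double a = Math.max(-maxAccel, Math.min(maxAccel, accel));
        return (v + a) * (1.0 - friction);
    }

    public static void main(String[] args) {
        double v = 0;
        for (int t = 0; t < 200; t++) v = step(v, 1.0, 1.0, 0.1);
        // Converges to the terminal speed a * (1 - f) / f = 9.0
        System.out.println(v);
    }
}
```

With maximum acceleration a and friction factor f, the fixed point of the update is v = a(1 - f)/f, so a larger friction directly lowers the maximum speed.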

Genetic Programming

To evolve the strategies we use an Evolutionary Algorithm, Genetic Programming to be precise. It is a scheme that develops programs using evolutionary principles. Initially, a population of random programs is created. After that, a loop of evaluation, selection and reproduction is repeated until some termination criterion is met. Evaluation means that the performance of each program is measured. Selection is the process of choosing individuals that will become parents of new individuals and choosing individuals to discard; it is based on the result of the evaluation step. Bad individuals are discarded while good ones stay and become parents. The idea is that better individuals will, through their children, step by step multiply their numbers in the population.
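The evaluate-select-reproduce loop can be sketched as follows. This is a minimal toy example of the cycle, not the actual SaFaRi system: the numeric "programs", the target-based fitness and the class name are all invented for the illustration, and lower fitness is better, matching the convention used later in this report:

```java
import java.util.Random;

// Minimal GP-style loop on a toy problem: evolve an int "program"
// (here just a number) towards a target value. Illustrates only the
// evaluate -> select -> reproduce cycle.
public class GpLoopSketch {
    static final Random rnd = new Random(42);

    // Toy fitness: distance to the target 1000 (lower is better).
    static int fitness(int individual) {
        return Math.abs(individual - 1000);
    }

    public static int evolve(int popSize, int tournaments) {
        int[] pop = new int[popSize];
        for (int i = 0; i < popSize; i++) pop[i] = rnd.nextInt(2048); // random init
        for (int t = 0; t < tournaments; t++) {
            // Evaluation + selection: two random individuals compete.
            int a = rnd.nextInt(popSize), b = rnd.nextInt(popSize);
            int winner = fitness(pop[a]) <= fitness(pop[b]) ? a : b;
            int loser  = winner == a ? b : a;
            // Reproduction: loser is replaced by a mutated copy of the winner.
            pop[loser] = pop[winner] + rnd.nextInt(11) - 5;
        }
        int best = pop[0];
        for (int x : pop) if (fitness(x) < fitness(best)) best = x;
        return best;
    }

    public static void main(String[] args) {
        System.out.println("best = " + evolve(64, 5000));
    }
}
```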

Method

The game simulator

To evolve and evaluate our strategies we designed an engine that simulates our physics. We chose the Java language for its object-oriented programming features and to make it easy to present the evolved strategies in this report as Java applets.

The physics

Our goal was that the rules of our world should be simple and relatively fast to simulate, while still allowing complex behaviour. We finally arrived at a rule system for the physics of our world that we were content with. In our model, time is discrete, not continuous. There are two types of objects: players and immobile objects. The immobile objects are boulders and walls; the players are the evaders and pursuers.

When a player collides with an immobile object, such as a wall, the player is moved out of the object, and its speed component normal to the object is set to zero.

When two players collide, they are moved apart, and each player's speed component normal to the other is set to zero.
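Zeroing the normal speed amounts to projecting out the velocity component along the obstacle's surface normal. A minimal 2-D sketch (the vector representation and names are ours, not the SaFaRi source):

```java
// Sketch of the collision rule: the velocity component normal to the
// obstacle is set to zero while the tangential component is kept.
// Vectors are plain double pairs here.
public class CollisionSketch {
    // Remove the component of v along the (unit) normal n:
    // v' = v - (v . n) n
    public static double[] cancelNormal(double[] v, double[] n) {
        double dot = v[0] * n[0] + v[1] * n[1];
        return new double[] { v[0] - dot * n[0], v[1] - dot * n[1] };
    }

    public static void main(String[] args) {
        // Player moving down-right into a horizontal wall (normal points up).
        double[] v = cancelNormal(new double[] {3.0, -4.0}, new double[] {0.0, 1.0});
        System.out.println(v[0] + ", " + v[1]); // 3.0, 0.0
    }
}
```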

The GP system

The virtual machine

To run the programs that we are evolving, our system includes a virtual machine. It is a register machine with a linear representation of the programs.
Data types
We have three data types: Real, Vector and Boolean. Vectors are a natural data type considering the strategies we want to evolve, since the strategies largely deal with directions and distances.
Instruction set
The instructions of the virtual machine are machine-code-like. The complete instruction set can be seen in table 1, where instructions are grouped by column according to return type.
 
Table 1: The instruction set, grouped by return type.

Vector:  v_add, v_sub, v_mul, v_div, v_norm, v_condmove
Real:    v_scalarproduct, v_abs, d_add, d_sub, d_mul, d_div, d_abs, d_condmove
Boolean: v_longer, v_shorter, v_equal, d_greater, d_less, d_equal, b_and, b_or, b_xor, b_not

All of the instructions operate on the memory of the virtual machine. There is also an rts instruction (return from subroutine) which ends the program.
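To make the execution model concrete, here is a minimal sketch of a linear register machine of this kind. Only a few Real-typed opcodes from table 1 are implemented, and the integer encoding, memory layout and class name are our own invention for illustration:

```java
// Minimal sketch of the linear register machine: a program is a list
// of (opcode, dest, src) triples executed in order until RTS. Only a
// few Real-typed opcodes are implemented; the encoding is ours.
public class VmSketch {
    public static final int D_ADD = 0, D_SUB = 1, D_MUL = 2, RTS = 3;

    // mem holds real values; positions below `writable` may be written,
    // positions from `writable` on act as read-only inputs/constants.
    public static double run(int[][] program, double[] mem, int writable) {
        for (int[] ins : program) {
            if (ins[0] == RTS) break;          // rts ends the program
            int dest = ins[1], src = ins[2];
            if (dest >= writable) continue;    // read-only destination: skip
            switch (ins[0]) {
                case D_ADD: mem[dest] += mem[src]; break;
                case D_SUB: mem[dest] -= mem[src]; break;
                case D_MUL: mem[dest] *= mem[src]; break;
            }
        }
        return mem[0]; // position 0 plays the role of the output
    }

    public static void main(String[] args) {
        double[] mem = {0.0, 0.0, 2.5, 1.0};   // output, register, input, constant
        int[][] prog = {
            {D_ADD, 1, 2},  // reg += input (2.5)
            {D_ADD, 1, 3},  // reg += 1.0     -> 3.5
            {D_MUL, 1, 1},  // reg *= reg     -> 12.25
            {D_ADD, 0, 1},  // output += reg
            {RTS, 0, 0},
        };
        System.out.println(run(prog, mem, 2)); // 12.25
    }
}
```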

Memory
The memory consists of an array of variables that can hold values. There are four kinds of memory positions:
  • Outputs
  • Registers
  • Inputs
  • Constants
The first two, Outputs and Registers, can be both read and written. The last two, Inputs and Constants, can only be read. This means that any memory position can be the source of an instruction, but only Outputs and Registers can be destinations.

When a program is called, Outputs and Registers are set to zero, the Inputs are set to the player's inputs, and the Constants are always set to preset values.

In our runs with two teams and two players, using two registers and four constants, the memory for each player looks like this:

Vectors
Name Description
acceleration The acceleration the strategy wants to set. When the program has stopped this is the output.
vektormem[] The vector memories. Will be kept until next timestep. Used for calculations.
vektorreg[] The vector registers. Used for calculations.
velocity The agent's own velocity. Given as input.
evaderpos[] The positions of the closest evaders. Input.
evadervel[] The velocities of the closest evaders. Input.
pursuerpos[] The positions of the closest pursuers. Input.
pursuervel[] The velocities of the closest pursuers. Input.
borderpos[] The positions of the closest walls and boulders. Input.

Reals
Name Description
realmem[] The real memories. Will be kept until next timestep. Used for calculations.
realreg[] The real registers. Used for calculations.
const: 0.0 The value 0.0. Constant.
const: 1.0 The value 1.0. Constant.

Booleans
Name Description
boolmem[] The boolean memories. Will be kept until next timestep. Used for calculations.
boolreg[] The boolean registers. Used for calculations.
const: true The value "true". Constant.
const: false The value "false". Constant.

Initialization

The populations are initialized with a number of individuals of the same length, filled with random instructions. The instructions are chosen with equal probability; the source of each instruction is chosen randomly among all memory positions, and the destination randomly among the Output and Register positions. After a while we started seeding the population to speed up the evolution. We seed the population by setting the last instruction of every individual. For the evaders this last instruction is acceleration.v_sub(pursuerpos[0]), making all evaders flee from the pursuer closest to them. For the pursuers it is acceleration.v_add(evaderpos[0]), making them go for the closest evader.
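The seeding step itself is simple: after random initialization, the last instruction of every individual is overwritten with a fixed, sensible one. In this sketch programs are modelled as plain string arrays purely for illustration; the class name is ours:

```java
// Sketch of population seeding: overwrite the last instruction of
// every individual with a fixed instruction. Programs are modelled
// as String arrays here only for illustration.
public class SeedingSketch {
    public static String[][] seed(String[][] population, String lastInstruction) {
        for (String[] program : population)
            program[program.length - 1] = lastInstruction;
        return population;
    }

    public static void main(String[] args) {
        String[][] evaders = {
            { "d_add realreg[0] const:1.0", "v_mul vektorreg[0] velocity" },
            { "b_and boolreg[0] const:true", "v_norm vektorreg[0]" },
        };
        // Every evader now ends by accelerating away from the closest pursuer.
        seed(evaders, "acceleration.v_sub(pursuerpos[0])");
        System.out.println(evaders[0][1]); // acceleration.v_sub(pursuerpos[0])
    }
}
```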

Population

The population size is set at the beginning of the evolution. The programs have a maximum length, but are all initialized to the same length.

Genetic operators

The system chooses the type of reproduction randomly: with a set probability, two-point crossover is used, and the rest of the time only one of the parents is replicated.

Mutation occurs with another probability. Each time a child is produced there is a possibility of several instructions being mutated; all lines of the program have an equal chance of being selected for mutation. We define the mutation frequency as the probability that one or more lines of code will be replaced by a random instruction.
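The two operators can be sketched as follows. Instructions are modelled as plain ints and the parents are assumed to have equal length, a simplification of the real system, where programs may vary in length up to a maximum; the names and encoding are ours:

```java
import java.util.Random;

// Sketch of the genetic operators: two-point crossover on linear
// programs (arrays of instructions, here ints) and per-line mutation.
public class OperatorsSketch {
    static final Random rnd = new Random(1);

    // Copy p1, then replace the segment between two random cut points
    // with the corresponding segment from p2.
    public static int[] twoPointCrossover(int[] p1, int[] p2) {
        int a = rnd.nextInt(p1.length), b = rnd.nextInt(p1.length);
        int lo = Math.min(a, b), hi = Math.max(a, b);
        int[] child = p1.clone();
        for (int i = lo; i < hi; i++) child[i] = p2[i];
        return child;
    }

    // Each line has the same chance of being replaced by a random instruction.
    public static int[] mutate(int[] prog, double perLineRate, int numOpcodes) {
        int[] out = prog.clone();
        for (int i = 0; i < out.length; i++)
            if (rnd.nextDouble() < perLineRate) out[i] = rnd.nextInt(numOpcodes);
        return out;
    }

    public static void main(String[] args) {
        int[] p1 = {0, 0, 0, 0, 0, 0};
        int[] p2 = {1, 1, 1, 1, 1, 1};
        int[] child = mutate(twoPointCrossover(p1, p2), 0.1, 4);
        System.out.println(java.util.Arrays.toString(child));
    }
}
```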

Fitness calculation

How well a program performs depends on the situation, for instance on the positions of the evaders and pursuers. We have introduced an "evader fitness primitive" and a "pursuer fitness primitive"; a low value is good for that flock. To evaluate the programs we simulate them for a couple of timesteps. The fitness of a tested program is the value of the primitive at the end minus the value at the start, so the lower the fitness, the better the individual.

The primitives are defined as follows. The "evader fitness primitive" is the number of evaders alive times -1000000, minus the sum over all evaders of the squared distance to the closest pursuer. The "pursuer fitness primitive" is the number of evaders alive times 1000000, plus the sum over all pursuers of the squared distance to the closest evader.
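The two definitions above translate directly into code. In this sketch agents are plain (x, y) positions and the "alive" evaders are exactly the ones passed in; the structure and names are ours, but the constants and formulas come from the text:

```java
// Sketch of the two fitness primitives as defined in the text.
// Agents are (x, y) positions; lower values are better for that flock.
public class FitnessSketch {
    static double sq(double x) { return x * x; }

    static double distSq(double[] a, double[] b) {
        return sq(a[0] - b[0]) + sq(a[1] - b[1]);
    }

    static double closestDistSq(double[] from, double[][] others) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] o : others) best = Math.min(best, distSq(from, o));
        return best;
    }

    // evader primitive = -1000000 * (#evaders alive)
    //                    - sum over evaders of squared distance to nearest pursuer
    public static double evaderPrimitive(double[][] evaders, double[][] pursuers) {
        double sum = 0;
        for (double[] e : evaders) sum += closestDistSq(e, pursuers);
        return -1_000_000.0 * evaders.length - sum;
    }

    // pursuer primitive = 1000000 * (#evaders alive)
    //                     + sum over pursuers of squared distance to nearest evader
    public static double pursuerPrimitive(double[][] evaders, double[][] pursuers) {
        double sum = 0;
        for (double[] p : pursuers) sum += closestDistSq(p, evaders);
        return 1_000_000.0 * evaders.length + sum;
    }

    public static void main(String[] args) {
        double[][] evaders = {{0, 0}, {10, 0}};
        double[][] pursuers = {{0, 3}};
        System.out.println(evaderPrimitive(evaders, pursuers));  // -2000000 - (9 + 109)
        System.out.println(pursuerPrimitive(evaders, pursuers)); //  2000000 + 9
    }
}
```

Note how catching an evader changes the primitive by a million, which dwarfs any distance term, so staying alive always dominates positioning.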

This fitness makes the evaders want to keep their distance from the pursuers and strive not to get eaten, while the pursuers want to get closer to the evaders and try to eat them.

We use the "30 monkeys in a bus" method: the tournaments are only run for a few time steps each.

Selection

At first we tried coevolution. We randomly choose two pairs of individuals from each population. For each pair, the evaders compete against the pursuers; the evader with the best total fitness is proclaimed the winner, and at the same time the pursuers have been evaluated, so the pursuer with the best total fitness is likewise proclaimed the winner. The same procedure is repeated for the other pair, so we end up with two evader winners and two pursuer winners. We also have two losers of each kind, and these are replaced with children of the winners.

We define a generation as the population size divided by the number of individuals competing in every selection. This is the number of selections that would be needed to renew the whole population if the selected individuals were not put back into the population after a tournament round. With 128 individuals and four competitors per selection, for example, one generation corresponds to 32 selections.

Results


We have run our program several times with different parameters. Here we present the runs we find most interesting.

In the majority of our runs we used populations of 128 individuals. During evolution, one game consisted of 4 pursuers and 12 evaders controlled by two different programs. As a first step we used coevolution to evolve our pursuers and evaders. This made the results difficult to interpret, since a successful individual may be bad at competing with an individual from a different generation than its own. When evaluating the mean fitness of evolving pursuers against evolving evaders, the graph showed the fitness varying randomly over time.

When evaluating the mean fitness of the population of pursuers against a population of evaders that simply stand still and do not evolve, we found the fitness seen in figure 1. We produced this plot simply to make sure that the pursuers really were evolving. We calculated the fitness only for the individuals who were actually competing, that is, those selected for evolution. The fitness is thus a mean over the competing individuals and not over the whole population, and may therefore be somewhat more erratic than a mean fitness for the entire population would have been.

As one can see in figure 1, the fitness improves (comes closer to zero) after a few generations. Our pursuers were therefore evolving and getting better as hunters. Unfortunately they were not developing advanced hunting strategies; the hunters mostly headed straight towards the prey closest to them, even when all the other hunters were heading for the same prey. The peak in the graph shows how the fitness sometimes grows worse during evolution; we believe this happens because the starting positions in the game can occasionally hinder evolution.

Figure 1: A stationary evader and an evolving pursuer

Since it is difficult to visualize the progress individuals make under coevolution, we then tried evolution without coevolution. The evolving evaders were now hunted by a non-evolving pursuer whose hunters relentlessly headed towards the nearest prey. Likewise, the evolving pursuers hunted non-evolving evaders who always headed in the opposite direction of the nearest hunter. The fitness obtained with high mutation and crossover frequencies is shown in figure 2.


Figure 2: High mutation and crossover frequencies

When we lowered the mutation frequency and crossover frequency we got an improved fitness, as seen in figure 3. The fitness is still very erratic, though.


Figure 3: Lower mutation and crossover frequencies.

Lowering the mutation frequency and crossover frequency even further, the fitness curve showed a more satisfactory behaviour. As one can see in figure 4, the fitness is not as erratic as before, and it improves with time. The mutation frequency is now 0.3 and the crossover frequency 0.2. After 300 generations the fitness increases rapidly, but the increase only lasts until approximately 500 generations, after which this sudden worsening of the fitness disappears again. We assume the temporary increase in fitness is due to chance.


Figure 4: Very low mutation and crossover frequencies.

The fitness for the pursuers shows the same behaviour as for the evaders, that is, the fitness improves with time, as can be seen in figure 5. The fitness still behaves erratically, but to a lesser degree than before, and it stays closer to its optimum value.


Figure 5: Very low mutation and crossover frequencies.

We now present some of the evolved individuals as they appear in the game. The first applet shows an evader after 1000 generations and a pursuer after 200 generations. The links labeled "evader" and "pursuer" lead to the text output from the system, showing the programs and some parameters. The pursuer shows the typical behaviour of most of our evolved pursuers: it simply attacks the closest evader. The evader, on the other hand, has somewhat more complex behaviour.


Evader
This evader tries to move nearer to its third-closest evading neighbour. It also tries to move away from the nearest and third-nearest pursuer.
Pursuer
This pursuer simply accelerates in the direction of the nearest evader.

Evader

This evader tries to move closer to its nearest and third-nearest evading neighbours, as well as matching their speed and direction. It also tries to move away from the two closest pursuers. Strangely enough, it also tries to move closer to the nearest stationary object, such as a boulder or a wall.
Pursuer

This pursuer moves towards the nearest evader. Unluckily for it, it also tries to accelerate in the direction opposite to the one it is heading in. It also tries to avoid hitting fixed objects, which can be a bad idea when prey is close to a fixed object, since the hunter might then not be able to get close enough.

 

Conclusions and Discussion

Since coevolution did not yield especially good evaders, and therefore not very interesting pursuers, we also did some runs without coevolution. This way we hoped to evolve flock behaviour for the evaders and intelligent team-hunting algorithms for the pursuers, and in part we got what we wanted. When we lowered the mutation frequency and crossover frequency we got better results; the mean fitness of the population did not behave as randomly as it did before. We assume this is because good individuals are not damaged as easily. We found the purpose of many things our evolved programs did rather difficult to comprehend; the game is simply too difficult to predict as the individuals evolve and grow complex.

Future work

The system can handle three dimensions, but we have had some problems showing the results of this in Java, so the first extension would be to complete the work of making the applet present the model in three dimensions.

It would be interesting to make the system run in parallel on several computers, making more time-consuming runs possible.

Seeding the populations with boid-like behaviour and seeing whether those genes survive would show whether they are good at evading or pursuing.

Simply adjusting the parameters, making a large series of runs and analysing the results would probably reveal many findings that we have not had time to discover.

Summary

We have worked long and hard on constructing our virtual machine and on visualizing the predators and their prey. We wanted to design the virtual machine for flexibility; thanks to our vector data type, the engine even supports evolution in three spatial dimensions. Unfortunately our time was limited, and we have thus not been able to analyze our results as much as we would have liked. Nor did we have time to adjust the parameters of the model to produce optimal results.

References

[1] W. Banzhaf, P. Nordin, R. E. Keller and F. D. Francone, Genetic Programming: An Introduction, Morgan Kaufmann Publishers, San Francisco, CA, USA / dpunkt.verlag, Heidelberg, Germany. ISBN 1-55860-510-X.

[2] M. Mitchell, An Introduction to Genetic Algorithms, MIT Press, Cambridge, MA, USA.

[3] C. Reynolds, Boids, background and update, Boids homepage.

Appendix

Source code in Java

Simulation of the two authors hunting down bugs in the software.

Oscar Lindberg: f96osli@dd.chalmers
Johan Kristensson: f96jokr@kristenson.se
Mon, 10 Dec 2001

In cooperation with

Bacon Medialab

(fram)tids(fab)riken