Introduction

The goal of the project was to evolve good strategies: the pursuing flock should be good at catching evaders, and the evading flock should be good at evading pursuers. These strategies were described in the form of programs that took the state of the game as input and supplied instructions on how to move. The members of the same flock all use copies of the same program. They cannot communicate directly, but our theory was that cooperative behaviour would eventually appear because every member of a flock uses the same program for deciding what to do.

The game

The game is a variation of the classical "pursuer and evader" game, with the alteration that we have flocks of pursuers and flocks of evaders. The flock is the central object, and the rules are designed so that the individuals care more for the wellbeing of the flock than for themselves; self-sacrifice can at times be beneficial for the flock. Evaders are caught when touched by a pursuer. Put your mouse pointer in the applet below to see how the game can look when there is only one pursuer. The game continues until a given period of time has elapsed or all evaders have been caught.

The physics

The world is populated with agents that interact by touching. Each agent has a position, a radius, a speed and a maximum acceleration. The agents are also subject to a force of friction, which implies a maximum speed. The agents receive information as vectors pointing to the objects surrounding them. The world also contains a number of different obstacles.
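As a minimal sketch of the agent model just described (the class and field names, and the friction coefficient, are our own assumptions for illustration, not taken from the actual source code), one time step for an agent could look like this:

```java
// Hypothetical sketch of a single time step for an agent: the requested
// acceleration is clamped to the agent's maximum, and a friction force
// proportional to velocity implies a maximum speed of maxAccel / FRICTION.
class Agent {
    double x, y;      // position
    double vx, vy;    // velocity
    double radius;
    double maxAccel;
    static final double FRICTION = 0.1; // assumed coefficient, not from the report

    // ax, ay: acceleration requested by the evolved program
    void step(double ax, double ay, double dt) {
        // clamp the requested acceleration to the agent's maximum
        double len = Math.sqrt(ax * ax + ay * ay);
        if (len > maxAccel) {
            ax *= maxAccel / len;
            ay *= maxAccel / len;
        }
        // friction opposes velocity, so the speed saturates
        vx += (ax - FRICTION * vx) * dt;
        vy += (ay - FRICTION * vy) * dt;
        x += vx * dt;
        y += vy * dt;
    }
}
```

With constant full acceleration, the velocity converges toward the implied maximum speed rather than growing without bound.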
Genetic Programming

To evolve the strategies we use Evolutionary Algorithms, Genetic Programming to be precise. It is a scheme that develops programs using evolutionary principles. Initially a population of random programs is created. After that, a loop of evaluation, selection and reproduction is repeated until some termination criterion is met. Evaluation means that the performance of the programs is measured. Selection is the process of choosing which individuals will become parents of new individuals and which will be discarded. This selection is based on the result of the evaluation step: bad individuals are discarded, and good ones stay and become parents. The idea is that better individuals will, through their children, step by step multiply their numbers in the population.

Method

The game simulator

To evolve and evaluate our strategies we designed an engine to simulate our physics. We chose the Java language for its advanced object-oriented programming features and to facilitate presenting the evolved strategies in this report as Java applets.

The physics

Our goal was that the rules of our world should be simple and relatively fast to run, but we still wanted the physics to allow complex behaviour. We eventually settled on a rule system for the physics of our world that we were content with. In our model time is discrete, not continuous. There are two types of objects: players and immobile objects. The immobile objects are boulders and walls; the players are the evaders and pursuers. When a player collides with an immobile object, such as a wall, the player is moved out of the object, and its speed normal to the object is set to zero. When two players collide, they are moved apart, and each player's speed normal to the other is set to zero.

The GP system

The virtual machine

To run the programs that we are evolving, our system includes a virtual machine.
It is a register machine with a linear representation of the programs.

Data types

We have three data types: Real, Vector and Boolean. Vectors are a natural data type considering the strategies we want to evolve, since a strategy has a lot to do with directions and distances.

Instruction set

The instructions of the virtual machine are machine-code-like. The complete instruction set can be seen in table 1, where instructions are ordered by column according to return type.
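To illustrate, a linear register-machine program of this kind could be interpreted roughly as follows. The opcode names and the four-field instruction encoding are our assumptions for illustration; the system's actual instruction set is the one listed in table 1.

```java
// Minimal sketch of a linear register-machine interpreter. A program is a
// sequence of instructions; each instruction reads from memory positions and
// writes its result back into memory, and rts ends the program.
class MiniVm {
    static final int ADD = 0, SUB = 1, MUL = 2, RTS = 3;

    // each instruction is encoded as {opcode, destination, source1, source2}
    static void run(int[][] program, double[] memory) {
        for (int[] ins : program) {
            switch (ins[0]) {
                case ADD: memory[ins[1]] = memory[ins[2]] + memory[ins[3]]; break;
                case SUB: memory[ins[1]] = memory[ins[2]] - memory[ins[3]]; break;
                case MUL: memory[ins[1]] = memory[ins[2]] * memory[ins[3]]; break;
                case RTS: return; // return from subroutine: the program ends
            }
        }
    }
}
```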
All of the instructions work on the memory of the virtual machine. There is also an rts instruction, i.e. a return from subroutine, which ends the program.

Memory

The memory consists of an array of variables that can hold values. There are four kinds of memory positions: Outputs, Registers, Inputs and Constants.
When the programs are called, Outputs and Registers are set to zero. The Inputs are set to the inputs of the player. The Constants are always set to some preset values. In our runs with two teams and two players, using two Registers and four Constants, the memory for each player looks as shown in the table below, with separate banks for Vectors, Reals and Booleans.

[Table: memory layout for the Vector, Real and Boolean banks]
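As a hedged sketch of how such a memory could be laid out and reset before each program call (the class, its layout order and indices are our assumptions for illustration, not the actual implementation):

```java
// Sketch of a player's memory following the description above:
// layout [outputs | registers | inputs | constants]. Before each call,
// Outputs and Registers are zeroed, Inputs are copied from the player's
// sensors, and Constants keep their preset values.
class PlayerMemory {
    final double[] mem;
    final int numOutputs, numRegisters, numInputs;
    final double[] constants;

    PlayerMemory(int outputs, int registers, int inputs, double[] constants) {
        this.numOutputs = outputs;
        this.numRegisters = registers;
        this.numInputs = inputs;
        this.constants = constants;
        this.mem = new double[outputs + registers + inputs + constants.length];
    }

    // called before every program invocation
    void reset(double[] sensorInputs) {
        for (int i = 0; i < numOutputs + numRegisters; i++) mem[i] = 0.0;
        System.arraycopy(sensorInputs, 0, mem, numOutputs + numRegisters, numInputs);
        System.arraycopy(constants, 0, mem,
                numOutputs + numRegisters + numInputs, constants.length);
    }
}
```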
Initialization

The populations are initialized with a number of individuals of the same length with random instructions. The instructions are chosen with equal probability. The source of an instruction is chosen randomly among all of the memory positions, and the destination is chosen randomly among the Output and Register memory positions. After a while we started seeding the population to speed up the evolution. We seed the population by setting the last instruction. For the evaders this last instruction is acceleration.v_sub(pursuerpos[0]), making all evaders evade the pursuer closest to them. The last instruction for the pursuers is acceleration.v_add(evaderpos[0]), making them go for the closest evader.

Population

The population size is set at the beginning of the evolution. The individuals have a maximum size, but are all initialized to the same length.

Genetic operators

The system chooses the type of reproduction randomly. With a set probability, two-point crossover is used; the rest of the time only one of the parents is replicated. Mutation occurs with another probability. Each time a child is produced, there is a possibility of several instructions being mutated. All of the lines of the program have an equal chance of being selected for mutation. We define the mutation frequency as the probability that one or more lines of code will be replaced by a random instruction.

Fitness calculation

How well a program performs depends on the situation, for instance on the positions of the evaders and pursuers. We have introduced an "evader fitness primitive" and a "pursuer fitness primitive". When these values are low, that is good for the corresponding flock. To evaluate the programs we simulate them for a couple of time steps. The fitness of a tested program is the primitive at the end minus the primitive at the start.
This means that the lower the fitness is, the better the individual. The primitives are defined like this: the "evader fitness primitive" is the number of evaders alive times -1000000, minus the sum over all evaders of the squared distance to each evader's closest pursuer. The "pursuer fitness primitive" is the number of evaders alive times 1000000, plus the sum over all pursuers of the squared distance to each pursuer's closest evader. This fitness will make the evaders want to keep their distance to the pursuers and strive not to get eaten. In turn, the pursuers will want to get closer to the evaders and try to eat them. We use the "30 monkeys in a bus" method: the tournaments are run for a few time steps.

Selection

First we tried coevolution: we randomly choose two pairs of individuals from each population. For each pair the evaders competed against the pursuers, and the evader with the best total fitness was proclaimed the winner. At the same time the pursuers were evaluated, and the pursuer with the best total fitness was proclaimed the winner. The same procedure was repeated with the other pair, leaving us with two evader winners and two pursuer winners. We also have two losers of each kind, and these are replaced with children of the winners. We define a generation as the population size divided by the number of individuals competing in every selection. This is the number of selections that would be needed to renew the whole population if we had not put the selected individuals back in the population after a tournament round.

Results

We have run our program several times with different parameters. Here we present those runs which we think are the most interesting. In the majority of our runs we used populations of 128 individuals. During evolution, one game consisted of 4 pursuers and 12 evaders controlled by two different programs. In the first step we used coevolution in evolving our pursuers and evaders.
This gave us difficulties with interpreting the results, since a successful individual may be bad at competing with an individual from a different generation than its own. When evaluating the mean fitness of evolving pursuers against evolving evaders, the graph showed random behaviour of the fitness over time. When evaluating the mean fitness of the population of pursuers against a population of evaders which simply stood still and did not evolve, we found the fitness shown in figure 1. We produced this plot simply to make sure that the pursuers really were evolving. When calculating the fitness, we did so only for the individuals who were actually competing, that is, those selected for evolution. This means that the fitness is a mean fitness of the competing individuals and not a mean fitness of the whole population. Therefore our mean fitness may be somewhat more sporadic than a mean fitness for an entire population would have been. As one can see in figure 1, the fitness improves, i.e. comes closer to zero, after a few generations. Our pursuers were therefore evolving and getting better as hunters. Unfortunately they were not developing advanced hunting strategies: the hunters were mostly heading straight towards the prey closest to them, even if all the other hunters were heading for the same prey. The peak in the graph shows how the fitness sometimes grows worse during evolution; we believe this is because the positions in the game will sometimes hinder evolution. Since it is difficult to visualize the progress individuals are making in coevolution, we then tried evolution without coevolution. The evolving evaders were now hunted by a nonevolving pursuer whose hunters relentlessly headed towards the nearest prey. Likewise, the evolving pursuers were hunting nonevolving evaders who always headed in the opposite direction of the nearest hunter.
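The fitness primitives plotted in the figures of this section translate directly into code. A sketch of the two primitives as defined in the Method section; the representation of positions as coordinate pairs is our assumption for illustration:

```java
// Fitness primitives: lower is better for the corresponding flock.
// Positions are given as {x, y} pairs.
class Fitness {
    static double sqDist(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1];
        return dx * dx + dy * dy;
    }

    // -1000000 per evader alive, minus each evader's squared distance
    // to its closest pursuer
    static double evaderPrimitive(double[][] evaders, double[][] pursuers) {
        double sum = 0;
        for (double[] e : evaders) {
            double closest = Double.MAX_VALUE;
            for (double[] p : pursuers) closest = Math.min(closest, sqDist(e, p));
            sum += closest;
        }
        return -1000000.0 * evaders.length - sum;
    }

    // +1000000 per evader alive, plus each pursuer's squared distance
    // to its closest evader
    static double pursuerPrimitive(double[][] evaders, double[][] pursuers) {
        double sum = 0;
        for (double[] p : pursuers) {
            double closest = Double.MAX_VALUE;
            for (double[] e : evaders) closest = Math.min(closest, sqDist(p, e));
            sum += closest;
        }
        return 1000000.0 * evaders.length + sum;
    }
}
```

An individual's fitness is then the primitive at the end of the simulated time steps minus the primitive at the start, so surviving longer and keeping greater distances lowers the evaders' fitness, while catching prey and closing distances lowers the pursuers' fitness.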
Figure 2: High mutation and crossover frequencies.

When we lowered the mutation frequency and crossover frequency, we got an improved fitness, as seen in figure 3. Still, the fitness is very erratic.

Figure 3: Lower mutation and crossover frequencies.

Lowering the mutation frequency and crossover frequency even further, the fitness curve now showed a more satisfactory behaviour. As one can see in figure 4, the fitness is not as erratic as before, and it improves with time. The mutation frequency is now 0.3 and the crossover frequency is 0.2. After 300 generations the fitness increases rapidly, but the increase only lasts until approximately 500 generations have evolved. Then the sudden increase, and thus worsening, of the fitness disappears. We assume the temporary increase of the fitness is due to chance.

Figure 4: Very low mutation and crossover frequencies.

The fitness for the pursuers shows the same behaviour as for the evaders, that is, the fitness improves with time. This can be seen in figure 5. The fitness still behaves erratically, but to a lesser degree than before, and it keeps closer to its optimum value.

Figure 5: Very low mutation and crossover frequencies.

We now present some of the evolved individuals as they are seen in the game. The first applet shows an evader after 1000 generations and a pursuer after 200 generations. The links labeled "evader" and "pursuer" lead to the text output from the system, showing the programs and some parameters. The pursuer shows the typical behaviour of most of our evolved pursuers: it simply attacks the closest evader. The evader, on the other hand, has somewhat more complex behaviour.
Conclusions and Discussion

Since coevolution did not yield especially good evaders, and therefore not very interesting pursuers, we also did some runs without coevolution. This way we hoped to evolve flock behaviour among the evaders and intelligent algorithms for hunting in teams among the pursuers, and in part we got what we wanted. When lowering the mutation frequency and crossover frequency we got better results. The mean fitness for the population did not show as random a behaviour as it did before. This, we assume, is because good individuals were not damaged so easily. We found that for many of the things that our evolved programs did, the purpose was rather difficult to comprehend. The game is simply too difficult to predict as the individuals evolve and become complex.

Future work

The system can handle three dimensions, but we have had some problems showing the results from this in Java, so the first extension would be to complete the work of making the applet present the model in three dimensions. It would be interesting to make the system run in parallel on several computers, making more time-consuming runs possible. Seeding the populations with boid-like behaviour and seeing if these genes survive would show whether they are good at evading or pursuing. Simply adjusting the parameters, making a large series of runs and analysing the results would probably reveal many results which we have not had the time to discover.

Summary

We have worked long and hard constructing our virtual machine and also on visualizing the predators and their prey. Where the virtual machine is concerned, we wanted to design it for flexibility. The engine even supports evolution in three spatial dimensions thanks to our vector data type. Unfortunately our time was limited, and we have thus not been able to analyze our results as much as we would have liked.
Also, we did not have time enough to adjust the parameters in the model to produce optimal results.
Appendix

Source code in Java
Simulation of the two authors hunting down bugs in the software.

Oscar Lindberg: f96osli@dd.chalmers