Prisoner’s Dilemma - Simulation

Last time, each of you created a strategy for the prisoner’s dilemma; in this lesson we will experiment with your strategies. We will try to simulate the life of strategies, in a way similar to Axelrod’s experiments.

Today, you will need your strategies from last time; you can find them in Teams under Files (in the Practicals channel).

Imagine we have a population of individuals who repeatedly play the iterated prisoner’s dilemma against each other and, based on the number of points they receive, mate and produce offspring who survive to the next generation.

In such a simulation, we can observe some interesting phenomena. For example, is there a strategy that will survive in all cases? Is there a group of strategies that cannot be invaded by any other (new) strategy in the population (i.e. can you find a group of evolutionarily stable strategies)?

Let us take, for example, a group of individuals who play the Always Cooperate (AC) strategy. What happens when a new individual that plays the Always Defect (AD) strategy arrives in the population? Every AC individual gets 3 points for each round played against another AC individual, while the AD individual gets 5 points for each round against an AC individual. Since there are many AC individuals and only one AD individual, the AD individual will score more points than the AC individuals, will therefore produce more offspring, and the number of AD individuals will grow. Will it grow until AD individuals fill the whole population, or is there a point where it stops? And what would happen if, instead of AC, the initial population consisted of Tit-for-Tat individuals?
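To make this argument concrete, here is a small back-of-the-envelope calculation. It assumes the classic Axelrod payoffs (3 points when both cooperate, 5 for defecting against a cooperator, 0 for cooperating against a defector, 1 when both defect) and that opponents are drawn uniformly from the rest of the population; the actual values used by simulation.py may differ.

```python
def expected_score(n_ac, n_ad):
    """Expected per-round score of an AC and an AD individual in a mixed
    population, assuming the payoffs R=3, T=5, S=0, P=1."""
    total = n_ac + n_ad
    # An AC player meets another AC with prob (n_ac - 1)/(total - 1), an AD otherwise.
    ac_score = (n_ac - 1) / (total - 1) * 3 + n_ad / (total - 1) * 0
    # An AD player meets an AC with prob n_ac/(total - 1), another AD otherwise.
    ad_score = n_ac / (total - 1) * 5 + (n_ad - 1) / (total - 1) * 1
    return ac_score, ad_score

print(expected_score(99, 1))   # ≈ (2.97, 5.0) – a lone defector outscores the cooperators
print(expected_score(50, 50))  # ≈ (1.48, 3.02) – defectors keep the edge as they spread
```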

The simulation of the life of strategies is implemented in the simulation.py file (the source code is here, the same as last time). It simulates a population with a number of individuals. In the beginning, the number of individuals playing each strategy is given in a config file (config.json by default, but you can pass a different one as a command-line argument). If the selected config does not exist, a default one is created, in which each strategy has 10 individuals. Afterwards, each individual in the population meets a number (10 by default) of other random individuals from the population and plays the iterated prisoner’s dilemma with them. Both individuals receive a score from each of these encounters. This score is then used to decide which individuals will play in the next generation: individuals are selected randomly with replacement from the population, and the probability of selecting an individual is proportional to its score (this is essentially the roulette-wheel selection known from evolutionary algorithms). In the end, the program shows (and saves) two plots: one shows the number of individuals in the population playing each strategy, and the other shows the average score of the individuals playing each strategy.
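The selection step can be illustrated in a few lines of Python. This is not the code from simulation.py, just a sketch of fitness-proportional (roulette-wheel) selection with replacement; the population here is simplified to a list of strategy names and the scores are made up.

```python
import random

def roulette_wheel_selection(population, scores, size):
    """Draw `size` individuals with replacement, with probability
    proportional to their score (roulette-wheel selection)."""
    # random.choices performs weighted sampling with replacement.
    return random.choices(population, weights=scores, k=size)

# Hypothetical usage: individuals are represented only by their strategy name.
population = ["AC"] * 10 + ["AD"] * 10 + ["TFT"] * 10
scores = [2.0] * 10 + [3.5] * 10 + [2.8] * 10   # made-up scores
next_generation = roulette_wheel_selection(population, scores, len(population))
print(next_generation.count("AD"))  # AD tends to get more than its current 10 slots
```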

Ideas for Experiments

There is no graded homework assignment today, but let us experiment with the strategies from the tournament during the lesson.

  1. Simulate the population described above (population of AC/TFT with a few individuals playing AD, or vice versa).
  2. Find a group of strategies in which your strategy always survives to the end/always dies out.
  3. Can you change the outcome by changing the rewards for the different C/D combinations? Can you change it even if the game remains a prisoner’s dilemma (i.e. the payoffs still satisfy temptation > reward > punishment > sucker’s payoff)?
  4. Can you implement a strategy that learns to react to the other strategies in the changing environment? (A minimal sketch of one possible starting point follows below.)
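
For the last experiment, a very simple starting point could be a strategy that estimates how cooperative its current opponent is and reacts accordingly. The sketch below assumes a hypothetical interface (the strategy is asked for a move given the history of the opponent's "C"/"D" moves); you will need to adapt it to the interface your strategies from last time actually use.

```python
class AdaptiveStrategy:
    """Cooperates as long as the opponent seems cooperative enough,
    otherwise defects. The 'learned' part is the estimate of the
    opponent's cooperation rate, updated from observed behaviour."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # minimum cooperation rate we still reward

    def play(self, opponent_history):
        if not opponent_history:
            return "C"  # start friendly, like tit-for-tat
        cooperation_rate = opponent_history.count("C") / len(opponent_history)
        return "C" if cooperation_rate >= self.threshold else "D"
```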