Posted: Fri May 20, 2022 12:39 pm
Prisoner’s dilemma: Why do MirrorPlayers win?
The player simulation we implemented in this unit is
called the iterated prisoner’s dilemma. Here is a
short summary of the history behind this model and an explanation
of why MirrorPlayer is often a winning strategy:
Payoffs in the prisoner’s
dilemma. Cooperate = be nice, betray = be nasty. When
playing only one round, betrayal gives a better payoff than
cooperation, regardless of the choice your opponent makes. However,
cooperative strategies win on average in iterated games, where
players meet each other multiple times and face a variety of
opponents.
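To make the payoff argument concrete, here is a minimal sketch. The point values (5, 3, 1, 0) are an assumption: they are the numbers conventionally used to illustrate the prisoner’s dilemma, not necessarily the ones used in this unit, and the names PAYOFF, my_payoff, betrayal_dominates and repeated_total are made up for this example.

```python
# Illustrative payoff table (assumed values, not taken from the unit).
# Keys are (my_action, their_action); values are (my_points, their_points).
NICE, NASTY = "nice", "nasty"

PAYOFF = {
    (NICE, NICE): (3, 3),    # both cooperate
    (NICE, NASTY): (0, 5),   # I cooperate, the opponent betrays
    (NASTY, NICE): (5, 0),   # I betray, the opponent cooperates
    (NASTY, NASTY): (1, 1),  # both betray
}


def my_payoff(mine, theirs):
    """Points I earn in one round for the given pair of actions."""
    return PAYOFF[(mine, theirs)][0]


def betrayal_dominates():
    """True if being nasty pays more than being nice in a single round,
    whatever the opponent does."""
    return all(
        my_payoff(NASTY, theirs) > my_payoff(NICE, theirs)
        for theirs in (NICE, NASTY)
    )


def repeated_total(mine, theirs, rounds):
    """Points I earn if the same pair of actions repeats for several rounds."""
    return rounds * my_payoff(mine, theirs)
```

With these assumed values, two players who keep cooperating earn 30 points each over 10 rounds, while two players who keep betraying each other earn only 10 each, even though betrayal is the better choice in any single round.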
New player tactics
In this test, we will implement three new player tactics and
test them in different game scenarios. We will see that generally
suboptimal strategies, different from Mean and Mirror, can be
advantageous in some of these situations.
RandomPlayer
Chooses nice or nasty at random (with equal probability).
NastyMirrorPlayer
The same as MirrorPlayer, but acts nasty (instead of nice) on
the first encounter with a player.
CountingPlayer
Before responding, it counts the total number of nice and nasty
encounters it has had in the game so far (against all opponents), and
responds with the action it has received most frequently. If the
numbers of nice and nasty encounters are equal, it acts nice
(so, for instance, its first encounter will be nice).
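The three descriptions above could be sketched as follows. This is not the unit's Player class: the method names choose and observe, the opponent_id argument, and the string constants NICE and NASTY are assumptions made for this example, so adapt them to the interface your own simulation uses. The sketch also assumes that MirrorPlayer repeats the last action a given opponent used against it.

```python
import random

NICE, NASTY = "nice", "nasty"  # assumed action labels


class RandomPlayer:
    """Chooses nice or nasty with equal probability."""

    def __init__(self, rng=None):
        # An optional random.Random instance makes runs reproducible.
        self.rng = rng or random.Random()

    def choose(self, opponent_id):
        return self.rng.choice([NICE, NASTY])

    def observe(self, opponent_id, their_action):
        pass  # past encounters do not matter to this player


class NastyMirrorPlayer:
    """Mirrors each opponent's last action, but starts out nasty."""

    def __init__(self):
        self.last_seen = {}  # opponent_id -> that opponent's last action

    def choose(self, opponent_id):
        # First encounter with this opponent: no record yet, so act nasty.
        return self.last_seen.get(opponent_id, NASTY)

    def observe(self, opponent_id, their_action):
        self.last_seen[opponent_id] = their_action


class CountingPlayer:
    """Responds with the action it has received most often overall."""

    def __init__(self):
        self.nice_count = 0
        self.nasty_count = 0

    def choose(self, opponent_id):
        # Ties (including the very first encounter) resolve to nice.
        return NASTY if self.nasty_count > self.nice_count else NICE

    def observe(self, opponent_id, their_action):
        if their_action == NICE:
            self.nice_count += 1
        else:
            self.nasty_count += 1
```

Note that CountingPlayer keeps one pair of counters for all opponents, whereas NastyMirrorPlayer remembers each opponent separately.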
Task
In this task, we will write a program, test10.py, which
implements the three additional player tactics described above and
tests their performance against a variety of opponents.
Step-by-step implementation
Implement the three new player types: RandomPlayer,
NastyMirrorPlayer, and CountingPlayer.
Implement a more compact reporting format. Instead of printing
out the entire population, your program should count the number of
players of each type in the population and report them as a single
row of numbers, e.g.
(if there are four types of players in your simulation and a
total of 40 players). Counting can be accomplished by iterating
through the population and checking the type of each player:
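A minimal sketch of that counting step, assuming the population is a plain list of player objects. The empty stand-in classes, the PLAYER_TYPES list and the function names are made up for this example, and the space-separated row format is likewise an assumption.

```python
# Empty stand-ins so the sketch runs on its own; use your real classes instead.
class FriendlyPlayer: pass
class MeanPlayer: pass
class MirrorPlayer: pass
class RandomPlayer: pass


# The order of this list fixes the order of the numbers in each report row.
PLAYER_TYPES = [FriendlyPlayer, MeanPlayer, MirrorPlayer, RandomPlayer]


def count_by_type(population, player_types=PLAYER_TYPES):
    """Return one count per player type, in the order of player_types."""
    counts = [0] * len(player_types)
    for player in population:
        for index, player_type in enumerate(player_types):
            # An exact type check avoids counting a subclass as its parent.
            if type(player) is player_type:
                counts[index] += 1
                break
    return counts


def format_row(counts):
    """Join the counts into a single row of numbers."""
    return " ".join(str(n) for n in counts)
```

For a population of 10 FriendlyPlayers and 30 MirrorPlayers, format_row(count_by_type(population)) gives the row 10 0 30 0. The exact type check matters if, for example, you implement NastyMirrorPlayer as a subclass of MirrorPlayer.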
The resulting output may look as follows (each line is a new
generation):
In the example above, MirrorPlayer won after 4 generations. This
format gives a more succinct summary of the entire simulation,
compared to the original reporting style.
Testing scenarios
Once your code is ready, you can test how the new player types
perform in various game settings. We consider the following game
configurations:
Scenario A
Initial population: Friendly: 10, Mean:
10, Mirror: 10, Random: 10, NastyMirror: 10, Counting: 10. You will
have to test three variants of this scenario, setting the number of
encounters per generation
to 1000, 8000,
and 32000, respectively. In all cases, run
the simulation for 20 generations.
Scenario B
Same as A, but remove all MeanPlayers.
Scenario C
Same as A, but remove all MeanPlayers and all
FriendlyPlayers.
Varying the initial population and the number of encounters per
generation gives us a total of nine variants of the game. Let’s
call them A-1K, A-8K, A-32K, B-1K, B-8K, B-32K, C-1K, C-8K, and
C-32K for short.
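One way to keep the nine variants organized is to store the three starting populations and the three encounter counts in dictionaries and combine them. This is only a sketch: the dictionary layout, the helper without and the function variants are assumptions made for this example, and "remove all MeanPlayers" is read here as leaving that type out of the population entirely.

```python
GENERATIONS = 20  # every variant runs for 20 generations

SCENARIO_A = {
    "Friendly": 10, "Mean": 10, "Mirror": 10,
    "Random": 10, "NastyMirror": 10, "Counting": 10,
}


def without(population, *removed_types):
    """Copy of a starting population with the given player types left out."""
    return {name: n for name, n in population.items() if name not in removed_types}


SCENARIOS = {
    "A": SCENARIO_A,
    "B": without(SCENARIO_A, "Mean"),
    "C": without(SCENARIO_A, "Mean", "Friendly"),
}

ENCOUNTERS = {"1K": 1000, "8K": 8000, "32K": 32000}


def variants():
    """Map each short name (e.g. 'B-8K') to (starting population, encounters)."""
    return {
        f"{scenario}-{label}": (SCENARIOS[scenario], ENCOUNTERS[label])
        for scenario in SCENARIOS
        for label in ENCOUNTERS
    }
```

Looping over variants().items() then visits the variants in the order A-1K, A-8K, A-32K, B-1K, and so on.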
Test each of these variants and determine the
player types that survive after 20 generations. Because
the game is intrinsically random, there could be more than one
possible outcome, so you will have to run the simulation more than
once to see all possible outcomes. Add
a comment at the beginning of your
program summarizing your findings (note: to add a comment in
Python, type # anywhere in a line, except inside quotes; you can
learn more about comments in unit 11):
The answer for scenario A-1K is already given to you.
Complete the table.
In your table, you do not need to mention outcomes where other
player types are present after 20 generations but do not have the
most players remaining. The winner is simply the player type with
the most players remaining regardless of how many players of other
player types are left. You only need to mention outcomes where
either one player type has the most players left or there is a tie
among several player types. For example, suppose your program were
to output the following:
Notice that in the last row, Mirror had the most players
remaining with 20, but both Mean and Random still had some players
left. Here, the winner would be Mirror. You would not need to
mention Mean or Random in your table for this particular
scenario.
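The winner rule described above can be sketched as a small function. It assumes the final generation is available as a dictionary from player type name to number of players; that representation and the function name winners are made up for this example.

```python
def winners(final_counts):
    """Return the player type(s) with the most players remaining.

    A single name means a clear winner; several names mean a tie.
    """
    top = max(final_counts.values())
    if top == 0:
        return []  # nobody is left, so there is no winner
    return sorted(name for name, n in final_counts.items() if n == top)


# Hypothetical final rows; only the 20 Mirror players echo the example above.
clear_win = {"Mean": 3, "Mirror": 20, "Random": 7}
tie = {"Mirror": 15, "Counting": 15, "Random": 0}
```

Here winners(clear_win) returns ['Mirror'], so Mean and Random would not be mentioned in the table, while winners(tie) returns ['Counting', 'Mirror'], which should be reported as a tie.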
Optional
The following steps are optional. Unfortunately, there is no
extra credit if you do them. However, they could make the task of
summarizing your findings slightly easier.
As mentioned above, you will need to run the 20-generation
simulation more than once to see all possible outcomes. Instead of
running each simulation manually, modify your program so that it
automatically runs a 20-generation simulation a certain number of
times.
Instead of hard coding certain information in your program,
repeatedly prompt the user for the following:
Scenario
Number of encounters
Number of 20-generation runs (if you do optional step #1
above)
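For the repeated runs in the first optional step, a sketch along the following lines may help. It assumes you can wrap one 20-generation simulation in a function that takes no arguments and returns the final population as a dictionary from player type name to count; that interface and the names tally_outcomes and print_summary are assumptions made for this example.

```python
from collections import Counter


def tally_outcomes(run_once, runs):
    """Call run_once the given number of times and count each outcome.

    run_once is any zero-argument function that performs one full
    simulation and returns {player_type_name: players_remaining}.
    """
    outcomes = Counter()
    for _ in range(runs):
        final_counts = run_once()
        top = max(final_counts.values())
        # A tuple of names records either a single winner or a tie.
        winner_names = tuple(sorted(
            name for name, count in final_counts.items() if count == top
        ))
        outcomes[winner_names] += 1
    return outcomes


def print_summary(outcomes):
    """Print each observed outcome, most frequent first."""
    for winner_names, times in outcomes.most_common():
        print(f"{' / '.join(winner_names)}: {times} run(s)")
```

Because a Counter only contains the outcomes that actually occurred, the summary lists exactly the outcomes you need to consider for the table.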
To hard-code information in a program is to specify values
directly in a program instead of obtaining the values from external
sources, such as prompting the user. Hard coding information that
can change is not considered good practice because it makes the
program less flexible.
A related concept is the use of magic numbers, that is, literal
values that appear directly in your program without any
explanation of what they mean.
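As an illustration of both points, the sketch below gives its fixed values names and reads the remaining settings from the user. The constant names, the prompt texts and the ask parameter (which defaults to the built-in input and exists only to make the functions easy to try out) are assumptions made for this example.

```python
# Named constants instead of unexplained literals scattered through the code.
VALID_SCENARIOS = ("A", "B", "C")
GENERATIONS_PER_RUN = 20


def prompt_choice(prompt, choices, ask=input):
    """Keep asking until the answer is one of the allowed choices."""
    while True:
        answer = ask(prompt).strip().upper()
        if answer in choices:
            return answer
        print(f"Please enter one of: {', '.join(choices)}")


def prompt_positive_int(prompt, ask=input):
    """Keep asking until the answer is a whole number greater than zero."""
    while True:
        try:
            value = int(ask(prompt).strip())
        except ValueError:
            value = 0  # not a number at all, so ask again
        if value > 0:
            return value
        print("Please enter a positive whole number.")


def read_settings(ask=input):
    """Ask for the scenario, the encounters per generation, and the runs."""
    return {
        "scenario": prompt_choice("Scenario (A, B, or C): ", VALID_SCENARIOS, ask),
        "encounters": prompt_positive_int("Encounters per generation: ", ask),
        "runs": prompt_positive_int(
            f"Number of {GENERATIONS_PER_RUN}-generation runs: ", ask
        ),
    }
```

Calling read_settings() inside a loop would let the user try several variants in one session without editing the program.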
Example output (after doing all the optional steps)
Make sure you have a working version of your program that does
all the required steps before attempting any of the optional steps.
If you attempt the optional steps but encounter errors or other
unexpected behavior, submit only the working version of your
program, not the version with errors.