The example is integrated into LightBulbExample and its code can be found in PongEvolution.
A simple example of how to use coevolution in LightBulb. Our goal is to train a network to play Pong.
As Pong is a two-player game and we do not want to add any human or predefined AI player to this example - the AI should learn entirely on its own - the AIs have to play against each other and learn from their own mistakes and good moves. Getting this to work can be a bit complicated, so we use coevolution, which makes it much easier.
Coevolution is very similar to normal evolution, but instead of having just one population we use two. To rate the individuals of one population, we let them play Pong against individuals from the other population and select the ones which did the best job. One population is called the parasite population: the goal of its individuals is to make the individuals of the other population fail. The goal of the individuals of the other population is to win or at least play a tie.
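To make this more concrete, here is a minimal, hypothetical sketch of one rating step in such a coevolution. It is independent of LightBulb's actual classes; Individual and playGame() are stand-ins used only for illustration. It also shows the naive full pairing which the combining strategies described later in this tutorial avoid:
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only.
struct Individual { /* e.g. a neural network */ };
bool playGame(Individual& individual, Individual& parasite); // true = individual won or played a tie

// One rating step: each individual is scored by its results against the parasites,
// each parasite by how many individuals it makes fail.
void rate(std::vector<Individual>& population, std::vector<Individual>& parasites,
          std::vector<int>& populationScore, std::vector<int>& parasiteScore)
{
    populationScore.assign(population.size(), 0);
    parasiteScore.assign(parasites.size(), 0);
    for (std::size_t i = 0; i < population.size(); i++)
    {
        for (std::size_t j = 0; j < parasites.size(); j++)
        {
            if (playGame(population[i], parasites[j]))
                populationScore[i]++;
            else
                parasiteScore[j]++;
        }
    }
}
The individuals with the highest scores are then selected as parents for the next generation, separately in each population.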
First, we again need our individual class:
class PongAI : public LightBulb::AbstractDefaultIndividual
{
protected:
Pong* currentGame;
void getNNInput(std::vector<double>& input) override;
void interpretNNOutput(std::vector<double>& output) override;
public:
PongAI(Pong& pong_);
void setPong(Pong& pong);
};
void PongAI::setPong(Pong& pong)
{
currentGame = &pong;
}
The constructor:
PongAI::PongAI(Pong& pong_)
: AbstractDefaultIndividual(pong_)
{
currentGame = &pong_;
FeedForwardNetworkTopologyOptions options;
options.enableShortcuts = true;
options.neuronsPerLayerCount.push_back(6);
options.neuronsPerLayerCount.push_back(10);
options.neuronsPerLayerCount.push_back(2);
options.descriptionFactory = new SameNeuronDescriptionFactory(new NeuronDescription(new WeightedSumFunction(), new BinaryFunction()));
buildNeuralNetwork(options);
}
We get the network input from the environment:
void PongAI::getNNInput(std::vector<double>& input)
{
currentGame->getNNInput(input);
}
The interpretation of the output is really simple: either move the paddle up, move it down, or do nothing:
void PongAI::interpretNNOutput(std::vector<double>& output)
{
if (output[0] > 0.5)
currentGame->getGame().movePaddle(1);
else if (output[1] > 0.5)
currentGame->getGame().movePaddle(-1);
}
In this example we have to use AbstractCoevolutionEnvironment, as we want to use coevolution.
class Pong : public LightBulb::AbstractCoevolutionEnvironment
{
private:
PongGame game;
protected:
LightBulb::AbstractIndividual* createNewIndividual() override;
void resetEnvironment() override;
int simulateGame(PongAI& ai1, PongAI& ai2);
void setRandomGenerator(AbstractRandomGenerator& randomGenerator_);
int doCompare(LightBulb::AbstractIndividual& obj1, LightBulb::AbstractIndividual& obj2, int round) override;
public:
Pong(bool isParasiteEnvironment, LightBulb::AbstractCombiningStrategy* combiningStrategy_, LightBulb::AbstractCoevolutionFitnessFunction* fitnessFunction_, LightBulb::AbstractHallOfFameAlgorithm* hallOfFameToAddAlgorithm_ = nullptr, LightBulb::AbstractHallOfFameAlgorithm* hallOfFameToChallengeAlgorithm_ = nullptr);
void getNNInput(std::vector<double>& sight);
int getRoundCount() const override;
PongGame& getGame();
};
PongGame& Pong::getGame()
{
return game;
}
void Pong::setRandomGenerator(AbstractRandomGenerator& randomGenerator_)
{
AbstractRandomGeneratorUser::setRandomGenerator(randomGenerator_);
game.setRandomGenerator(randomGenerator_);
}
In the constructor we just forward the given parameters to our base class. We will come to their meaning later.
Pong::Pong(bool isParasiteEnvironment_, AbstractCombiningStrategy* combiningStrategy_, AbstractCoevolutionFitnessFunction* fitnessFunction_, AbstractHallOfFameAlgorithm* hallOfFameToAddAlgorithm_, AbstractHallOfFameAlgorithm* hallOfFameToChallengeAlgorithm_)
: AbstractCoevolutionEnvironment(isParasiteEnvironment_, combiningStrategy_, fitnessFunction_, hallOfFameToAddAlgorithm_, hallOfFameToChallengeAlgorithm_)
{
}
As in a normal environment, we have to define how new individuals should be created:
AbstractIndividual* Pong::createNewIndividual()
{
return new PongAI(*this);
}
When the environment resets, we just reset our game. PongGame is, by the way, just a class which manages the Pong game itself. To keep the example clear and simple, I have moved the game code into a separate class.
void Pong::resetEnvironment()
{
game.reset();
}
As mentioned in the individual class, here is the getNNInput() method. The neural network gets the ball position, the ball velocity and both paddle positions:
void Pong::getNNInput(std::vector<double>& input)
{
input.resize(6);
input[0] = game.getPlayer() * game.getState().ballPosX / game.getProperties().width;
input[1] = game.getState().ballPosY / game.getProperties().height;
input[2] = game.getPlayer() * game.getState().ballVelX / game.getProperties().maxBallSpeed;
input[3] = game.getState().ballVelY / game.getProperties().maxBallSpeed;
if (game.getPlayer() == 1)
{
input[4] = game.getState().paddle1Pos / (game.getProperties().height - game.getProperties().paddleHeight);
input[5] = game.getState().paddle2Pos / (game.getProperties().height - game.getProperties().paddleHeight);
}
else
{
input[5] = game.getState().paddle1Pos / (game.getProperties().height - game.getProperties().paddleHeight);
input[4] = game.getState().paddle2Pos / (game.getProperties().height - game.getProperties().paddleHeight);
}
}
We also have to define how often each pair of individuals should be compared. In our case this is just once, but it could also be two or more times (for example, if the starting player has an advantage).
int Pong::getRoundCount() const
{
return 1;
}
Now let's get to the interesting part: we have to define how individuals should be compared.
int Pong::doCompare(AbstractIndividual& individual1, AbstractIndividual& individual2, int round)
{
return simulateGame(static_cast<PongAI&>(individual1), static_cast<PongAI&>(individual2));
}
The doCompare() method should return 1 if individual1 has been better than individual2, or -1 if it is the other way round. To make this decision we simply simulate one game between the two AIs.
int Pong::simulateGame(PongAI& ai1, PongAI& ai2)
{
ai2.resetNN();
ai1.resetNN();
ai1.setPong(*this);
ai2.setPong(*this);
resetEnvironment();
double time = 0;
while (game.whoHasWon() == 0 && time < game.getProperties().maxTime)
{
game.setPlayer(1);
ai1.doNNCalculation();
game.setPlayer(-1);
ai2.doNNCalculation();
game.advanceBall(1);
time++;
}
if (game.whoHasWon() == 0) {
if (parasiteEnvironment)
return -1;
else
return 1;
}
else
return game.whoHasWon();
}
The game method whoHasWon() returns -1, 0 or 1, depending on who has won or whether the game is a tie. As we are not allowed to return 0 (a tie) in doCompare(), we always have to determine a winner. Remember: the parasite individual has to make the other individual fail. If the game ends in a tie, this has not happened, so we declare the other individual the winner.
First we have to create our two environments:
SharedSamplingCombiningStrategy* cs1 = new SharedSamplingCombiningStrategy(25);
SharedSamplingCombiningStrategy* cs2 = new SharedSamplingCombiningStrategy(25);
RandomHallOfFameAlgorithm* hof1 = new RandomHallOfFameAlgorithm(5);
RandomHallOfFameAlgorithm* hof2 = new RandomHallOfFameAlgorithm(5);
Pong environment(false, cs1, new SharedCoevolutionFitnessFunction(), hof1, hof2);
Pong parasiteEnvironment(true, cs2, new SharedCoevolutionFitnessFunction(), hof2, hof1);
cs1->setSecondEnvironment(parasiteEnvironment);
cs2->setSecondEnvironment(environment);
Both environments get their own combining strategy and hall of fame algorithms.
The combining strategy determines which individuals should be compared. As it would be too expensive to compare every individual with every other one, it is very important to choose the best comparison pairs.
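As a rough illustration (a simplified sketch, not LightBulb's actual implementation of SharedSamplingCombiningStrategy), a combining strategy could, for example, match every individual against a small random sample of opponents from the other population instead of a full round-robin:
#include <cstdlib>
#include <utility>
#include <vector>

// Simplified sketch of a combining strategy: instead of comparing everyone with
// everyone, every individual only gets opponentsPerIndividual randomly chosen
// opponents from the other population. Returns index pairs (individual, opponent).
std::vector<std::pair<int, int>> combine(int populationSize, int opponentPopulationSize,
                                         int opponentsPerIndividual)
{
    std::vector<std::pair<int, int>> pairs;
    for (int i = 0; i < populationSize; i++)
        for (int k = 0; k < opponentsPerIndividual; k++)
            pairs.push_back(std::make_pair(i, std::rand() % opponentPopulationSize));
    return pairs;
}
Roughly speaking, shared sampling refines this idea by preferring opponents that beat individuals which few other opponents beat, so the sample stays as diverse and informative as possible.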
The hall of fame algorithms might not be necessary for simple problems like this one. Every few iterations they store the best individual and compare future individuals against those hall of fame members, which makes the learning process more stable.
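Conceptually (again only a simplified sketch reusing the hypothetical Individual and playGame() from above, not the library's actual code), a hall of fame could look like this:
#include <cstdlib>
#include <vector>

// Simplified hall-of-fame sketch: store the champion of a generation from time to
// time and let new candidates additionally play against a few random stored champions.
struct HallOfFame
{
    std::vector<Individual> members;

    void addChampion(const Individual& champion)
    {
        members.push_back(champion);
    }

    int challenge(Individual& candidate, int games)
    {
        int wins = 0;
        for (int i = 0; i < games && !members.empty(); i++)
        {
            if (playGame(candidate, members[std::rand() % members.size()]))
                wins++;
        }
        return wins;
    }
};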
EvolutionLearningRuleOptions options;
options.creationCommands.push_back(new ConstantCreationCommand(250));
options.exitConditions.push_back(new PerfectIndividualFoundCondition(20));
options.reuseCommands.push_back(new ConstantReuseCommand(new BestReuseSelector(), 16));
options.selectionCommands.push_back(new BestSelectionCommand(150));
options.mutationsCommands.push_back(new ConstantMutationCommand(new MutationAlgorithm(1.6), new RandomSelector(new RankBasedRandomFunction()), 1.8));
options.recombinationCommands.push_back(new ConstantRecombinationCommand(new RecombinationAlgorithm(), new RandomSelector(new RankBasedRandomFunction()), 0.3));
options.environment = &environment;
EvolutionLearningRule learningRule1(options);
options.environment = &parasiteEnvironment;
EvolutionLearningRule learningRule2(options);
To create the two evolution learning rules which manage the two environments, we again have to declare a bunch of evolution commands.
Those learning rules are now combined into one big CoevolutionLearningRule:
CoevolutionLearningRuleOptions coevolutionLearningRuleOptions;
coevolutionLearningRuleOptions.learningRule1 = &learningRule1;
coevolutionLearningRuleOptions.learningRule2 = &learningRule2;
coevolutionLearningRuleOptions.logger = new ConsoleLogger(LL_LOW);
CoevolutionLearningRule learningRule(coevolutionLearningRuleOptions);
Now we are ready to start!
std::unique_ptr<EvolutionLearningResult> result(static_cast<EvolutionLearningResult*>(learningRule.start()));
PongAI* bestAI = static_cast<PongAI*>(result->bestIndividuals[0].get());
Now it's up to you to try and test the Pong AI.
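If you want to watch the result directly in code, a minimal sketch (using only the methods shown above; the exact setup is an assumption) could let the best AI control both paddles for one game:
// Hypothetical follow-up: let the evolved network play one game against itself.
bestAI->resetNN();
bestAI->setPong(environment);
environment.getGame().reset();
double time = 0;
while (environment.getGame().whoHasWon() == 0 && time < environment.getGame().getProperties().maxTime)
{
    environment.getGame().setPlayer(1);
    bestAI->doNNCalculation();
    environment.getGame().setPlayer(-1);
    bestAI->doNNCalculation();
    environment.getGame().advanceBall(1);
    time++;
}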
Of course you could also just click the gif below and try it online. 😜
In the next tutorial we will also try to learn Pong, but this time using reinforcement learning.
The whole code:
#include <LightBulb/NetworkTopology/FeedForwardNetworkTopology.hpp>
#include <LightBulb/NeuronDescription/SameNeuronDescriptionFactory.hpp>
#include <LightBulb/NeuronDescription/NeuronDescription.hpp>
#include <LightBulb/NeuralNetwork/NeuralNetwork.hpp>
#include <LightBulb/Learning/Supervised/GradientDescentLearningRule.hpp>
#include <LightBulb/Function/ActivationFunction/BinaryFunction.hpp>
#include <LightBulb/Function/InputFunction/WeightedSumFunction.hpp>
#include <LightBulb/Learning/Evolution/AbstractDefaultIndividual.hpp>
#include <LightBulb/Learning/Evolution/AbstractEvolutionLearningRule.hpp>
#include <LightBulb/Function/RandomFunction/RankBasedRandomFunction.hpp>
#include <LightBulb/Learning/Evolution/EvolutionStrategy/RecombinationAlgorithm.hpp>
#include <LightBulb/Learning/Evolution/ConstantRecombinationCommand.hpp>
#include <LightBulb/Learning/Evolution/EvolutionStrategy/MutationAlgorithm.hpp>
#include <LightBulb/Learning/Evolution/ConstantMutationCommand.hpp>
#include <LightBulb/Learning/Evolution/BestSelectionCommand.hpp>
#include <LightBulb/Learning/Evolution/BestReuseSelector.hpp>
#include <LightBulb/Learning/Evolution/ConstantReuseCommand.hpp>
#include <LightBulb/Learning/Evolution/ConstantCreationCommand.hpp>
#include <LightBulb/Learning/Evolution/EvolutionLearningRule.hpp>
#include <LightBulb/Learning/Evolution/EvolutionLearningResult.hpp>
#include <LightBulb/Learning/Evolution/AbstractCoevolutionEnvironment.hpp>
#include <LightBulb/Learning/Evolution/SharedSamplingCombiningStrategy.hpp>
#include <LightBulb/Learning/Evolution/RandomHallOfFameAlgorithm.hpp>
#include <LightBulb/Learning/Evolution/CoevolutionLearningRule.hpp>
#include <LightBulb/Learning/Evolution/PerfectIndividualFoundCondition.hpp>
#include <LightBulb/Learning/Evolution/SharedCoevolutionFitnessFunction.hpp>
#include <LightBulb/Learning/Evolution/RandomSelector.hpp>
#include <LightBulb/Logging/ConsoleLogger.hpp>
#include "PongGame.hpp"
using namespace LightBulb;
class Pong;
class PongAI : public LightBulb::AbstractDefaultIndividual
{
protected:
Pong* currentGame;
void getNNInput(std::vector<double>& input) override;
void interpretNNOutput(std::vector<double>& output) override;
public:
PongAI(Pong& pong_);
void setPong(Pong& pong);
};
class Pong : public LightBulb::AbstractCoevolutionEnvironment
{
private:
PongGame game;
protected:
LightBulb::AbstractIndividual* createNewIndividual() override;
void resetEnvironment() override;
int simulateGame(PongAI& ai1, PongAI& ai2);
void setRandomGenerator(AbstractRandomGenerator& randomGenerator_);
int doCompare(LightBulb::AbstractIndividual& obj1, LightBulb::AbstractIndividual& obj2, int round) override;
public:
Pong(bool isParasiteEnvironment, LightBulb::AbstractCombiningStrategy* combiningStrategy_, LightBulb::AbstractCoevolutionFitnessFunction* fitnessFunction_, LightBulb::AbstractHallOfFameAlgorithm* hallOfFameToAddAlgorithm_ = nullptr, LightBulb::AbstractHallOfFameAlgorithm* hallOfFameToChallengeAlgorithm_ = nullptr);
void getNNInput(std::vector<double>& sight);
int getRoundCount() const override;
PongGame& getGame();
};
PongGame& Pong::getGame()
{
return game;
}
PongAI::PongAI(Pong& pong_)
: AbstractDefaultIndividual(pong_)
{
currentGame = &pong_;
FeedForwardNetworkTopologyOptions options;
options.enableShortcuts = true;
options.neuronsPerLayerCount.push_back(6);
options.neuronsPerLayerCount.push_back(10);
options.neuronsPerLayerCount.push_back(2);
options.descriptionFactory = new SameNeuronDescriptionFactory(new NeuronDescription(new WeightedSumFunction(), new BinaryFunction()));
buildNeuralNetwork(options);
}
void PongAI::setPong(Pong& pong)
{
currentGame = &pong;
}
void PongAI::getNNInput(std::vector<double>& input)
{
currentGame->getNNInput(input);
}
void PongAI::interpretNNOutput(std::vector<double>& output)
{
if (output[0] > 0.5)
currentGame->getGame().movePaddle(1);
else if (output[1] > 0.5)
currentGame->getGame().movePaddle(-1);
}
Pong::Pong(bool isParasiteEnvironment_, AbstractCombiningStrategy* combiningStrategy_, AbstractCoevolutionFitnessFunction* fitnessFunction_, AbstractHallOfFameAlgorithm* hallOfFameToAddAlgorithm_, AbstractHallOfFameAlgorithm* hallOfFameToChallengeAlgorithm_)
: AbstractCoevolutionEnvironment(isParasiteEnvironment_, combiningStrategy_, fitnessFunction_, hallOfFameToAddAlgorithm_, hallOfFameToChallengeAlgorithm_)
{
}
AbstractIndividual* Pong::createNewIndividual()
{
return new PongAI(*this);
}
void Pong::resetEnvironment()
{
game.reset();
}
void Pong::getNNInput(std::vector<double>& input)
{
input.resize(6);
input[0] = game.getPlayer() * game.getState().ballPosX / game.getProperties().width;
input[1] = game.getState().ballPosY / game.getProperties().height;
input[2] = game.getPlayer() * game.getState().ballVelX / game.getProperties().maxBallSpeed;
input[3] = game.getState().ballVelY / game.getProperties().maxBallSpeed;
if (game.getPlayer() == 1)
{
input[4] = game.getState().paddle1Pos / (game.getProperties().height - game.getProperties().paddleHeight);
input[5] = game.getState().paddle2Pos / (game.getProperties().height - game.getProperties().paddleHeight);
}
else
{
input[5] = game.getState().paddle1Pos / (game.getProperties().height - game.getProperties().paddleHeight);
input[4] = game.getState().paddle2Pos / (game.getProperties().height - game.getProperties().paddleHeight);
}
}
int Pong::getRoundCount() const
{
return 1;
}
int Pong::doCompare(AbstractIndividual& individual1, AbstractIndividual& individual2, int round)
{
return simulateGame(static_cast<PongAI&>(individual1), static_cast<PongAI&>(individual2));
}
int Pong::simulateGame(PongAI& ai1, PongAI& ai2)
{
ai2.resetNN();
ai1.resetNN();
ai1.setPong(*this);
ai2.setPong(*this);
resetEnvironment();
double time = 0;
while (game.whoHasWon() == 0 && time < game.getProperties().maxTime)
{
game.setPlayer(1);
ai1.doNNCalculation();
game.setPlayer(-1);
ai2.doNNCalculation();
game.advanceBall(1);
time++;
}
if (game.whoHasWon() == 0) {
if (parasiteEnvironment)
return -1;
else
return 1;
}
else
return game.whoHasWon();
}
void Pong::setRandomGenerator(AbstractRandomGenerator& randomGenerator_)
{
AbstractRandomGeneratorUser::setRandomGenerator(randomGenerator_);
game.setRandomGenerator(randomGenerator_);
}
int main()
{
SharedSamplingCombiningStrategy* cs1 = new SharedSamplingCombiningStrategy(25);
SharedSamplingCombiningStrategy* cs2 = new SharedSamplingCombiningStrategy(25);
RandomHallOfFameAlgorithm* hof1 = new RandomHallOfFameAlgorithm(5);
RandomHallOfFameAlgorithm* hof2 = new RandomHallOfFameAlgorithm(5);
Pong environment(false, cs1, new SharedCoevolutionFitnessFunction(), hof1, hof2);
Pong parasiteEnvironment(true, cs2, new SharedCoevolutionFitnessFunction(), hof2, hof1);
cs1->setSecondEnvironment(parasiteEnvironment);
cs2->setSecondEnvironment(environment);
EvolutionLearningRuleOptions options;
options.creationCommands.push_back(new ConstantCreationCommand(250));
options.exitConditions.push_back(new PerfectIndividualFoundCondition(20));
options.reuseCommands.push_back(new ConstantReuseCommand(new BestReuseSelector(), 16));
options.selectionCommands.push_back(new BestSelectionCommand(150));
options.mutationsCommands.push_back(new ConstantMutationCommand(new MutationAlgorithm(1.6), new RandomSelector(new RankBasedRandomFunction()), 1.8));
options.recombinationCommands.push_back(new ConstantRecombinationCommand(new RecombinationAlgorithm(), new RandomSelector(new RankBasedRandomFunction()), 0.3));
options.environment = &environment;
EvolutionLearningRule learningRule1(options);
options.environment = &parasiteEnvironment;
EvolutionLearningRule learningRule2(options);
CoevolutionLearningRuleOptions coevolutionLearningRuleOptions;
coevolutionLearningRuleOptions.learningRule1 = &learningRule1;
coevolutionLearningRuleOptions.learningRule2 = &learningRule2;
coevolutionLearningRuleOptions.logger = new ConsoleLogger(LL_LOW);
CoevolutionLearningRule learningRule(coevolutionLearningRuleOptions);
std::unique_ptr<EvolutionLearningResult> result(static_cast<EvolutionLearningResult*>(learningRule.start()));
PongAI* bestAI = static_cast<PongAI*>(result->bestIndividuals[0].get());
getchar();
return 0;
}