Making a neural network that can evolve
Made by: Chris Mallet
Machine learning is a very interesting tool. It can be used in game development to speed up time-intensive tasks. For instance, if you want to test a game you need a lot of testers, and they need to take their time. With machine learning we can speed up the process by using a neural network that can learn to play a game from its mistakes. So how do we make a neural network that can evolve without human intervention?
Table of contents:
- 1. Machine learning
- 1.1 Supervised Learning
- 1.2 Unsupervised Learning
- 1.3 Reinforcement Learning
- 2. Neural Network
- 2.1 Input Layer
- 2.2 Hidden Layer
- 2.3 Output Layer
- 3. Neat algorithm
- 3.1 Genome
- 3.2 Mutation
- 3.3 Crossover
- 3.4 Speciation
- 4. End result
1. Machine learning
Before we can look into the neural network, we need to understand what machine learning is, because we will be using machine learning for the evolution. Machine learning is a form of artificial intelligence that can learn how to perform certain tasks by assessing data given to the algorithm. This data will be inspected, and relevant patterns and associations can be surmised by the algorithm. What kind of data is used and what kind of associations can be made differs per implementation. There are currently three main kinds of machine learning implementations, namely: supervised, unsupervised and reinforcement learning. All these implementations have different uses, so let's look at them individually.
1.1 Supervised Learning
Supervised learning uses labeled data to train itself to predict outcomes or organize data by relevant categories. The input and output of this data are already known from the beginning, which helps the algorithm assess its own accuracy. This in turn leads to the algorithm being able to predict outcomes of new data more accurately and learn over time from its mistakes. Because of this, a supervised learning algorithm has a high chance of finding the correct solution.
But because we need labeled data to even start training the algorithm, we will need to spend more time on gathering and labeling the data correctly. The algorithm only works as well as the data gathered, so this approach requires quite a lot of human intervention.
1.2 Unsupervised Learning
Unsupervised learning is the opposite of supervised learning, meaning that we use unlabeled data to reach the desired outcome. This algorithm will search for patterns within the unlabeled data and make associations or cluster it together for things like market segmentation. The assumptions are made almost entirely on its own. The only place where human intervention is common is at the output stage.
Because this algorithm uses unlabeled data and makes its own conclusions, it is less complex and easier to build. However, because the data is unlabeled, the algorithm has to make sense of the data before it can give predictions. This leads to more points of failure, making it less accurate than a supervised algorithm. Therefore, there is a greater need to check the output of this algorithm, requiring more human intervention in later stages than was necessary for a supervised learning algorithm.
1.3 Reinforcement Learning
Reinforcement learning uses a different way to reach an output and learn from its mistakes than the two earlier methods of machine learning. Instead of only learning from the generated output of one step, it looks for improvements after a fixed point in time by receiving feedback and calculating whether the choices it made helped it reach its goal. This is done as seen in the picture below.
The agent receives the decision made by the algorithm and performs an action accordingly. The environment will respond with a reward as feedback to the action taken, and the current state of the agent is sent back. In turn, the algorithm will read the feedback and see if its choice has increased or decreased its progress towards the goal. This loop is repeatedly executed until the goal has been reached. Depending on the complexity of the system this can take tens of thousands of tries or even more.
One of the downsides as you can probably guess is the time to train this algorithm. Depending on the situation it can take weeks to train this algorithm correctly while the other learning methods could quickly run through their starting data and then begin functioning.
Another problem is that this algorithm can learn correct outputs to use in one situation, but a slight variation on this situation will throw the algorithm for a loop. For example, say an algorithm has learned to jump over an obstacle that would kill it. At the next obstacle it meets, it will use the exact same movement to avoid it. But if there were a hole where it would land, it would have to relearn how to avoid an obstacle.
For our simulated player we are going to use a reinforcement learning algorithm as this algorithm can learn by playing a game instead of learning by reviewing already processed data.
2. Neural Network
Now that we know how we can make something evolve with machine learning, we are going to look at the functionality of the test player itself. A test player needs to be able to make decisions all by itself to be able to improve and evolve. For this we are going to use a neural network. A neural network is a computing system that is inspired by the way a brain works. It consists of neurons which are all linked in a specific order depending on the layer the nodes are placed in. A model of a neural network can be seen in the picture below.
A neural network consists of 3 layers: the input, hidden and output layer. All 3 of these layers consist of neurons which each have their specific purpose. Let's take a look at each of the layers individually.
2.1 Input Layer
This is the starting layer of the neural network. In this layer we give the necessary data to the neural network so it can start processing. Each neuron in this layer will take a specific part of the data and feed it forward to the hidden layer based on its connections to that layer. What the connections do will be described in chapter 2.2, but what you need to know for now is that each connection has a specific importance that the neural network itself can decide. Based on the input data, each connection's actual weight will be calculated.
Now let’s look at the input used in our test case as seen in the figure above. We have a world that is a grid of squares. Each square within this grid is made of 16×16 pixels. The algorithm needs to know its surroundings to be able to decide on an action, so we want our input to be the world around the player. We have two ways we can look at the world around us. We can use pixel recognition to read precisely what each pixel in the world looks like, or we can use a grid system that reads what blocks are currently inside that grid system.
If we were to use pixel recognition we would get a more precise view of the world, but our neural network's input layer would become massive. If we take a standard screen resolution of 1920 by 1080, this would have a total of 1920×1080 = 2,073,600 pixels. We would only need to know what is in a small part of the screen, but even with only one tenth of the screen used it would still be 207,360 pixels. An input layer with the same number of inputs would be extremely computationally heavy and have far too many possible outcomes later in the neural network. A network this large would have a low chance of finding correct solutions to our problem.
If we were to use the grid system we would get a less accurate view of the world, but the number of inputs becomes far smaller. As we know, the blocks in the grid are made of 16×16 pixels, which is a total of 256 pixels per block. This means that the grid system has 256 times fewer inputs to read than the pixel recognition system. For this reason we are going to implement a grid system.
First, we need to decide how big our grid system will be. For a side-scrolling platformer, it is most important to see what is ahead of you and less important to see what is behind you. You mostly keep running and sometimes need to stop and wait; backtracking is usually not necessary. Because of this we are going to start the grid at the player object. From there we will give it a 5×6 grid and we will move it down one tile so the ground can be seen. This totals a grid of 30 values, which each get a single input neuron linked to them. The result can be seen in the picture below. Each input will have the value 0 if there is no block and 1 if there is a block. This is to help with the activation function in the hidden layers, which will be explained in the next chapter.
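The grid sensor described above can be sketched as follows. This is a minimal illustration, not the article's prototype: the `world_has_block` callback, the vertical offsets, and the row/column ordering are all assumptions.

```python
# Sketch of the 5x6 grid sensor. `world_has_block(x, y)` is an assumed
# callback that reports whether a block occupies the pixel position (x, y).
TILE = 16           # each grid square is 16x16 pixels
GRID_W, GRID_H = 5, 6

def grid_inputs(world_has_block, player_x, player_y):
    """Return 30 binary inputs (1 = block present) for the tiles around
    the player. Rows are assumed to span from four tiles above the player
    to one tile below, matching the 'moved down one tile' idea."""
    inputs = []
    for row in range(GRID_H):
        for col in range(GRID_W):
            tile_x = player_x + col * TILE
            tile_y = player_y + (row - (GRID_H - 2)) * TILE
            inputs.append(1 if world_has_block(tile_x, tile_y) else 0)
    return inputs
```

However the offsets are tuned, the important property is that the network always receives exactly 30 values of 0 or 1.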
2.2 Hidden Layer
Now that we have our inputs, we need to transform their values into functional output data. This is done in the hidden layer. In this layer we can have multiple separate layers of neurons, with a different number of neurons per layer. This all depends on what functionality the neural network is supposed to complete. In a normal neural network, we would decide how many layers are within the hidden layer and how many neurons are in each of those layers. However, this is not the case in our project. Because we are going to make this network evolve, we can only set how many layers are within the hidden layer. Why this is the case will be explained in chapter 3 about the actual evolution of the neural network. All you need to know for now is that the neural network will evolve by itself and add neurons to these layers.
How do you decide how many layers you need in a hidden layer? First we need to look at what data our neural network is going to use. As discussed in the input layer, we are going to use a simple data structure that gives a 0 or a 1 to the input neurons. According to Jeff Heaton: “Two or fewer layers will often suffice with simple data sets” (Heaton, 2017). Heaton goes on to explain that a complex data set would be something like pixel recognition or something involving time series. Luckily, we avoided pixel recognition and used a simple block data set, so we can use a network with two or fewer hidden layers in our project. Our neural network will look somewhat like the picture below. It will use two hidden layers to keep the possibility of more complex neural networks while not overfitting.
This structure will be filled by the evolutionary algorithm that will be explained in chapter 3, but let's follow a classic example to see how a hidden layer could work.
We want to be able to recognize numbers from a picture. The input of this system would be every pixel on the screen. This input would then go to the first hidden layer. In this layer, the neurons make connections based on whether the combination of input neurons would form a small edge. This information would then be sent to the second hidden layer, where it would look for a combination of previous neurons that would make a curve or a long line. Afterwards, a further hidden layer would look for parts of a number, like a loop and a line making a 9. If this is found, the correct output neuron will be chosen as the output of the system. But how does a neural network decide what information corresponds to a neuron? For this, we need to look at the connections between the neurons, called edges.
Every edge has a weight that is set by the neural network. The minimum and maximum weight per edge can be decided per project, so to keep it simple we use a minimum of -1 and a maximum of 1. The input neurons are activated by the starting data, and every activated neuron sends a weight to the hidden layer through its edges. Every neuron in the hidden layer receives all the weights on those edges. These weights are added together to make one combined number for that neuron. Every neuron also has its own bias, which is added to this number. After this number is calculated, it is run through the hyperbolic tangent, also known as tanh, to keep the number between -1 and 1. You can see the result of the tanh function in the picture below.
So, if we look back at figure 5, we have a neural network with an input layer made of 3 neurons. The first hidden node will have an activation function of N = tanh(we1 + we2 + we3 + bias), where we is the weight of a connecting edge. The neuron activates if the number calculated in the function N is above 0. If the number is equal to or lower than 0, it does not activate, and any edges it has to further neurons will also equal 0, as they are not activated for further calculations. This activation function is necessary to introduce non-linearity into the output of a neuron. We do not want the output of a neuron to change by an amount proportional to the change in its input; that would limit the neural network's decision making to simple answers. If a complex answer is expected, non-linearity is required.
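The hidden-neuron calculation above can be written as a small sketch. The function name and argument layout are illustrative assumptions; only the formula N = tanh(sum of weighted inputs + bias) comes from the text.

```python
import math

def activate(weights, inputs, bias):
    """Hidden-neuron activation as described above: sum the weighted
    inputs, add the bias, and squash the result with tanh so it stays
    between -1 and 1."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(total)

# Three inputs, as in the N = tanh(we1 + we2 + we3 + bias) example.
out = activate([0.5, -0.25, 0.8], [1, 1, 0], bias=0.1)
is_active = out > 0   # the neuron 'fires' only when N is above 0
```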
2.3 Output Layer
With the outcome of the hidden layer, we can produce workable output in the output layer. In this layer we have neurons that specify what kind of output can be chosen and what effect this will have on the object it controls. In our situation a player object can move left, move right and jump, so we have three output neurons. To see what will activate, we count all the edges to an output neuron from the last hidden layer. As explained, these edges each have their own weight value. We use the same function N = tanh(we1 + … + weX) that was used in the hidden nodes. The only difference is that we don't add a bias during this calculation, as all outcomes are equally important and likely to happen. This calculation results in the outcome value of the neuron. If this value is above 0, the neural network activates the output neuron.
For example, if the value for the “Move Left” neuron is higher than 0, the player character on the screen will move to the left. If it is below 0, the neural network will not activate the output neuron, and nothing will happen on the screen. As long as the input of the neural network doesn't change, the output will also stay the same. In other words, if the neural network has decided to move to the left because it notices that there are only ground blocks below it, and this stays the same the entire time, it will continue to move to the left forever. In the video below you can see how objects are moved by a randomly initialized neural network. Their neural networks only have weighted edges going straight from input neurons to output neurons, with no hidden neurons. As you can see, they move in random directions without a goal.
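The output-layer decision can be sketched as below. The action names and the idea of passing each output neuron its list of active incoming weights are assumptions for illustration; the bias-free tanh threshold comes from the text.

```python
import math

def choose_outputs(incoming_weights):
    """Output-layer decision as described above: the same tanh sum as the
    hidden neurons, but without a bias. An output neuron fires only when
    its value is above 0. `incoming_weights` maps an action name to the
    weights of its currently active incoming edges."""
    return {name: math.tanh(sum(ws)) > 0
            for name, ws in incoming_weights.items()}

actions = choose_outputs({
    "move_left": [0.6, -0.1],   # tanh(0.5) > 0, so this neuron fires
    "move_right": [-0.4],       # negative sum, does not fire
    "jump": [0.0],              # tanh(0) is not above 0, does not fire
})
```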
3. Neat Algorithm
Now that we have the blueprint for a neural network, we need to know how to evolve this network to fit our problem. To do this we are going to use a genetic algorithm called NeuroEvolution of Augmenting Topologies, henceforth called Neat. This algorithm is specifically designed to improve a neural network by simulating the evolutionary system of natural selection.
Before we can start evolving the neural network, we need to concretely write down what the neural network is made of. In biological terms this is called the genotype or genome of the structure. This represents the genetic makeup of our neural network. But how do we represent a neural network's genetic makeup? There are two main ways this can be done, namely: direct or indirect encoding.
Direct encoding specifies every detail about an individual. This means that every node, connection and property of a neural network is directly represented in the genome. This can be done in a multitude of ways, like encoding it as a graph with weighted connections or using a binary system. The result of this system is that we can modify every single detail of our neural network.
Indirect encoding is the exact opposite of direct encoding. In this structure we set rules and parameters for creating an individual, but we won't declare every detail ourselves. This is a more compact solution, as we need to know fewer details about a system. However, this can also result in a bias towards a specific solution if we don't know precisely how this encoding system will be used.
Neat chooses a direct encoding methodology to have clear insight into its genetic makeup. An example of a genome can be seen in the figure below. Here we can see that the neurons and edges, in this case called connections, are saved in separate gene lists. Each neuron has its functionality registered, since the input neurons, in this case called sensor neurons, and the output neurons will not mutate during the evolution process. For the connections we save every useful piece of information: the start and end neurons, the weight of the connection that is used to calculate the eventual output as discussed in chapter 2, whether it is enabled or disabled in the structure itself, and the innovation number, which is used to identify each connection. A new genome starts with input and output neurons, no hidden neurons and no connecting edges. These will all be made through the evolution process.
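The genome layout described above can be sketched in code. The field names and the plain-dict node typing are assumptions for illustration; the fields themselves (start neuron, end neuron, weight, enabled flag, innovation number) come straight from the description.

```python
from dataclasses import dataclass, field

@dataclass
class ConnectionGene:
    # Everything Neat stores per connection, as listed above.
    in_node: int
    out_node: int
    weight: float
    enabled: bool
    innovation: int

@dataclass
class Genome:
    # Node genes are kept here as id -> "sensor" | "output" | "hidden".
    nodes: dict = field(default_factory=dict)
    connections: list = field(default_factory=list)

def new_genome(num_inputs, num_outputs):
    """A fresh genome: input and output neurons only, no hidden neurons,
    and no connections yet; evolution will add the rest."""
    g = Genome()
    for i in range(num_inputs):
        g.nodes[i] = "sensor"
    for j in range(num_inputs, num_inputs + num_outputs):
        g.nodes[j] = "output"
    return g
```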
Now that we know how our system keeps its data, let's start the evolution process. Because we begin with a network with only input and output neurons, nothing will happen yet. To change this, we need to add edges and hidden neurons to the neural network. The Neat algorithm does this by adding a mutation chance that can add edges and neurons to the genome after every run of the program. This mutation has a random chance of occurring in each genome, which can be changed to the liking of the creator. There are four main mutations that can occur, namely: the weight of an edge is changed, a new edge is created, a new hidden neuron is created with its own edges, or the state of an edge is changed. Let's dissect each of these mutations.
The first mutation is easy to understand. Each edge has its own weight as discussed earlier, and this mutation changes the weight of a randomly selected edge. This can lead to a change of activated neurons, and in turn that will lead to a different output than before.
When a new edge is created, this edge needs to receive a lot of information. First, it needs to know its start and end neurons, which could be any of the three neuron types. With this it knows where the weight needs to be added for all the neuron calculations (see figure 8, mutate add connection). Second, the edge needs a weight associated with it. Like the first step, this is needed for the activation function calculation of the neurons. Third, it needs to know if the edge is enabled in the hierarchy. This is different from the activation of an edge by neuron activation, as a disabled edge is always disabled, even when the neuron is activated. And last, it needs a newly generated innovation number if the mutation has not happened before. This innovation number is used to keep track of every new edge that is created. This is useful in the crossover stage, which will be discussed in the next sub-chapter.
The mutation of a new neuron adds a new hidden neuron to the neural network. This new hidden neuron will always be generated on a currently existing edge. The edge is split up into a connection from the original start neuron to the hidden neuron, and another edge from the hidden neuron to the original end neuron (see figure 8, mutate add node). The first edge keeps the original weight of the edge and the second edge gets a weight of 1. This is to help mitigate issues with the new additions. By implementing the mutation like this we don't actually change the output of the neural network, but we do change the structure.
The last mutation is the simplest of all four. An edge's state can be randomly mutated from disabled to enabled and vice versa. This gives the possibility to activate edges that might be beneficial and disable those that aren't.
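The four mutations can be sketched roughly as follows. This is a simplified illustration under stated assumptions: connection genes are plain dicts, the innovation counter is a global, and the check for whether a mutation "has happened before" is omitted.

```python
import random

_innovation = 0  # global innovation counter (a simplifying assumption)

def next_innovation():
    global _innovation
    _innovation += 1
    return _innovation

def mutate_weight(conn):
    # Mutation 1: replace a randomly selected edge's weight, kept in [-1, 1].
    conn["weight"] = random.uniform(-1, 1)

def mutate_toggle(conn):
    # Mutation 4: flip an edge's state between enabled and disabled.
    conn["enabled"] = not conn["enabled"]

def mutate_add_connection(genome, in_node, out_node):
    # Mutation 2: a new edge with start/end neurons, a weight, an enabled
    # flag, and a fresh innovation number.
    genome.append({"in": in_node, "out": out_node,
                   "weight": random.uniform(-1, 1),
                   "enabled": True, "innov": next_innovation()})

def mutate_add_node(genome, conn, new_node_id):
    """Mutation 3: split an existing edge. The old edge is disabled, the
    first new edge keeps the old weight, and the second gets weight 1, so
    the network's output is unchanged while its structure grows."""
    conn["enabled"] = False
    genome.append({"in": conn["in"], "out": new_node_id,
                   "weight": conn["weight"], "enabled": True,
                   "innov": next_innovation()})
    genome.append({"in": new_node_id, "out": conn["out"],
                   "weight": 1.0, "enabled": True,
                   "innov": next_innovation()})
```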
With the possibility to mutate new neural networks, we need to be able to pass these changes on to new generations of the algorithm. The Neat algorithm does this by using crossover between two different genomes, like how a child would inherit genes from both parents. This crossover is done after every iteration of the algorithm. This is also where the innovation numbers come into play. Each neuron and edge has its respective innovation number. If the information in the edge with a given innovation number is the same in both parents, the child simply inherits that information from a randomly chosen parent.
If the genes of the parents have differences, these are both inherited by the child. If we look at figure 9, we see that there are quite a few differences between parent1 and parent2. Parent2 has an extra hidden neuron 6 and some extra edges that parent1 doesn't have. These edges are called disjoint edges, as only one parent has them. Because parent1 doesn't have a 6th neuron itself, we can inherit the neuron from parent2 and take over the necessary edges with innovation numbers 5, 6 and 7. Another edge that was added in parent1 is an edge from neuron 1 to neuron 5. As both these neurons exist in both parents, we can simply add this edge to the list without much trouble. Lastly, we have 2 edges with a higher innovation number than the last edge of the other parent. These are called the excess edges. In this instance, the excess edges can be added to the child, as it has already gained the new hidden neuron used in edge number 10. And as you can see, we now have an offspring that has gained the changes in structure from both parents.
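The innovation-number-aligned crossover can be sketched as below. One caveat: in the full Neat algorithm, disjoint and excess genes are usually taken only from the fitter parent; this simplified sketch inherits them from whichever parent has them, which matches the figure 9 walkthrough above.

```python
import random

def crossover(parent1, parent2):
    """Crossover sketch. Parents are dicts mapping innovation number to a
    connection gene. Matching genes (same innovation number in both) come
    from a randomly chosen parent; disjoint and excess genes are copied
    from the parent that has them."""
    child = {}
    for innov in set(parent1) | set(parent2):
        if innov in parent1 and innov in parent2:
            child[innov] = dict(random.choice((parent1, parent2))[innov])
        else:
            source = parent1 if innov in parent1 else parent2
            child[innov] = dict(source[innov])
    return child
```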
To keep diversity between the neural networks alive, we need to make sure that not all mutations are transferred to every neural network. This is done by dividing all neural networks into groups called species. These groups are made of neural networks that are similar to each other in structure. The Neat algorithm has a distance (D) formula to determine how similar two genomes are, as seen in figure 10.
Ne stands for the number of excess edges and Nd stands for the number of disjoint edges. W stands for the average weight difference of the matching edges between the two genomes. The c variables are coefficients that we can change according to what we find most important in this formula. For example, if we don't care about the weight differences, we can simply set that coefficient to 0 and it wouldn't count towards the distance score.
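The compatibility distance can be sketched as below, assuming genomes are maps from innovation number to weight. The coefficient defaults are placeholders, and the canonical Neat refinement of setting the normalizer N to 1 for small genomes is omitted for brevity.

```python
def distance(genome1, genome2, c1=1.0, c2=1.0, c3=0.4):
    """Compatibility distance sketch: c1*Ne/N + c2*Nd/N + c3*W, where Ne
    counts excess edges, Nd counts disjoint edges, N is the size of the
    larger genome, and W is the average weight difference of matching
    edges. Genomes map innovation number -> weight."""
    innovs1, innovs2 = set(genome1), set(genome2)
    matching = innovs1 & innovs2
    mismatched = innovs1 ^ innovs2
    # Excess genes lie beyond the other genome's highest innovation number.
    cutoff = min(max(innovs1), max(innovs2))
    excess = sum(1 for i in mismatched if i > cutoff)
    disjoint = len(mismatched) - excess
    n = max(len(genome1), len(genome2))
    w_bar = (sum(abs(genome1[i] - genome2[i]) for i in matching) / len(matching)
             if matching else 0.0)
    return c1 * excess / n + c2 * disjoint / n + c3 * w_bar
```

Two identical genomes get distance 0; the more non-matching edges and the larger the weight differences, the further apart the genomes are, and a threshold on this value decides species membership.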
With a healthy diversity in the genome pool, we need to start killing off a percentage of the population so we can create offspring. To do this we need to write a fitness formula. This is a metric that shows how well a genome performed the task it is asked to do. How a fitness formula is written depends on the project, so let's look at some important points for a side-scrolling platformer. In this setting the player needs to move towards the end point, so we can look at the distance travelled. If there is a score, we may want a higher score to be rewarded. And we can also reward the algorithm for the time in which it completes a level. This fitness formula could look something like this: Fitness = (distance·c1 + points·c2) / (time·c3), where the c variables are coefficients that we can change, and time is in seconds. In this fitness formula we prioritize time as the most important part, as the score is divided by however long it takes to finish a level. This can be altered by changing the values of the coefficients.
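The fitness formula above translates directly into a one-liner. The coefficient defaults here are arbitrary placeholders to tune, not values from the article.

```python
def fitness(distance_travelled, points, time_seconds, c1=1.0, c2=0.5, c3=1.0):
    """Fitness = (distance*c1 + points*c2) / (time*c3), with time in
    seconds, so finishing faster always scores higher."""
    return (distance_travelled * c1 + points * c2) / (time_seconds * c3)
```

For example, a genome that travels 100 units and collects 20 points in 10 seconds scores (100·1 + 20·0.5) / 10 = 11.0, and the same run taking twice as long scores half as much.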
Now that we have a metric for how well a genome scored, we can start removing genomes from the pool. The Neat algorithm chooses to remove a certain percentage of the lowest scoring genomes from every existing species. This can mean that only 20% of low scoring genomes are removed even when a species contains more than 20% low scoring individuals. The Neat algorithm doesn't see this as a bad thing, as most random mutations start out as a bad influence on a neural network. Only through multiple changes and offspring will they eventually have the possibility to give beneficial effects. This is why we want to keep a healthy diversity in the genome pool: to keep the possibility of initially low scoring mutations transferring over to others and possibly improving them.
4. End result
Now, you may have noticed that I have only mentioned my own prototype a few times and focused mostly on the theory behind everything. This is because my prototype implemented a lot of things very poorly and without a complete understanding of the subject. In my excitement to start working on the project, I hurried the research phase along, to the point that I didn't fully understand what I was implementing while I thought I did. I had some basic understanding of machine learning and neural networks before starting this project, and I overestimated how much I would be able to do with this knowledge. This led me to start my prototype way too early.
For example, instead of implementing a neural network with a set number of hidden layers, I decided to make a neural network that could be expanded indefinitely. This led to the Neat algorithm not learning anything useful, as it would just expand in size, making way too complex neural networks and eventually causing errors which broke the neural networks. Besides that, the neural networks did not have any neuron activation functions and would just add every weight of an edge if the input neuron was on. Since my neural network implementation itself was already wrong, I was never able to test if my Neat implementation worked correctly.
As you can see there are a lot of things wrong with my prototype. If I had more time, I would have liked to change these aspects and maybe I will fix these problems in the future. The only thing I can say I’m proud of is my implementation of observing the surroundings as described in chapter 2.1. Cutting the world up into blocks and using that to view the surroundings cut down the processing costs by a lot and it worked very well as an input for a neural network.
If I could impart one piece of knowledge to you, it would be this: do NOT hastily do your research! If I had waited and done more research, I would probably have implemented most if not all of what I wrote down in this blog. So don't make the same mistake I made, and do your research.
Brownlee, J. (2021, January 18). How to Choose an Activation Function for Deep Learning. Retrieved from Machine Learning Mastery: https://machinelearningmastery.com/choose-an-activation-function-for-deep-learning/
Dabbura, I. (2018, April 1). Coding Neural Network — Forward Propagation and Backpropagtion. Retrieved from towards data science: https://towardsdatascience.com/coding-neural-network-forward-propagation-and-backpropagtion-ccf8cf369f76
Green, C. D. (2009, September). Speciation in Canonical NEAT. Retrieved from sharpneat.sourceforge: https://sharpneat.sourceforge.io/research/speciation-canonical-neat.html
Heaton, J. (2017, June 1). The Number of Hidden Layers. Retrieved from Heaton Research: https://www.heatonresearch.com/2017/06/01/hidden-layers.html
Heidenreich, H. (2019, January 4). NEAT: An Awesome Approach to NeuroEvolution. Retrieved from Towardsdatascience: https://towardsdatascience.com/neat-an-awesome-approach-to-neuroevolution-3eca5cc7930f
MacWha, R. (2021, March 4). Evolving AIs using a NEAT algorithm. Retrieved from machwa.medium: https://macwha.medium.com/evolving-ais-using-a-neat-algorithm-2d154c623828
Malato, G. (2021, May 10). How Many neurons for a neural network? Retrieved from Yourdatateacher: https://www.yourdatateacher.com/2021/05/10/how-many-neurons-for-a-neural-network/
Software Testing Help. (2021, November 1). Types Of Machine Learning: Supervised Vs Unsupervised Learning. Retrieved from Softwaretestinghelp: https://www.softwaretestinghelp.com/types-of-machine-learning-supervised-unsupervised/