Dynamic Difficulties in a PCG Game – DynaGen

By Alexander Franke, for AUAS HBO-ICT Game Development.
StudentNo. 500792174

E-Mail: alex.franke@hva.nl

Introduction

Gaming itself is a language to be learned by those that play, as is evident from the play styles of those that don't. Learning the basic skills required to navigate, complete, and ultimately enjoy a video game can be a long process, one which a tutorial or introductory level often does not cover. It might take several games' worth of experience before a player is comfortable with the controls alone. Most importantly, a game that scales in difficulty, while often very good at scaling up, is reluctant to scale down for fear of becoming too easy. Many games therefore do not implement dynamic difficulty, opting instead for a difficulty setting the player chooses themselves, which begs the question: why can't games make this decision for the player, simply by observing them?

In this blog post, I will highlight the implementation of a dynamic difficulty setting in a Procedurally Generated Dungeon Explorer, and the challenges it introduces. We'll discuss what data a game can collect from a player, what parameters it can change using this data, and in turn how these parameters influence the player's experience. The ultimate goal is to construct a game that automatically adjusts to the optimal Skill-Challenge ratio as detailed in Mihaly Csikszentmihalyi's Flow theory (reference graph below).

Mihaly Csikszentmihalyi Flow Theory. Graph by David Maletz.

Table of Contents

Introduction
Table of Contents
0. Setup & Requirements
1. Building the PCG Dungeon
1.2 Results
2. Building the Difficulty Algorithm
2.1 The concept
2.2 Opening the Black Box
2.3 Building the Algorithm
3. Building the Tests
3.1 Blinded Experiment
3.2 Setup Nott, Setup Molly
3.3 Testing Numbers and Parameters
3.4 The questionnaire
4. Results & Conclusions
4.1 Conclusions – Question 6 & 7
4.2 Conclusions – Questions 2 to 5
4.3 Conclusions – Research questions
5. Future
6. Sources

0. Setup & Requirements

At heart, I'm a manager and designer of games rather than a programmer, so setup is very important to me if I want to keep myself on my feet during development. Though the project would undergo three very different phases, all of them were intermingled in some way: designing one while building the others. Hindsight comes in handy here, since I can tell you that during all three stages of construction (the PCG Dungeon, the Algorithm, and the Tests), changes were made to the others, some larger than others. Scheduling also proved vital in staying on target and maintaining focus, though I did lose myself in this process several times. It is important to mention that most feedback for this project came from the guilds that were created. A vital piece of advice gained from these guilds can be summarized in one word: "Scope". Don't try to do as much as possible right away; a PCG Dungeon can be a project in and of itself, so make this part as short as possible. Get what you need and continue, do not linger. The last piece of information I needed before starting out was my own skill level; at this point I hadn't opened an IDE in 1.5 years, and I was not looking forward to the re-learning process. This too was brought up with my Guild, who gave me appropriate advice. With this information in hand, I was able to begin my project.

1. Building the PCG Dungeon

Considering the information mentioned in '0. Setup', I gave myself two options: either find an already-built Procedurally Generated Dungeon project or system for Unity (on the Asset Store, for example), or follow along with an online tutorial. Building one myself was out of the question entirely. I chose the latter, because it would both reintroduce me to Unity more efficiently than the former, and give me a better understanding of the dungeon's parameters and how to change them within an algorithm. The tutorial used was Sunny Valley Studio's 'Procedural 2D Dungeon Tutorial' series (link in sources). The process of following along with the tutorial will not be highlighted in this blog post, as it is not of interest to the end result. The results of this decision will be discussed in '4. Results'.

2. Building the Difficulty Algorithm

The real fun of this project was based around the principle of an evolving game. What this means can be divided into two concepts: one high level, and one low level.

2.1 The concept

The high-level concept of this dynamic difficulty is to build a game that adapts to the player. On a grand scale, you'd alter difficulty, customize rewards, change progression, or perhaps even shape the story based on how a player plays. You might want to encourage engagement with other aspects of the game if the player focuses too much on one gameplay mechanic in a way that might ruin their experience with the others. This is different from scaling, since you're changing the world around the player to suit the player's style, instead of just influencing how hard something is by buffing or de-buffing hit points.

That's very nice and all, but what does that mean for our project? The low-level concept is that we gather data from the player, deduce their playstyle from this data, and influence the game generation algorithm using this deduced data. See the flowchart below for a quick overview of this concept.

I am of the opinion that every video game model needs a cool name. Hence: DynaGen.

We can clearly see three distinct stages, each feeding into the next: Player Data, Playstyle Deduction, and the Generation Algorithm. For every aspect of our game, we need to figure out how to construct these three stages, as each will influence the effects or results of the other two, which in turn influence the core gameplay, which feeds back into itself.
Later in this blog post, we'll see how our own game works in this model, including the formulas used.
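To make the three-stage loop concrete, here is a minimal, hypothetical sketch in Python. The game itself is Unity C#, and every name and number below is illustrative rather than taken from the actual game code:

```python
# A minimal, hypothetical sketch of the three DynaGen stages.
# All names and numbers are illustrative, not the actual game code.

def deduce_playstyle(history):
    """Stage 2: condense recent performance samples into one factor."""
    return sum(history) / len(history)

def generate_level(performance, base_size=50):
    """Stage 3: feed the deduced factor into a generation parameter."""
    return {"dungeon_size": round(base_size * performance)}

def play_level(params):
    """Stand-in for gameplay; the real game would time and score the run."""
    return 1.0  # a neutral performance sample

history = []
params = {"dungeon_size": 50}
for _ in range(3):
    history.append(play_level(params))       # Stage 1: Player Data
    factor = deduce_playstyle(history[-5:])  # Stage 2: Playstyle Deduction
    params = generate_level(factor)          # Stage 3: Generation Algorithm
```

A neutral player (factor 1.0) leaves the level unchanged; faster or slower play would push the generation parameter up or down on every pass through the loop.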

In later stages of research, I discovered something called the “Learning Loop”. Though it might not be as broadly used, it bears significant similarity to our DynaGen model.

Source: business.edx.org

This got me thinking about why we would want a model like DynaGen, and though at this stage we aren't meant to answer questions, I'll explain that DynaGen explicitly takes the Learning Loop out of video games and performs it for the player. In short: instead of having the player learn the game's mechanics, the game learns the player's mechanics. This is at the core of how DynaGen works.

2.2 Opening the Black Box

Keeping track of the algorithm was key to figuring out how it worked; you are essentially creating a black box of maths that works on its own, without you really knowing what it does. I took direct inspiration from Minecraft here, and made myself a lovely debug menu that gives me all the information I need for every level (see image below; the image also includes a sneak peek at the end result).

Did building this little debug menu help me out? Absolutely. You really can't do without one if you are testing an algorithm and want to know which elements of it work and which don't. More about testing the algorithm later.

2.3 Building the Algorithm

There's nothing quite like trial-and-error, and that is how building the algorithm went for me personally. The cycle of trial-and-error, in case you're not familiar, usually begins with an idea, implementing that idea, and comparing the result to the last. Did something improve? Iterate on what you just added. Did it worsen the result? Perhaps take a step back. This is how I constructed the three algorithms that make up DynaGen.

The first algorithm, Player Data, can be as complicated as you want it to be, but as a proof of concept (and given the limited time available for development), I stuck to two variables: the time a player takes to complete a level, and the score difference made during a level. A minimum of at least two variables is recommended in the case of DynaGen, so as not to punish the player for an unfair generation, but also not to reward them for rushing through the game. For the Player Data stage, the following is important: the more data you collect, the fairer DynaGen will function. An exception to this rule is the length of data you store, i.e. how much of each variable to feed into the Deduction stage; you want an average taken over a window that isn't too long or too short. You could build another algorithm that dynamically adjusts this window, but that is beyond the scope of this project.
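As a sketch of how this fixed-length history might look, the following Python snippet keeps only the most recent runs of the two collected variables. The window size of 5 is an assumption for illustration, not the value used in the game:

```python
from collections import deque

# Sketch of the fixed-length Player Data storage; WINDOW = 5 is an
# assumed value for illustration, not the one used in the game.
WINDOW = 5

time_samples = deque(maxlen=WINDOW)   # seconds per level
score_samples = deque(maxlen=WINDOW)  # score difference per level

def record_run(time_taken, score_diff):
    """Stage 1: store both variables; old samples fall off automatically."""
    time_samples.append(time_taken)
    score_samples.append(score_diff)

# Simulate eight completed levels with slowly increasing times.
for level in range(8):
    record_run(time_taken=10 + level, score_diff=12)

# Only the last WINDOW runs remain, so one unlucky level can't dominate
# the average, and ancient runs can't drag it around either.
```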

The second algorithm, Playstyle Deduction, has a very carefully chosen name. Though we will use DynaGen to scale difficulty, we will also use it to scale level generation and even camera position. This algorithm combines the Player Data fed into it and does something with it. In our case, it creates an average of all data, which, to me, seems the most logical use for the Playstyle Deduction stage. I must stress, however, that you could combine different kinds of data as well, for example the number of times the player looted chests and the number of rewards they gained.

prevTimePerf and prevScorePerf represent the Player Data, and the return statement represents the part of the Playstyle Deduction where I simply combine both variables into a single performance factor (which is then stored in a fixed-length array). Notice the result of trial-and-error here as well; the entire return formula was constructed step by step, including the seemingly arbitrary '105'.
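The actual formula only appears in the screenshot and is not reproduced here. As a hedged illustration of the idea (combining a time ratio and a score ratio into one factor), it could look something like this; EXPECTED_TIME and the exact role of the '105' constant are assumptions of mine:

```python
# Hypothetical reconstruction of the performance-factor idea; the real
# formula is only shown in the screenshot above. EXPECTED_TIME and the
# role of the tuned constant 105 are assumptions for this sketch.
EXPECTED_TIME = 10.0   # assumed target seconds per level
EXPECTED_SCORE = 105   # the trial-and-error constant mentioned in the text

def performance_factor(prev_time, prev_score):
    """Average a time ratio and a score ratio into one factor (1.0 = on par)."""
    time_perf = EXPECTED_TIME / max(prev_time, 0.1)  # >1 means faster than expected
    score_perf = prev_score / EXPECTED_SCORE         # >1 means scoring above par
    return (time_perf + score_perf) / 2
```

Because both ratios are averaged, rushing through a level with a poor score pulls the factor back toward 1.0, which is exactly the "don't reward rushing" property described above.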

The second part of the Playstyle Deduction comes up later in the code, where we use a short for-loop to sum the fixed-length array storing this data. The final step of Playstyle Deduction is deciding how powerful the deduced factor will be. We want a snappy amount of change, one that alters gameplay over a span of seconds. A longer, drawn-out experience would not multiply this factor as heavily as I have done (in our case, factorPower is set to 7, which means that a performance level of 0.95 becomes ~0.70! Quite steep indeed).
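The effect of that exponent is easy to verify with a couple of lines of Python. This is a sketch only: factorPower = 7 comes from the text above, while the function name is mine:

```python
FACTOR_POWER = 7  # the exponent used in the game, per the text above

def sharpen(performance):
    """Raise the averaged factor to a power so that small deviations from
    1.0 become large adjustments within a few levels (name is illustrative)."""
    return performance ** FACTOR_POWER

print(round(sharpen(0.95), 2))  # 0.7 -- a 5% dip becomes a ~30% reduction
```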

The third and final stage, the Generation Algorithm, is where we use our processed data (in our case just one variable) to determine what happens when the game next enacts a stage of generation. You can change as many parameters in your game as you want, from something simple like enemy HP to what kind of terrain generation works best for your player. The complex things often require more effort, though, which could sometimes lead to entire coding projects that only get used by a small number of players, especially if you consider the example I just gave about different world generation techniques. You could imagine the difference in experience from one player to the next as well. In short: you have to find the sweet spot for your game. Too little, and you will only be scaling HP, which isn't very dynamic. Too much, and you'll be creating an incoherent experience. This last part especially is a good one to remember; this becomes a dangerous technique when you gravitate towards the extremes of the system. Crafting a game that is made for everyone is impossible (beauty being in the eye of the beholder and all that), and trying to achieve it by making a video game as dynamic and adaptable as possible would be equally impossible, in ways this blog post will not list.

Below is the model our game will use, and below that is an example case of Stage 3 in DynaGen.

A quick walk-through of the code below: once the case has been triggered, we check whether the parameter is able to change at all. If it is not, wasAbleToChange remains false and a new case in the switch is triggered (this system is not included in the screenshot). The change is then applied to a new variable (in this case, newCoinCollectedReward), before being cast to the proper variable and type of the relevant parameter.
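A hypothetical Python rendering of the case just described may help here; the original is Unity C#, and the clamping bounds are assumed for illustration:

```python
# Hypothetical Python rendering of one switch case; the original is
# Unity C#. The clamping bounds below are assumed for illustration.
REWARD_MIN, REWARD_MAX = 5, 30

def apply_coin_reward_change(current_reward, performance):
    """Return (new_reward, was_able_to_change)."""
    new_reward = round(current_reward * performance)
    if new_reward == current_reward or not (REWARD_MIN <= new_reward <= REWARD_MAX):
        # The parameter can't move: report failure so the surrounding
        # switch can fall through to a different case instead.
        return current_reward, False
    return new_reward, True
```

With the starting value of 15 (see Par. 3.3), a performance factor of 1.2 would bump the reward to 18, while a factor of 1.0 reports failure so the switch can try another parameter.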

If all of these stages work out as intended, the game is allowed to continue on its merry way in regenerating, and we have an evolved level!

3. Building the Tests

If you're ever looking for advice on how to write up a user test, get yourself to a good cafe with good coffee, because it worked for me, yet I don't know how. I wish I could better explain my process, but I suppose this must be the 'Unconscious Competence' that this Maslow-fellow talked about a while back (not to pat myself on the back here; I honestly don't recall using a single moment of logical thinking while writing the experiment). One thing that has stuck with me through all these years of writing research papers is setting up a goal beforehand: what do we want answered? Here's what I came up with:

Research Goal: What would a dynamic difficulty / gameplay algorithm look like in a PCG Game?

Side Goal 1: Does a dynamic algorithm in a PCG game balance itself out over time, regardless of the person playing?

Side Goal 2: Would a dynamic generation algorithm work in other game genres that do not use PCG?

Given that the main research goal had already been answered (TL;DR: it's DynaGen; more on this in Part 4 of this blog), some side goals were needed that would reinforce the answer to the main research goal. Hence Side Goals 1 and 2, which both require user testing to be answered.

3.1 Blinded Experiment

Before even getting myself to the aforementioned cafe, however, I discussed how to set up my research with several other students, as well as some lecturers. The latter noticed I was looking for a specific model of testing I was unaware of: a Blinded Experiment, a situation in which "information which may influence the participants of the experiment is withheld until after the experiment is complete" (Wikipedia). (Strictly speaking mine is a single-blind test, since I, as the researcher, knew which version each participant played.) This meant less work for me in setting up the tests, but I did give myself more work by figuring out how important the same questions were to the different 'setups', as I called them. Let's look at these testing setups!

3.2 Setup Nott, Setup Molly

Remember, cool project names with references score bonus points (right?)

The testing document I drew up is extensive and often repeats information for the sake of my own clarity, so I will distill it all down into a simple step-by-step walkthrough of a user test.

First, a random person is approached with a simple question: 'do you want to play my game for a couple of minutes?' They are only told the following:
1. The game has procedurally generated levels that change over time depending on your playstyle / how you play.
2. Please play the game for 25 runs.
3. Are you alright with answering 7 questions afterwards?
Here is where the blinded part of the tests comes into play: statement 1 is a lie to 33% of participants. Two-thirds of the testers get the version that we know and love, while one-third plays a version of the game that starts out the same but doesn't change over time! The differences in answers would lay bare the value of DynaGen in our PCG Dungeon.

Second, the user gets a setup chosen by me, derived from the current number of testers in each stage, staying as close as possible to the 33/66 divide.
Setup Nott is the setup without DynaGen active. These people are just playing a game with a PCG algorithm. This is the control group.
Setup Molly is the group of testers who experience the differences DynaGen makes, or rather play the game as we actually programmed it.
Note: neither setup plays the game with the debug menu open (shown in Par. 2.2).

There is also a third setup, Setup Fjord, which I would only enact once a threshold of 10 users had been exceeded. In this setup, users play the game with the algorithm applying changes randomly instead of structurally, essentially removing the Playstyle Deduction stage. This would be a great test of the DynaGen system, to see whether the feedback loop matters at all. From that stage on, 20% of all testers would play Setup Fjord (this percentage would be achieved by having both Nott and Molly relinquish 10%).
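The assignment rule described above (give the next tester to whichever setup is furthest below its target share) could be sketched like this; the post-threshold percentages are my approximation of "both relinquish 10%", not exact values from the testing document:

```python
# Sketch of the setup-assignment rule: assign the next tester to whichever
# setup is furthest below its target share. The post-threshold percentages
# (23/57/20) are my approximation of "both relinquish 10%".

def pick_setup(counts):
    total = sum(counts.values())
    if total >= 10:  # Setup Fjord only opens after the tester threshold
        targets = {"Nott": 0.23, "Molly": 0.57, "Fjord": 0.20}
    else:
        targets = {"Nott": 0.33, "Molly": 0.67, "Fjord": 0.0}

    def deficit(name):
        share = counts[name] / total if total else 0.0
        return targets[name] - share  # how far below target this setup is

    return max(targets, key=deficit)
```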

3.3 Testing Numbers and Parameters

Speaking of thresholds, let's very quickly talk numbers before we continue to step 3. I set up some values for this experiment to keep myself grounded and to keep things organised. The numbers shown are a mix of my experience playing the game and my experience building research papers. Some are also completely 'winged' and don't have a method attached to them.

33% of people will run setup Nott.
66% of people will run setup Molly.
Minimum amount of user tests: 10 for validity
Optimal amount of user tests: 20+ (min. amount for a percentage value)
If the minimum amount of tests is exceeded, both setups will relinquish 10% of their user base to Setup Fjord.

(Note: all these numbers are approximates, not end results.)

The following data will be recorded in a spreadsheet, separate from the research questions:
Rounds played (expectation: 25).*
Time taken (expectation: 4 minutes).*
Score achieved by 25 rounds (expectation: 300 – 350).*

The parameters the game would start out on, in all setups, would be the following;
minRoomWidth/minRoomHeight = 7;
dungeonWidth/dungeonHeight = 50;
offset = 3;
Camera.main.orthographicSize = 8;
enemy.speed = 3;
enemy.range = 7.0f
player.coinCollectedReward = 15;

(Note: these are roughly the middle-values of all parameters).

3.4 The questionnaire

After allowing the user to complete 25 levels in whatever way they see fit, they are given a 7-question questionnaire about their experience, using Google Forms. Here are all the questions in order:

  1. On a scale from 1 to 10, how skilled of a gamer do you rate yourself? 
    (1 = Amateur, 5 = Average, 10 = Master).
  2. On a scale from 1 to 10, how easy or hard was the game after 3 levels?
    (1 = Way too easy, 5 = Just right, 10 = Way too hard).
  3. On a scale from 1 to 10, how skilled would you say you were in the game after 3 levels?
    (1 = Amateur, 5 = Average, 10 = Master).
  4. On a scale from 1 to 10, how easy or hard was the game after 25 levels?
    (1 = Way too easy, 5 = Just right, 10 = Way too hard).
  5. On a scale from 1 to 10, how skilled would you say you were in the game after 25 levels?
    (1 = Amateur, 5 = Average, 10 = Master).
  6. What changes did you notice as the game progressed? (Open question)
  7. Did you notice a pattern in these changes? (Open question, optional)

These questions were designed to measure progression, the noticeability of the algorithm (the difference between setups), and the player's own progression in relation to skill. Questions 2 to 5 especially were made to fit right into the Skill-Challenge graph we showed earlier. The first question can be used later to identify any extreme scores.

And, because I love me some tables and spreadsheets, I made an easy reference beforehand on what the optimal outcome of each question would be, and the importance of the results;

Question | Setup Nott   | Setup Molly                 | Importance
1        | N.a.         | N.a.                        | Less
2        | Average of 5 | Average of 5                | Less
3        | Average of 5 | Average of 5                | Less
4        | Average of 2 | Average of 5                | Very
5        | Average of 8 | Average of 8                | Very
6        | None         | [Parameters of PCG Dungeon] | Critical
7        | N.a.         | Yes                         | Very
Optimal results of the Blinded Experiment

The table shows very few differences from one answer to the other, especially for the first 3 questions. That is because these are control questions, simply for comparison with the remaining questions. That is where we want to see differences, especially with questions 4 and 6. These, if answered optimally, would demonstrate the importance of DynaGen by showing that an optimal experience can be achieved for every user by using it.

The fourth and final step is where I come in; comparing the data from all setups with each other, and drawing conclusions. Though I could get into detail now, I would much rather save it for the next paragraph.

4. Results & Conclusions

Let us hasten towards the answers our participants gave, formatted for clarity. For a download of the full breakdown, click HERE.

Answers formatted for easier elaboration in the following paragraphs. Results are unchanged.

First things first: we can't draw a definitive conclusion to the research question from these tests, as they were not done at a significant scale, nor are they complete (Paragraph 5 will go into more detail about the missed opportunities tied to the results).
As for participants, I wanted to include people from an array of different backgrounds and experiences when it comes to gaming, as DynaGen is about balancing the experience for whoever the player might be and however experienced they might be. This included several students, both from HBO-ICT Game Development and from other studies, but also two grandmothers, several parents, some siblings, a partner, and a couple of ex-colleagues.

With all this being said, we can still draw several conclusions based on the results, with additional notes in italics.
As a final note, all numbers are rounded off to 2 decimals.

4.1 Conclusions – Question 6 & 7

Let's talk about question 6 first, as the differences here are the most apparent.
Users testing Setup Nott, predictably, rarely noticed anything change. This is in line with the optimal outcome.
Users testing Setup Molly all noticed some element of change, most often the camera size. Even if we disregard the camera, only one user noticed nothing at all changing (potentially due to being balanced with the start settings).
Speaking of which: changing the camera size is a very obvious change, given that all testers of Setup Molly noticed it. This answer was predicted; a camera in a video game rarely changes size.
Later on, when asked whether the camera impacted gameplay, most participants answered "yes", with some highlighting the increased or decreased visibility of the level.

It does beg the question why some, but not all, participants noticed the same things changing. The answer is simple, and also predictable: the algorithm I used only uses one random factor, but it changed the results significantly. For example: you get a good streak of levels going, and the enemy difficulty changes thanks to it. If you then rarely encounter any enemies, that change does not get noticed. The same happens when the dungeon size changes, but only by a slight amount because you're already balanced out.

Lastly for Question 6, the rewards for picking up the coins did not have any feedback to them other than the score increase. This is a case of bad game design that influenced the results.
The same could also be said for the enemy difficulty.

To finish off this part of the conclusions: there is not enough of a noticeable pattern in my implementation, which I understand. Nearly half of the users in Setup Molly did not spot one, and those who did only did so in relation to how well they were performing, not according to the changes the game made for them. The last "arrow" in the feedback loop of DynaGen was missing, making the result underwhelming.

4.2 Conclusions – Questions 2 to 5

Something to notice immediately is that the player's self-rated skill (question 1) influences the answers to questions 2 to 5 quite a bit; users who think of themselves as 'experienced' gamers often find the game less difficult than those who do not, which makes sense considering that 'experienced' gamers often enjoy a challenge more than someone new to the hobby, but it does still muddy our conclusions a bit. It also impacted the time they spent in the game.

The scatter graphs constructed from the data sadly do not paint a conclusive picture when compared to the averages, mostly due to formatting issues (though a coder I may be, a spreadsheet expert I am not).

What excites me the most, however, are the averages and trends between the 4 different questions;

Score        | Averages Molly | Averages Nott
Diff. lvl 3  | 3,73           | 4,29
Diff. lvl 25 | 4,91           | 2,00
Change       | +1,18          | -2,29
Skill lvl 3  | 4,82           | 6,29
Skill lvl 25 | 7,73           | 8,14
Change       | +2,91          | +1,85

Whereas the skill ratings in both cases increased over time, the difficulty in Setup Molly, on average, scaled up quite nicely, while the difficulty in Setup Nott dropped massively. This is somewhat in line with the optimal results I predicted before starting the user testing phase; though I predicted the average to stay the same, it logically would have decreased in relation to the user's own sense of difficulty. The conclusion that DynaGen did impact the game's difficulty (or challenge) is further cemented by looking at the progressions between all the different answers.

Here we can once again see that the difficulty takes a massive dip for Setup Nott, as opposed to all other measured values. With the only difference being whether DynaGen is active or not, we can presume that DynaGen does in fact scale the difficulty along with the player's skills.

4.3 Conclusions – Research Questions

Our main research goal reads: “What would a dynamic difficulty / gameplay algorithm look like in a PCG Game?”. As stated before in paragraph 3, this could be answered with ‘DynaGen’, but that does not explain why. The real answer is that a dynamic difficulty or gameplay algorithm is a looping system that records the player’s data, calculates a performance trajectory, implements changes according to the performance trajectory, and checks the results for future referencing.

There are several differences between my results and my conclusions. First, I did not include stage 4 of my conclusion (the feedback / future referencing) in my PCG Dungeon game.
Second, I did not include a trajectory in stage 2, just a performance calculation. The algorithm was therefore only able to impact gameplay on an immediate basis, instead of long-term as desired.
Third and finally, stage 3 is not what it should be. The algorithm must be able to change multiple parameters at the same time, accordingly, instead of completely at random as my algorithm does.
Why did none of these things exist in my game? The time-old tale of deadlines.

As for the secondary goals (or side goals):
1. "Does a dynamic algorithm in a PCG game balance itself out over time, regardless of the person playing?" The answer is: probably. Through visual inspection I can confirm that the changes the algorithm made became smaller and smaller as the levels progressed (much like a root function), but I did not collect the necessary data to confirm this factually. (Paragraph 5 goes into more detail on this.)

2. "Would a dynamic generation algorithm work in other game genres that do not use PCG?" As my research did not include any games besides my own custom PCG Dungeon, I cannot answer this completely accurately, and truth be told, this is a very subjective question. The short answer is yes: theoretically you could implement this in any game, as long as there is player data to be collected and there are gameplay parameters to be altered. Practically, it depends. My PCG Dungeon was created in about 2 weeks, while the algorithm took up more than 3 in total, including fine-tuning. I also created the game specifically with this in mind, so the amount of effort needed can scale drastically depending on what type of game you're implementing DynaGen in.
Finally, players might not want a balanced experience at all. In the real world, barely anything is dynamically catered to each person, and we are very capable of living with that. For reference, look at multiplayer games like battle royales. There might be a matchmaking system to keep things semi-fair, but a more skilled player will always have the upper hand over an amateur player where luck isn't involved, and this in and of itself is a feedback system that we use in daily life to learn.

5. Future & Op-Ed

Two things. Immediately after writing up the results, I wished I had spent 10 more minutes figuring out what results I needed from the user tests, because I am now aware of how valuable it would have been to compare the results of Stage 2 of DynaGen between setups. I would have been able to make more graphs, for a start, and could have explained some of the answers using numbers (probably; this last part is not a guarantee), especially considering Side Goal 1 makes express reference to this number. Consider also that relying not only on subjective answers but on factual data makes an accurate conclusion much more feasible, and in hindsight, that is only logical.

The second 'thing' is that I've only been able to test DynaGen in a single game that I made myself, on a limited audience, using limited technology, in a limited span of time. The crux of my research, as with all research, is that I couldn't go as big as I wanted to. If I had the capability to turn DynaGen into a piece of middleware that I could stick into any existing game, I would, as Craig Tucker said, "…be so happy".

Though I am at peace with my results, I am far from done experimenting with the learning of the language that is video games. Not to get too romantic with my choice of words, but video games are works of art (as my art-grad girlfriend confirmed to me) that, like all forms of art, can be experienced in a whole host of different ways. Most, if not all, artists want their art to be experienced by as many people as possible, from as many walks of life as possible. Developers pride themselves on having grandmothers play their open-world action RPG, or on having their game used in classrooms by 7-year-olds to teach chemistry.

Though this project started with a YouTuber showing their spouse a few great games, it has resulted in a desire to change the way I (and we) think about dynamic games, and how dynamic they can get. That result, to me, is more valuable than DynaGen.

Maybe the real treasure was the DynaGen we made along the way.

Alexander Franke, 2021

6. Sources

https://www.gamedeveloper.com/design/four-tricks-to-improve-game-balance

https://www.youtube.com/playlist?list=PLcRSafycjWFenI87z7uZHFv6cUG2Tzu9v

https://en.wikipedia.org/wiki/Blinded_experiment

https://business.edx.org/blog/why-corporate-learning-strategies-need-to-include-active-learning
