I’ve been looking into machine learning recently, partly out of my own interest and partly because I suspect it will see a lot of use in my industry in the coming years. A couple of decent resources have helped me get started. Alison offers free courses on data science and machine learning that I am working my way through – I’ve found these good as an introduction to the general theory of machine learning, though the couple I’ve gone through haven’t had much in the way of practical examples. On the practical side, however, this page – Your First Machine Learning Project in Python Step-By-Step – is a great tutorial on running your first machine learning algorithm in Python. It’s very quick (it took me longer to track down the module versions I needed than it did to work through the code) and gives a nice introduction to some basic tools. Having worked through that tutorial, I wanted to try it with some different data in order to play around with it. However, real-world data in an appropriate format for machine learning is not something you just find sitting around, so I invented a problem that didn’t really exist – rolling an acceptable set of D&D ability scores.
A quick primer for non-D&D players:
When creating a character in D&D you roll six ability scores between 3 and 18. In early editions, this was done by rolling three six-sided dice and adding them together, then repeating that five more times. Generally, the results did not have a huge effect on the game in those editions, so it didn’t matter if you didn’t roll particularly well. In later editions, having a couple of high scores became a lot more important and most groups switched to rolling four dice and dropping the lowest, to generate slightly better scores. There is also a general understanding that a legitimately bad set of scores can be re-rolled, though where people draw the line on a “bad set” will vary a lot.
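For concreteness, the two rolling methods can be sketched in a few lines of Python (the function names are mine, not from any rulebook):

```python
import random

def roll_3d6():
    """Early-edition style: sum of three six-sided dice (3..18)."""
    return sum(random.randint(1, 6) for _ in range(3))

def roll_4d6_drop_lowest():
    """Later style: roll four dice, drop the lowest, sum the remaining three."""
    dice = sorted(random.randint(1, 6) for _ in range(4))
    return sum(dice[1:])  # dice[0] is the lowest roll
```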
So… could I use a machine learning algorithm to teach my computer what a bad/good set of D&D attributes looks like, so that it can then roll random sets of scores, until it gets one that is acceptable? Short answer, yes; long answer… keep reading.
First, I needed a training set, so I generated 1000 random sets of ability scores (using Python, but you could probably do it in Excel easily enough), ordering each one from highest to lowest (both to help the machine learning and to help me evaluate them). I weeded out the obviously useless sets with a bit of filtering (sets with nothing higher than a 13, or with multiple very low scores), then went through and classified each remaining set as either acceptable or not acceptable. This did take a little time, but not as long as you’d assume – I was very much going on an instant gut feeling for each one, as evidenced by the fact that there were a handful of cases where I classified the same set of ability scores as both acceptable and unacceptable at different points in the list.
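The generation and pre-filtering steps can be sketched like this (the helper names and exact filter thresholds are my reconstruction from the description above, not my original script):

```python
import random

def roll_ability_set():
    """Six scores, each 4d6 drop the lowest, sorted highest to lowest."""
    scores = []
    for _ in range(6):
        dice = sorted(random.randint(1, 6) for _ in range(4))
        scores.append(sum(dice[1:]))  # drop the lowest die
    return sorted(scores, reverse=True)

def obviously_useless(scores):
    """Pre-filter: nothing higher than a 13, or more than one very low score."""
    return scores[0] <= 13 or sum(1 for s in scores if s <= 5) > 1

# 1000 sets, to be hand-classified afterwards
training_sets = [roll_ability_set() for _ in range(1000)]
auto_rejected = [s for s in training_sets if obviously_useless(s)]
```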
With my training set in place, I ran through much of the same code as in the tutorial mentioned above, obviously tweaking for the fact that I have six variables rather than four. Unlike in the tutorial, the SVC algorithm (Support Vector Classification – something I hope to properly understand at some point) showed the best potential accuracy, and when I ran it against the testing set, it got a 93% overall accuracy score.
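The training step looked much like the tutorial’s. Since my hand-labelled file isn’t reproduced here, the sketch below substitutes a simple rule for my gut-feel labels – treat it as a stand-in, not my actual data:

```python
import random
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def roll_ability_set():
    """Six scores, each 4d6 drop the lowest, sorted highest to lowest."""
    return sorted(
        (sum(sorted(random.randint(1, 6) for _ in range(4))[1:]) for _ in range(6)),
        reverse=True)

random.seed(1)
X = [roll_ability_set() for _ in range(1000)]
# Stand-in labels; my real training set was classified by hand.
y = [int(s[0] >= 15 and s[-1] >= 6) for s in X]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

model = SVC(gamma="auto")
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```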
From the point of view of my ultimate goal, the precision on the acceptable results is the most interesting item here. If the algorithm misses some acceptable sets (the recall score), I’ll never see them anyway – the important number is the probability that the set of results it does spit out is an acceptable one. 86% is not as good as 93%, but it is certainly better than a straight-up random set, which has at least a 20% chance of being no good, just based on having a max score of 13 or at least one score of 5 or less. It’s also likely that the sets being miscategorised are the borderline cases that aren’t too terrible, just not inspiring.
Having taught it to recognise acceptable sets of ability scores, the final step is simple enough: a little bit of Python to generate random sets of ability scores, test them using the machine learning algorithm and then print out the first set that passes.
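That generate-and-test loop is only a few lines. A sketch, assuming a fitted scikit-learn-style classifier called `model` (the names and the `max_tries` safety valve are mine):

```python
import random

def roll_ability_set():
    """Six scores, each 4d6 drop the lowest, sorted highest to lowest."""
    return sorted(
        (sum(sorted(random.randint(1, 6) for _ in range(4))[1:]) for _ in range(6)),
        reverse=True)

def first_acceptable(model, max_tries=10000):
    """Roll random sets until the classifier labels one acceptable (1)."""
    for _ in range(max_tries):
        candidate = roll_ability_set()
        if model.predict([candidate])[0] == 1:
            return candidate
    raise RuntimeError("no acceptable set found in %d tries" % max_tries)
```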
I’m not going to bog this post down with syntax… post a comment if you have questions on my specific code implementation.
A couple of final points…
As a test, I produced a stripped-down training set, with only the highest two scores and the lowest. I then classified them with a simple rule: a set was acceptable if both the highest two were at least 16 and the lowest was no less than 8. Interestingly, despite the fact that I could write a simple if/then clause that could categorise the sets perfectly, only one of the various machine learning algorithms mapped the training set with 100% accuracy, while a couple of others got into the very high 90s. The LR (logistic regression) algorithm only hit 88% accuracy, which seems very poor under the circumstances. Clearly some of these algorithms are not designed to identify something so clean-cut, but presumably do very well on dirtier data.
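A sketch of that experiment – the labelling rule is as described above, while the sample size, solver and other parameters are my own choices, so exact scores will differ from my run:

```python
import random
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def roll_ability_set():
    """Six scores, each 4d6 drop the lowest, sorted highest to lowest."""
    return sorted(
        (sum(sorted(random.randint(1, 6) for _ in range(4))[1:]) for _ in range(6)),
        reverse=True)

random.seed(2)
full = [roll_ability_set() for _ in range(2000)]
X = [[s[0], s[1], s[5]] for s in full]  # highest two scores and the lowest
# The exact rule the algorithms are asked to learn:
y = [int(s[0] >= 16 and s[1] >= 16 and s[5] >= 8) for s in full]

for name, model in [("LR", LogisticRegression(solver="liblinear")),
                    ("SVC", SVC(gamma="auto"))]:
    model.fit(X, y)
    print(name, accuracy_score(y, model.predict(X)))
```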
Lastly, despite the theory that the program will generate a set of ability scores that should be acceptable to me, there is nothing stopping me from rejecting the result and just running it again. With this in mind, I did consider adding a random seed, based on the date, so that if I ran it multiple times on the same day, I’d always get the same result. However, at this point I really am creating issues that don’t exist, so I think we’ll finish here.
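For what it’s worth, the date-based seed I decided against would only be a couple of lines:

```python
import random
from datetime import date

# Seed from today's date, so repeat runs on the same day roll identically.
random.seed(date.today().toordinal())
```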