MarioBros, using Machine Learning

Written by Luis Cruz on Thursday, 18 August 2011. Posted in Academic Portfolio

Game AI | CS4731

Description

This project was built on top of the existing IEEE Super Mario Bros infrastructure, which can be found at http://www.marioai.org/ .

Objective

The objective was to implement the Learning Track for Mario Bros, that is, to apply a Machine Learning technique to Mario Bros. For this project I chose Decision Tree Learning (ID3). A Decision Tree is a series of decisions that produces an action from a set of observations. At each node in the Tree, some aspect of the game world is examined, and a different branch is taken depending on its value. In the ID3 algorithm, the Decision Tree is generated automatically from a set of Training Data, also called Examples. The Examples form a table in which each column is an attribute, each row is one Example, and the last column holds the decision or action. The objectives of this project were:

  1. Understanding the existing Mario Bros base code and integrating with it.
  2. Implementation of the Data Structure for the Decision Tree.
  3. Implementation of the ID3 algorithm.
  4. Implementation of the ID3 Agent.
  5. Implementation of a Recording Agent (for gathering Training Data).
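
To make the description of Examples and the Decision Tree concrete, here is a minimal sketch in Python. The attribute names come from this article, but the rows, the action labels ("Forward", "ForwardJump"), the 1 = Yes encoding and the hand-written tree are assumptions made up for illustration; they are not the project's actual data or code.

```python
# Attribute names are from the article; rows and action labels are invented.
ATTRIBUTES = ["OnGround", "EnemyFront", "BlockFront"]

# Each row is one Example: attribute values (assumed 1 = Yes, 0 = No),
# followed by the action in the last column.
EXAMPLES = [
    # OnGround, EnemyFront, BlockFront, Action
    (1, 0, 0, "Forward"),
    (1, 1, 0, "ForwardJump"),
    (1, 0, 1, "ForwardJump"),
    (0, 0, 0, "Forward"),
]

# A hand-written tree over the same attributes. An internal node is
# (attribute, {value: subtree}); a leaf is just the action string.
TREE = ("EnemyFront", {
    1: "ForwardJump",
    0: ("BlockFront", {1: "ForwardJump", 0: "Forward"}),
})


def row_to_observation(row):
    """Pair attribute names with a row's values, dropping the action column."""
    return dict(zip(ATTRIBUTES, row[:-1]))


def decide(tree, observation):
    """Follow one branch per node until a leaf (an action string) is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[observation[attribute]]
    return tree
```

Calling decide(TREE, row_to_observation(row)) for each row in EXAMPLES returns the action stored in that row's last column, which is what a tree that fits its Training Data should do.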

ID3 – Algorithm

The ID3 implementation was completed successfully in this project. It comprises the Data Structure for the Decision Tree, a Data Parser that reads the Training Data, and the core ID3 algorithm, which builds the Tree from the Training Data. This module was tested with independent data, and it built the correct Trees.
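
For readers unfamiliar with ID3, the following is a minimal Python sketch of the textbook algorithm: at each node it picks the attribute with the highest information gain (the reduction in entropy of the action column) and recurses on each subset. It is not the project's code; the function names, the dictionary-based example format and the sample rows are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(examples):
    """Shannon entropy (in bits) of the 'action' values in the examples."""
    counts = Counter(e["action"] for e in examples)
    total = len(examples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(examples, attribute):
    """Entropy reduction obtained by splitting the examples on one attribute."""
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(examples) - remainder


def id3(examples, attributes):
    """Return a leaf (action string) or a node (attribute, {value: subtree})."""
    actions = Counter(e["action"] for e in examples)
    # Stop when every example agrees, or when no attribute is left to split
    # on; in the second case fall back to the majority action.
    if len(actions) == 1 or not attributes:
        return actions.most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    # Branches are created only for values that occur in the examples.
    for value in sorted({e[best] for e in examples}):
        subset = [e for e in examples if e[best] == value]
        branches[value] = id3(subset, remaining)
    return (best, branches)


def classify(tree, observation):
    """Walk the tree for one observation and return the action at the leaf."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[observation[attribute]]
    return tree


# Invented sample rows (assumed 1 = Yes, 0 = No), not the project's data.
EXAMPLES = [
    {"OnGround": 1, "EnemyFront": 0, "BlockFront": 0, "action": "Forward"},
    {"OnGround": 1, "EnemyFront": 1, "BlockFront": 0, "action": "ForwardJump"},
    {"OnGround": 1, "EnemyFront": 0, "BlockFront": 1, "action": "ForwardJump"},
    {"OnGround": 0, "EnemyFront": 0, "BlockFront": 0, "action": "Forward"},
]
TREE = id3(EXAMPLES, ["OnGround", "EnemyFront", "BlockFront"])
```

Because these sample rows contain no contradictory examples, the resulting tree reproduces the recorded action for every row. Note that a branch exists only for attribute values that actually appear in the data, so an unseen game state has no path to an action.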

Training Data

Getting the Training Data (Examples) was the most challenging part of this project. There were three main challenges: where to get the data from, how to get good data, and how to encode the data. To obtain a robust ID3 Tree, I had to tackle all three areas, which unfortunately I did not manage to do successfully.

Training Data (Challenges)

I implemented an Agent that records examples from human game play. A few Examples were recorded every second while somebody played MarioBros. Recording this Training Data had its own set of challenges:

  • Attributes and data encoding: For this project I decided to encode more Attributes with fewer domain values per Attribute. That is, I used a training data set with binary values (Yes, No) for each attribute. At the time of writing this document, I had the following attributes: OnGround, EnemyFront, EnemyFrontTop, EnemyFrontBottom, BlockFront and BlockFrontTop. Each of these attributes holds a Yes or No value (encoded as 0 or 1) according to the state of the game world at a particular moment. Given the small set of Examples, I only took those attributes into consideration.
  • Training Data origin: I implemented a Recorder Agent that records examples at a rate of 3 examples per second. The main issue was that it recorded a lot of noise. For example, the attribute BFR is Yes if there is a Block in front of Mario, but this attribute covers the 4 spaces in front of Mario. In the worst case there will be 3 examples where BFR is Yes and the Action is only Forward, and 1 example where the Action is Forward and Jump, which is the correct one. The noise then outvotes the one good example, and the Decision Tree ends up with Forward as the Action, which is wrong. Unfortunately, I did not have time to develop a more advanced way of gathering better Example sets. One quick solution was to tweak the values of the Training Data manually, but this was tedious.
  • More Attributes = more Training Data: As I added more Attributes to the Training Data, I started noticing strange behavior in the ID3 Tree. I found that some decisions had no action, which most probably means that my Training Data was not robust enough: some branches of the tree had no corresponding examples in the Training Data.
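
The worst-case scenario described for BFR above can be reproduced with a few lines of Python. This is only an illustration: the four records and the action labels are made up to match the 3-versus-1 situation from the text, and the majority vote stands in for how a leaf picks its action when examples with identical attribute values disagree.

```python
from collections import Counter

# Four hypothetical examples recorded while a block was in front of Mario
# (BFR assumed to be 1 for Yes). Only the last one captures the jump.
recorded = [
    {"BFR": 1, "action": "Forward"},
    {"BFR": 1, "action": "Forward"},
    {"BFR": 1, "action": "Forward"},
    {"BFR": 1, "action": "ForwardJump"},
]

# Examples with the same attribute values cannot be separated by any split,
# so the leaf has to settle for the most frequent action.
votes = Counter(example["action"] for example in recorded)
leaf_action = votes.most_common(1)[0][0]
```

Here votes holds 3 votes for "Forward" and 1 for "ForwardJump", so leaf_action is "Forward" even though the jump is the behavior worth learning.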

ID3 Agent

The agent in charge of executing the ID3 Tree has the following tasks:

  • Owns the core ID3 structure and delegates the reading of the Training Data.
  • Delegates the construction of the ID3 Tree at startup.
  • At every game iteration, traverses the ID3 Tree and finds the Node that contains the action for the current game state.
  • Executes the actions given by the ID3 Tree.
  • The agent plays only as well as the decisions given by the ID3 Tree.
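
The tasks listed above could be sketched roughly as follows. This is a hypothetical Python outline, not the project's agent and not the Mario AI framework's API: the class name, the two injected callables, the tuple-based tree format and the default_action fallback are all assumptions made for illustration.

```python
class ID3Agent:
    """Hypothetical agent that delegates loading and tree building."""

    def __init__(self, load_examples, build_tree, default_action="Forward"):
        # Reading the Training Data and constructing the tree are delegated
        # to the two callables and happen once, at startup.
        examples = load_examples()
        self.tree = build_tree(examples)
        # Assumed fallback for game states the tree has no branch for.
        self.default_action = default_action

    def get_action(self, observation):
        """Traverse the tree for the current game state and return an action."""
        node = self.tree
        while not isinstance(node, str):
            attribute, branches = node
            node = branches.get(observation[attribute], self.default_action)
        return node


# Usage with stand-in callables: a fixed one-level tree instead of real ID3.
fixed_tree = ("EnemyFront", {1: "ForwardJump", 0: "Forward"})
agent = ID3Agent(load_examples=lambda: [], build_tree=lambda examples: fixed_tree)
action = agent.get_action({"EnemyFront": 1})
```

In a real game loop, get_action would be called once per game iteration with the attributes extracted from the current game state. The fallback action is one possible way to handle branches that the Training Data never covered.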

Major Challenges

  • Understanding the MarioBros base code
  • Acquisition of good training data: As stated before, this data was acquired from my own game play, with the game state (Examples) recorded a few times per second. The major issues were noise in the data and my not being a perfect player. I had to tweak the data manually a few times, but this was a difficult task.
  • Noise data outvotes the good data: There was more noise than high-quality examples, which resulted in wrong actions from the ID3 Tree. Unfortunately, this issue was not easy to fix; it required more detailed filtering of the data.
  • Acceptable ratio of Attributes to number of examples: As more attributes were added, the need for more example data became more obvious. This could have been improved with a much larger set of examples.
  • Encoding of data examples and choosing the Attributes: The encoding and the selection of attributes are a complete problem and an "art" of their own. This is the most critical part, since the agent is only as good as the quality of the data and the description of the world it learns from.
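
The relationship between the number of Attributes and the amount of data needed can be made concrete with a little arithmetic: n binary attributes describe 2^n distinct game states, so the six attributes listed earlier give 64 states, and every additional attribute doubles that number. The Python sketch below counts which states a set of examples never covers. The helper name and the dictionary format of the examples are assumptions for illustration.

```python
from itertools import product

# The six binary attributes named in the article.
ATTRIBUTES = ["OnGround", "EnemyFront", "EnemyFrontTop",
              "EnemyFrontBottom", "BlockFront", "BlockFrontTop"]

# Every attribute has two possible values, so the state count is 2 ** 6.
total_states = 2 ** len(ATTRIBUTES)


def unseen_states(examples):
    """Return the attribute combinations that never occur in the examples."""
    seen = {tuple(example[name] for name in ATTRIBUTES) for example in examples}
    every = set(product((0, 1), repeat=len(ATTRIBUTES)))
    return every - seen


# One hypothetical example covers exactly one of the possible states.
one_example = [dict.fromkeys(ATTRIBUTES, 0)]
missing = unseen_states(one_example)
```

With a single example, 63 of the 64 states remain uncovered. Not every combination can occur in the game, but the count gives a sense of how quickly the required number of examples grows.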

Conclusion

My main objective was to create a learning agent that traverses the game level efficiently, that is, with the fewest collisions with enemies and obstacles, by using a Decision Tree generated from Training Data logged from my own game play. Although the implementation of the ID3 structure was successful, the Training Data had many flaws: noise, few examples, and not enough attributes. As a result, the Agent performed very poorly.

I chose ID3 because at the time it seemed to be the most straightforward and logical choice for a game that requires a lot of decisions; moreover, my inexperience in Machine Learning led me to choose this technique over others such as Neural Networks. Unfortunately, the challenges and issues emerged progressively while I was making small progress on the project, and given the time constraints, there was no going back. ID3 for this type of game requires a lot of training data with many attributes, and that has been the greatest challenge and obstacle in this project.

This agent could have been improved by using a much larger set of high-quality Training Data, perhaps coming from an almost perfect Mario Agent, along with an increased set of Attributes for the Examples.

About the Author

Luis Cruz

I am pursuing my MS in Computer Science at Georgia Institute of Technology (GaTech) with an emphasis in Computer Graphics. I received my BS in Computer Science at GaTech in 2009 with a specialization in Software Engineering and Computer Graphics.
