Guess the Behaviour

Summary Students are given a Markov Decision Process (MDP) whose details are unknown. Their task is to approximate the structure of the MDP and solve it with any method of their choice to obtain one of the optimal policies. They must also generate a policy directly with Q-Learning, without explicitly constructing the underlying MDP structure. The assignment gives students a non-trivial problem and challenges them to solve it using several methods.
Topics MDPs, Value Iteration, Q-Learning
Audience Suitable for undergraduate or graduate students enrolled in introductory courses on Artificial Intelligence or courses related to Reinforcement Learning.
Difficulty Moderately challenging for undergraduates with a background in Python programming. Students are expected to complete the assignment within two weeks.
Strengths This assignment provides hands-on experience with the basics of MDPs, Reinforcement Learning, and Q-Learning, while also allowing students the opportunity to solve the problem using their own algorithms. It is structured to promote creativity and exploration. The assignment is implemented entirely in Python, and there is a version that integrates with the Gymnasium framework.
Weaknesses The main drawback is that the assignment cannot be easily auto-graded. Additionally, it assumes a solid foundation in Python and related libraries.
Dependencies A strong background in Python programming is recommended. The assignment runs on Python 3 and on most single-board computers, which keeps it broadly accessible. The Gymnasium version is compatible with Python 3.9 and later.
Variants This assignment is adaptable. The general version covers all the aspects of the problem, but there are several modifications that can focus on specific tasks. For example:
  • Compare performance for different initial values in the Q-table (exploration vs. exploitation).
  • Compare policy gradient methods with Q-Learning.
  • Other possible variations.
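
The first variant above can be illustrated with a minimal sketch of tabular Q-Learning in which the initial Q-table value is a parameter. The environment here is a hypothetical five-state deterministic chain invented for illustration; it is not the assignment's environment, and the names `step`, `q_learning`, and `greedy_policy` are assumptions, as are the hyperparameter values. Starting the table at 0.0 leaves exploration to the epsilon-greedy rule, while starting it at 1.0 (an optimistic value, above any true return in this chain) pushes the agent to try untested actions early.

```python
import random

# Hypothetical toy environment (NOT the assignment's MDP):
# a chain of states 0..4 where reaching state 4 ends the episode with reward 1.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(q_init=0.0, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-Learning with every table entry initialised to q_init."""
    rng = random.Random(seed)
    q = [[q_init for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Terminal transitions do not bootstrap from the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Greedy action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    for q_init in (0.0, 1.0):
        table = q_learning(q_init=q_init)
        print(f"q_init={q_init}: greedy policy = {greedy_policy(table)}")
```

To turn this into a comparison, one could record the number of steps per episode for each `q_init` and plot the two learning curves; the same loop should carry over to the assignment's environment once its step function replaces the hypothetical one.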

Assignment handout template

Source code for the environment

File to be edited by students