Summary | Students are given a Markov Decision Process (MDP) whose details, such as its transition probabilities and rewards, are unknown. Their task is to approximate the structure of the MDP and then solve it with any method they choose to obtain an optimal policy. They must also learn a policy directly with Q-Learning, without explicitly constructing the underlying MDP. The assignment is meant to give students a non-trivial problem and challenge them to solve it in several different ways. |
---|---|
Topics | MDPs, Value Iteration, Q-Learning |
Audience | Undergraduate or graduate students in introductory Artificial Intelligence courses or in Reinforcement Learning courses. |
Difficulty | Moderately challenging for undergraduates with a background in Python programming. Students are expected to complete the assignment within two weeks. |
Strengths | The assignment gives hands-on experience with the basics of MDPs, Reinforcement Learning, and Q-Learning, while also letting students solve the problem with algorithms of their own design. It is structured to encourage creativity and exploration. It is implemented entirely in Python, and a version that integrates with the Gymnasium framework is also available. |
Weaknesses | The main drawback is that the assignment cannot be easily auto-graded. Additionally, it assumes a solid foundation in Python and related libraries. |
Dependencies | A strong background in Python programming is recommended. The assignment runs on Python 3 and on most single-board computers, so it is broadly accessible. The Gymnasium version supports Python 3.9 and later. |
Variants | The assignment is adaptable: the general version covers every aspect of the problem, and several modifications narrow the focus to specific tasks. For example: |
Assignment handout template
Source code for the environment
File to be edited by students
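The Summary describes two routes to a policy. The first approximates the unknown MDP from experience and solves it, for example with value iteration as listed under Topics. The second learns a policy directly with Q-Learning. The Python sketch below contrasts both routes on a toy problem. It is not the assignment's environment or starter code. The five-state chain, the `step` function, and every hyperparameter (`gamma`, `alpha`, `epsilon`, sample and episode counts) are hypothetical choices made for illustration. It also uses only the standard library rather than Gymnasium.

```python
import random
from collections import defaultdict

# HYPOTHETICAL stand-in for the assignment's hidden MDP: a 5-state chain.
# Action 0 moves left, action 1 moves right; the move succeeds with
# probability 0.8, otherwise the agent stays put. Entering state 4 pays
# reward 1, and state 4 is absorbing with no further reward.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4


def step(state, action, rng):
    """Sample one transition; the learner only sees (next_state, reward)."""
    if state == GOAL:
        return state, 0.0
    if rng.random() < 0.8:
        nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    else:
        nxt = state
    return nxt, 1.0 if nxt == GOAL else 0.0


def estimate_model(n_samples, rng):
    """Approximate the MDP: empirical P(s'|s,a) and mean reward per (s,a,s')."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sums = defaultdict(float)
    for _ in range(n_samples):
        s, a = rng.randrange(N_STATES), rng.randrange(N_ACTIONS)
        s2, r = step(s, a, rng)
        counts[(s, a)][s2] += 1
        reward_sums[(s, a, s2)] += r
    model = {}
    for (s, a), nexts in counts.items():
        total = sum(nexts.values())
        model[(s, a)] = [(n / total, s2, reward_sums[(s, a, s2)] / n)
                         for s2, n in nexts.items()]
    return model


def q_value(model, V, s, a, gamma):
    # Expected one-step return under the estimated model; unseen pairs give 0.
    return sum(p * (r + gamma * V[s2]) for p, s2, r in model.get((s, a), []))


def value_iteration(model, gamma=0.9, tol=1e-8):
    """Solve the estimated MDP with in-place Bellman optimality backups."""
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            best = max(q_value(model, V, s, a, gamma) for a in range(N_ACTIONS))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = [max(range(N_ACTIONS), key=lambda a: q_value(model, V, s, a, gamma))
              for s in range(N_STATES)]
    return V, policy


def q_learning(rng, episodes=3000, alpha=0.1, gamma=0.9, epsilon=0.2, max_steps=50):
    """Model-free: learn Q(s, a) from sampled transitions without building a model."""
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(GOAL)  # start in a random non-goal state
        for _ in range(max_steps):
            if rng.random() < epsilon:  # epsilon-greedy exploration
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda act: Q[s][act])
            s2, r = step(s, a, rng)
            # The goal is absorbing with zero reward, so do not bootstrap from it.
            target = r if s2 == GOAL else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if s == GOAL:
                break
    return Q


rng = random.Random(0)
V, vi_policy = value_iteration(estimate_model(20_000, rng))
Q = q_learning(rng)
ql_policy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]
print("value iteration:", vi_policy)
print("Q-learning:     ", ql_policy)
```

On this toy chain, both approaches should agree that moving right (action 1) is optimal in states 0 to 3. The action printed for the absorbing goal state is arbitrary. The model-based route spends its samples building `model` once and then plans on it, while Q-learning never stores transition probabilities and instead updates its action values after every sampled step.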