I am building this environment primarily for my Reinforcement Learning research. The purpose of this Python module is to enable the creation and simulation of Markov Decision Processes (MDPs).
The environment is accessible through the OpenAI Gym wrapper. An example of its use is as follows:
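import gym
import mdp_environment  # importing the module registers "mdp-v0" with gym

env = gym.make("mdp-v0")
env.reset()
for _ in range(1000):
    # Take a random action; only the `done` flag is used here.
    _, _, done, _ = env.step(env.action_space.sample())
    if done:
        # Start a new episode once the current one terminates.
        env.reset()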
There are two custom MDP environments with the following details.
The MDP chain looks like this. For both MDPs, the parameters N and p are adjustable.
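To give a feel for how two such parameters could shape a chain, here is a minimal standalone sketch. It is not this module's implementation: the `ChainMDP` class, the reading of N as the number of states, the reading of p as the probability that an action moves in the intended direction, and the reward of 1.0 at the last state are all assumptions made for illustration.

```python
import random


class ChainMDP:
    """Hypothetical N-state chain; NOT the module's actual implementation."""

    def __init__(self, n_states=5, p=0.9, seed=None):
        if n_states < 2:
            raise ValueError("n_states must be at least 2")
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must lie in [0, 1]")
        self.n_states = n_states      # assumed meaning of N
        self.p = p                    # assumed meaning of p
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        # Every episode starts at the left end of the chain.
        self.state = 0
        return self.state

    def step(self, action):
        # Assumed action encoding: 1 = intend right, 0 = intend left.
        move_right = (action == 1)
        # With probability 1 - p the move goes the opposite way.
        if self.rng.random() >= self.p:
            move_right = not move_right
        if move_right:
            self.state = min(self.state + 1, self.n_states - 1)
        else:
            self.state = max(self.state - 1, 0)
        # The episode ends at the right end, with an assumed reward of 1.0.
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done, {}


# Usage: with p = 1.0 every move succeeds, so always going right
# walks straight from state 0 to the last state.
env = ChainMDP(n_states=4, p=1.0)
state = env.reset()
done = False
steps = 0
while not done:
    state, reward, done, _ = env.step(1)
    steps += 1
```

Lowering p in this sketch makes the walk noisier, and raising the number of states lengthens the chain, so the two knobs control how hard the goal state is to reach.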