How do I implement a custom environment in keras-rl / OpenAI GYM?

I am a complete newbie to reinforcement learning and have been looking for a framework/module to navigate this treacherous terrain with some ease. In my search I came across two modules: keras-rl and OpenAI Gym.

I can get both of them working with the examples on their wikis, but they come with predefined environments and have little to no information on how to set up my own environment.

I would be very grateful if someone could point me to a tutorial, or just explain to me, how I can set up a non-game environment.

1 answer


I've been working with these libraries for some time and can share some of my experience.

As a first example of a custom environment, let's take a text environment: https://github.com/openai/gym/blob/master/gym/envs/toy_text/hotter_colder.py

For a custom environment, you need to define a few things (a sketch follows after this list):

  • A constructor (the __init__ method)
  • An action space
  • An observation space (see https://github.com/openai/gym/tree/master/gym/spaces for all available spaces in gym; a space is essentially a data structure describing the valid actions or observations)
  • A _seed method (not sure whether it's required)
  • A _step method that takes an action as a parameter and returns the observation (the state after the action), the reward (for the transition to the new state), done (a boolean flag), and optionally some additional info
  • A _reset method that implements the logic for starting an episode over
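
Putting that together, a minimal sketch of such an environment could look like the code below. The class name, state variables, and reward logic are invented purely for illustration; the underscore-prefixed method names follow the list above, which matches older gym versions (newer releases use the same methods without the leading underscore).

    import gym
    from gym import spaces
    from gym.utils import seeding


    class GuessNumberEnv(gym.Env):
        """Toy example: the agent tries to guess a hidden integer in a few tries."""

        def __init__(self):
            # Actions: guess one of 10 numbers.
            # Observations: 0 = too low, 1 = too high, 2 = correct, 3 = no feedback yet.
            self.action_space = spaces.Discrete(10)
            self.observation_space = spaces.Discrete(4)
            self._seed()
            self._reset()

        def _seed(self, seed=None):
            self.np_random, seed = seeding.np_random(seed)
            return [seed]

        def _reset(self):
            self.target = self.np_random.randint(0, 10)
            self.steps_left = 5
            self.state = 3            # no feedback yet
            self.action_taken = None
            return self.state

        def _step(self, action):
            assert self.action_space.contains(action)
            self.action_taken = action
            self.steps_left -= 1
            if action == self.target:
                self.state, reward, done = 2, 1.0, True
            elif action < self.target:
                self.state, reward, done = 0, -0.1, self.steps_left == 0
            else:
                self.state, reward, done = 1, -0.1, self.steps_left == 0
            return self.state, reward, done, {}

An agent (for example one of the keras-rl agents) can then interact with an instance of this class through the usual env.reset() / env.step(action) loop, since it exposes the standard gym interface.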


Optionally, you can create a _render method with something like:

    import sys
    from io import StringIO

    def _render(self, mode='human', **kwargs):
        # Write to an in-memory buffer in 'ansi' mode, otherwise to stdout
        outfile = StringIO() if mode == 'ansi' else sys.stdout
        outfile.write('State: ' + repr(self.state) + ' Action: ' + repr(self.action_taken) + '\n')
        return outfile

      

In addition, for better code flexibility, you can define your reward logic in a _get_reward method, and the changes to the observation/state caused by an action in a _take_action method.
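
For instance, _step could then become a thin wrapper around those two helpers. The helper bodies below are only illustrative (the names _take_action and _get_reward are conventions suggested here, not part of the gym API):

    def _step(self, action):
        self._take_action(action)            # update self.state based on the action
        reward = self._get_reward()          # compute the reward for the new state
        done = self.steps_left == 0          # termination condition for the episode
        return self.state, reward, done, {}

    def _take_action(self, action):
        # Example: the action directly becomes the new state
        self.state = action
        self.steps_left -= 1

    def _get_reward(self):
        # Example: reward reaching a target state, small penalty otherwise
        return 1.0 if self.state == self.target else -0.1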
