Inconsistency in action values #44

markroxor · 2018-09-24T11:16:08Z

world.step is expected to receive an array of binary values, as returned by brain.sense_act_learn, which in turn returns the output of self.postprocessor.convert_to_actions. But its documentation says that it returns "a set of actions for the world, each between 0 and 1." Returning an array of float actions is inconsistent with what OpenAI's gym requires.
Did I miss anything @brohrer ?

markroxor · 2018-09-24T12:35:51Z

Blocking #39

brohrer · 2018-09-25T12:54:49Z

At the moment, self.postprocessor.convert_to_actions() returns only 0 or 1 values, but in the future it may return float actions valued between 0 and 1. Worlds should be able to handle these, even if only by rounding them first. The documentation in becca may not all be consistent with this yet.
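As a rough illustration of the "round them first" idea, here is a minimal sketch of what a world could do with incoming actions. The helper name `round_actions` and the 0.5 threshold are assumptions made for this example; they are not part of becca.

```python
def round_actions(actions, threshold=0.5):
    """Map each action value in [0, 1] to a binary 0 or 1.

    Hypothetical helper, not part of becca. The threshold of 0.5
    is an assumption; a world could pick a different cutoff.
    """
    return [1 if a >= threshold else 0 for a in actions]


# A world could call this at the top of its step() method so that
# binary and float actions are handled the same way.
binary_actions = round_actions([0.0, 0.49, 0.5, 1.0])
```

With this in place, a world written for binary actions would keep working if float actions are introduced later.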

I'm not familiar with what Gym worlds expect. Is it very consistent across worlds? I expect there will need to be some connecting code to get them to talk smoothly. Translating actions into the expected format will probably be part of that.

markroxor · 2018-09-25T16:29:51Z

Can you please run this gist: https://gist.github.com/markroxor/c50a6bfc69da001180374a9e977ac21a (install gym first with pip install gym)? The actions parameter that is fed to World.step is a float.

gym's environment expects the index of a single action at each step. It can only perform one action at a time. Yes, we would need some connecting code.
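One possible shape for that connecting code, sketched under assumptions: take the array of action values and pick the index of the largest one, then pass that index to the Gym environment's step. The name `actions_to_index` is hypothetical, and choosing the largest value (first one on ties) is just one policy; it is not something becca or gym prescribes.

```python
def actions_to_index(actions):
    """Collapse an array of action values into a single action index.

    Hypothetical helper. Picks the position of the largest value;
    when several values tie, the first one wins. An all-zero array
    therefore maps to index 0, which may or may not be a sensible
    default for a given environment.
    """
    if len(actions) == 0:
        raise ValueError("actions must not be empty")
    return max(range(len(actions)), key=lambda i: actions[i])


# The resulting index is what would be handed to the environment's step().
action_index = actions_to_index([0, 1, 0])
```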

I think we need the API docs to proceed with the integration, since the code docstrings cannot be relied upon.

brohrer · 2018-09-26T00:59:30Z

Nice work putting this connecting code together. I agree. I ran the gist, saw the same result, and reached the same conclusions. I'm picturing some lines in init that, given a Gym world name, use introspection to figure out the nature of the actions and the observations (Box and Discrete) and convert them to and from sensors and actions for becca.

An n-valued Discrete Gym would correspond to n sensors or actions. So would an n-dimensional Box.
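To make that correspondence concrete, here is a minimal sketch. It assumes sensors should lie between 0 and 1, that a Discrete value becomes a one-hot array, and that each Box dimension is rescaled using its finite bounds. The function names are hypothetical, and the code takes plain numbers and lists rather than Gym space objects; in a real adapter, `n` would come from the Discrete space's `n` attribute and the bounds from the Box space's `low` and `high`.

```python
def discrete_to_sensors(value, n):
    """One-hot encode a value from an n-valued Discrete space into n sensors."""
    sensors = [0.0] * n
    sensors[value] = 1.0
    return sensors


def box_to_sensors(values, low, high):
    """Rescale each dimension of an n-dimensional Box into [0, 1].

    Assumes every bound is finite and high > low in each dimension.
    Unbounded Box spaces would need a different mapping.
    """
    return [(v - lo) / (hi - lo) for v, lo, hi in zip(values, low, high)]


# A 4-valued Discrete observation becomes 4 sensors.
discrete_sensors = discrete_to_sensors(2, 4)

# A 2-dimensional Box observation becomes 2 sensors.
box_sensors = box_to_sensors([0.0, 5.0], [-1.0, 0.0], [1.0, 10.0])
```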
