Inconsistency in action values #44
Comments
Blocking #39
At the moment, self.postprocessor.convert_to_actions() returns only 0 or 1 values, but in the future it may return float actions valued between 0 and 1. Worlds should be able to handle these, even if only by rounding them first. The documentation in becca may not all be consistent with this yet. I'm not familiar with what Gym worlds expect. Is it consistent across worlds? I expect there will need to be some connecting code to get them to talk smoothly. Translating actions into the expected format will probably be part of that.
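As a minimal sketch of the "round them first" idea, a world could threshold incoming float actions before using them. The helper name round_actions and the 0.5 threshold are assumptions for illustration, not part of becca.

```python
def round_actions(actions, threshold=0.5):
    """Map float actions in [0, 1] to binary 0/1 values.

    Hypothetical helper, not part of becca. The threshold of 0.5 is an
    assumption; a world could pick a different cutoff.
    """
    binary = []
    for value in actions:
        if not 0.0 <= value <= 1.0:
            raise ValueError("action values are expected to lie in [0, 1]")
        # Values at or above the threshold count as "on".
        binary.append(1 if value >= threshold else 0)
    return binary


print(round_actions([0.0, 0.2, 0.5, 0.9, 1.0]))  # [0, 0, 1, 1, 1]
```

Binary inputs pass through unchanged, so a world using this would behave the same with the current 0 or 1 output.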
Can you please run this gist: https://gist.github.com/markroxor/c50a6bfc69da001180374a9e977ac21a (install gym first). Gym's environment expects the index of an action at each step, and it can only perform one action at a time. Yes, we would need some connecting code. I think we need the API doc to proceed with the integration, since the code docstrings cannot be relied upon.
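One possible shape for that connecting code, assuming the brain hands back one value per available action: pick the index of the largest value and pass that single index to the environment. The name actions_to_index and the tie-breaking rule are assumptions, not anything taken from becca or the gist.

```python
def actions_to_index(actions):
    """Collapse an array of action values into a single action index.

    Hypothetical helper. Works for binary or float values; ties, including
    an all-zero array, resolve to the lowest index because the environment
    still needs exactly one action per step.
    """
    if len(actions) == 0:
        raise ValueError("at least one action value is required")
    best = 0
    for i, value in enumerate(actions):
        # Strict comparison keeps the earliest index on ties.
        if value > actions[best]:
            best = i
    return best


print(actions_to_index([0, 1]))         # 1
print(actions_to_index([0.2, 0.7, 0.1]))  # 1
```

The returned index is what would be handed to the environment's step call. Whether an all-zero array should instead mean "repeat the last action" or "no-op" is a design choice this sketch does not settle.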
Nice work putting this connecting code together. I agree. I ran the gist, saw the same result, and reached the same conclusions. I'm picturing some lines in init that, given a Gym world name, use introspection to figure out the nature of the actions and the observations (Box and Discrete) and convert them to and from sensors and actions for becca. An n-valued Discrete Gym would correspond to n sensors or actions. So would an n-dimensional Box.
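A rough sketch of that introspection step follows. To keep it runnable without gym installed, Discrete and Box here are small stand-in classes that only mimic the n and shape attributes of the real gym.spaces classes; count_channels is a hypothetical name, and treating a multi-dimensional Box as the product of its shape is an assumption.

```python
class Discrete:
    """Stand-in for gym.spaces.Discrete: n mutually exclusive values."""

    def __init__(self, n):
        self.n = n


class Box:
    """Stand-in for gym.spaces.Box: bounded real values with a shape."""

    def __init__(self, low, high, shape):
        self.low = low
        self.high = high
        self.shape = shape


def count_channels(space):
    """Return how many becca sensors or actions a space would map to."""
    # Check n first: an n-valued Discrete space maps to n channels.
    if hasattr(space, "n"):
        return int(space.n)
    # A Box maps to one channel per element (assumed: product of shape).
    if hasattr(space, "shape"):
        count = 1
        for dim in space.shape:
            count *= dim
        return count
    raise TypeError("unsupported space type: %r" % type(space))


# Sizes modeled on a CartPole-like world: 2 actions, 4 observations.
print(count_channels(Discrete(2)))           # 2
print(count_channels(Box(-1.0, 1.0, (4,))))  # 4
```

With real Gym spaces the same function would be called on env.action_space and env.observation_space. Scaling Box values into becca's 0 to 1 range would still be needed and is left out here.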
It is expected that world.step receives an array of binary values as returned by brain.sense_act_learn, which in turn comes from self.postprocessor.convert_to_actions, but its documentation says that it returns "A set of actions for the world, each between 0 and 1." Returning an action array of floats is inconsistent with the demands of OpenAI's gym. Did I miss anything, @brohrer?