A3C LSTM GA for language grounding #7
base: master
Conversation
…with bounding boxes and visualising them and checking if we are close and focusing on an object. Added sample event metadata json for easy viewing
Added scene id and current object type params. Cleared up some stuff. Fixed bug in total reward.
…f colorBounds. This avoided "the colored mug changing color as you place it and it not being in color_to_object_id matrix" bug. Put everything into main
Added interaction param to env which removes actions which use objects. Removed some code.
…crowave task. Saved files by episode number rather than steps.
TODO add more general dense reward for groups of objects
128 resolution
Fixed 2 bugs (time embedding index failure and action output) and allowed colour channels and higher resolution. Added grayscale option to environment. Renamed a3c_lst_ga model file. Had to make lots of things a double though. Stopped creating environment in main file. Renamed main file
Created environment guid and saved plots into folder identified by that. Added todos. Changed A3C to use 128x128 grayscale
Made asynchronous train the same.
Don't spend your life looking at numbers. Automate your experiments
…w model to deal with it. For 3 mugs: top_left_screen_x, top_left_screen_y, width, height, x, y, z, 3d distance, class. Removed legend in one plot
Append all last actions to list. Added done break condition to wrapper random walk. Made bounding crosshair function one line. Added todos
task == 6 and fixed up crosshair check (centre crosshair has to be in bounding box centre for object). Added success return value to avoid calculating done twice (can still do better refactoring). Show coloured preprocess to 128 resolution (still works in 0-1 range)
Print average of p_losses and v_losses and made two new charts showing these. Print last reward on done. interaction=False. MOVEMENT_REWARD_DISCOUNT up one order.
…object type. Added better test of task to get microwave or cup and 2 embeddings. Unsqueeze and expand in model (will check in future if is right with sentences). Many todos. CUDA device to 0 but no effect of course.
Stopped rendering depth and class to get 5-10 or so FPS faster
Fixed bug with checking to checkpoint at every step. Many todos
Checkpoint every 100k
… timing and benchmarking. print_all and print for rank = 0. Some todos, commented out eids
Adapted from RL-adventure repo. First running version, although results look odd.
There was a bug in the channel order returned from the environment by the step function. Rainbow DQN code also adapted to use the fix
Fixed conflicts and 16 files.
…-experiment-ids.txt, model.py, check_latest_plots). Moved sample event metadata to gym_ai2thor folder
algorithms/a3c/model.py
Outdated
Similar to the calculate_lstm_input_size_for_A3C function except that there are only
3 conv layers and there is variation among the kernel_size, stride, the number of channels
and there is no padding. Therefore these are hardcoded. Check A3C_LSTM_GA class for these numbers.
Also, returns tuple representing (width, height, num_output_filters) instead of size
As in its equivalent for A3C, it returns a tuple bla
A3C's one returns the size not the tuple
height = (height - 4) // 2 + 1
height = (height - 4) // 2 + 1

return width, height, 64
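The hardcoded lines above are instances of the standard conv output-size formula. A small sketch (helper name hypothetical) that makes the numbers checkable:

```python
def conv2d_output_size(size, kernel_size, stride, padding=0):
    # Standard formula: floor((size + 2*padding - kernel_size) / stride) + 1
    return (size + 2 * padding - kernel_size) // stride + 1

# With kernel_size=4, stride=2 and no padding this reproduces the
# hardcoded `(height - 4) // 2 + 1` above:
assert conv2d_output_size(128, 4, 2) == (128 - 4) // 2 + 1
```

Applying it once per conv layer to both width and height yields the (width, height, num_output_filters) tuple the docstring describes.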
def normalized_columns_initializer(weights, std=1.0):
Why do I need column normalization? Add an explanation here
Also elaborate more on this particular type of normalization being used as opposed to other ones
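For discussion, this is the variant of the initializer used in common A3C implementations (e.g. pytorch-a3c), annotated with the usual rationale — a sketch, not necessarily byte-for-byte what this file does:

```python
import torch

def normalized_columns_initializer(weights, std=1.0):
    # Draw Gaussian noise, then rescale each row (one row per output unit of
    # an nn.Linear weight) so its L2 norm is exactly `std`. Every output unit
    # then starts at the same scale regardless of fan-in, which keeps the
    # initial policy logits and value output small and uniform -- the usual
    # reason this is preferred over plain Gaussian init for the heads.
    out = torch.randn(weights.size())
    out *= std / torch.sqrt(out.pow(2).sum(1, keepdim=True))
    return out
```

(The name says "columns" because of the TensorFlow weight-layout convention; with PyTorch's (out_features, in_features) layout the normalized axis is the rows.)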
    return out


def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
I see that you are initializing layers differently, but what are the magic numbers like 6? Can you make a docstring for the function?
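On the magic numbers: in the common A3C reference implementations this function is Glorot/Xavier uniform init, where the 6 comes from the bound sqrt(6 / (fan_in + fan_out)). A docstring'd sketch of that version (an assumption that this file follows the same code):

```python
import numpy as np
import torch.nn as nn

def weights_init(m):
    """Glorot/Xavier uniform init for Conv and Linear layers.

    The bound sqrt(6 / (fan_in + fan_out)) is the Glorot uniform bound:
    uniform noise on [-b, b] has variance b**2 / 3, so this choice keeps
    the variance of activations roughly constant from layer to layer.
    """
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        weight_shape = list(m.weight.data.size())
        fan_in = np.prod(weight_shape[1:4])                      # in_channels * kH * kW
        fan_out = np.prod(weight_shape[2:4]) * weight_shape[0]   # kH * kW * out_channels
        w_bound = np.sqrt(6. / (fan_in + fan_out))
        m.weight.data.uniform_(-w_bound, w_bound)
        m.bias.data.fill_(0)
    elif classname.find('Linear') != -1:
        weight_shape = list(m.weight.data.size())
        fan_in, fan_out = weight_shape[1], weight_shape[0]
        w_bound = np.sqrt(6. / (fan_in + fan_out))
        m.weight.data.uniform_(-w_bound, w_bound)
        m.bias.data.fill_(0)
```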
algorithms/a3c/model.py
Outdated
# Gated-Attention layers
self.attn_linear = nn.Linear(self.gru_hidden_size, 64)

# Time embedding layer, helps in stabilizing value prediction
Could you explain more to me about the time embedding layer and why it stabilizes value?
I don't actually know "exactly" how but let's work it out here:
If an episode is 1000 steps long, the step number from 0 to 999 gets passed in as tx, and the extra self.time_emb_dim dimensions are input to the critic and actor to help use time to your advantage (e.g. rush when you're running out of time), or purely just to know that the minimum reward you can get in 1000 steps with movement reward -0.001 is -1.0. Therefore in sparse reward settings the agent can work out that if there are only 3 steps left and no cup is in sight, its value prediction will be accurate towards the end?
How about something like that?
I don't see exactly how it can help the actor though. Should you act drastically differently with 30 steps left than with 300? Maybe if you have to choose between going to the closer cup or not.
Yusun helped me figure this out. Value prediction isn't given time input so for the same image state, if there is 1 step left in episode or 900, the value head can't learn to predict less value for the 1st case. Added this comment:
# Time embedding layer, helps in stabilizing value prediction e.g. if only 1 step is left
# in episode not much reward can be earned vs the same image state with 500 steps left
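A minimal sketch (all names and sizes hypothetical) of the mechanism being discussed: the step index tx is embedded and concatenated into the features the value head sees, so the same image state with 1 step left and with 500 steps left can get different value predictions:

```python
import torch
import torch.nn as nn

class TimedValueHead(nn.Module):
    def __init__(self, feat_dim=256, max_episode_length=1000, time_emb_dim=32):
        super().__init__()
        # one learned vector per step index tx
        self.time_emb = nn.Embedding(max_episode_length, time_emb_dim)
        self.critic = nn.Linear(feat_dim + time_emb_dim, 1)

    def forward(self, features, tx):
        # concatenating the time embedding lets the critic lower its value
        # estimate when few steps remain, even for an identical image state
        t = self.time_emb(tx)  # (batch, time_emb_dim)
        return self.critic(torch.cat([features, t], dim=1))
```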
algorithms/a3c/model.py
Outdated
x_attention = x_attention.unsqueeze(2).unsqueeze(3)
x_attention = x_attention.expand(1, self.num_output_filters, self.output_height,
                                 self.output_width)
assert x_image_rep.size() == x_attention.size()
No asserts. Check and raise if it doesn't work
Hmmmm, I really like the conciseness of this assert, and I can add a comment? Check-and-raise is lame to have in a forward function: 2 lines, and then you have ugly raise ValueErrors in your beautifully clear forward function.
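For comparison, the check-and-raise version under debate could be hoisted into a one-call helper so the forward function stays clean (helper name hypothetical; works on anything with a .size() method, e.g. torch tensors):

```python
def check_same_size(a, b):
    # Unlike `assert`, this check is not stripped when running `python -O`,
    # and the error message reports both shapes.
    if a.size() != b.size():
        raise ValueError('shape mismatch: {} vs {}'.format(a.size(), b.size()))
```

In forward it then stays one line: check_same_size(x_image_rep, x_attention).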
…her comments/todos in model.py, my_optim.py and train.py. Removed square image talk. In README.md mentioned needing to install ViZDoom for ViZDoom command. 2 lines between functions
…de clearer. Split A3C_LSTM_GA docstring in two and explained language grounding. self.num_output_filters = 64. Explained GA expand notation and depth slices. Removed learning during training from README.md
…isode only in done, put 4 appends before done (why was it otherwise in the first place?) and removed 2nd "if done". Fixed ai2thor_examples.py multithreading example fps printing. Removed printing of 3d euclidean
algorithms/a3c/test.py
Outdated
env.seed(args.seed + rank)

model = ActorCritic(env.observation_space.shape[0], env.action_space.n, args.frame_dim)
if env.task.task_has_language_instructions:
Could we have a convention of tasks with language starting with "Language" or something like that? So we would check if env.task.__name__.startswith("Language"):
That would look better IMO
I do have NaturalLanguage at the start of all of them and NL at the start of the config files. But a boolean is just easier and clearer, no? Booleans are MUCH clearer than startswith stuff which could be wrong. That way requires us to stick to the standard.
and isn't sticking to a standard a good thing? Rather than having to add 1 extra param
It's added automatically in the NaturalLanguageBaseTask. Zero work, super clear, very readable, very DRY, and doesn't require us to forcibly stick to a standard as we create 100+ tasks.
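A sketch of the point being made (class bodies elided, names abbreviated): the flag lives once in the base classes, so every subclass inherits it for free and callers read a boolean rather than parsing class names:

```python
class BaseTask:
    # default: no language instructions
    task_has_language_instructions = False

class NaturalLanguageBaseTask(BaseTask):
    # set once here; every NaturalLanguage* subclass inherits it automatically
    task_has_language_instructions = True

class NaturalLanguagePickUpObjectTask(NaturalLanguageBaseTask):
    pass

assert NaturalLanguagePickUpObjectTask().task_has_language_instructions
assert not BaseTask().task_has_language_instructions
```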
…dded eid info to README, default reward is 1 for 5 objects. Removed printing of num random actions at init
…cified object types) and example to pdb_interactive py file
…m code: No more import star for vizdoom. Style changes, docs and avoid repetition
…tObjectOpened, lastObjectPickedUp, lastObjectPut, lastObjectPutReceptacle. (need to test). Added todos, time embedding explanation
from gym_ai2thor.utils import InvalidTaskParams
from gym_ai2thor.task_utils import get_word_to_idx, check_if_focus_and_close_enough_to_object_type


class TaskFactory:
Let's get rid of this class entirely by calling getattr(gym_ai2thor.tasks, config['task']['task_name'])(**config) from AI2ThorEnv
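A runnable sketch of the getattr suggestion (the tasks module is passed in here to keep the example self-contained; the real call would import gym_ai2thor.tasks):

```python
import types

def create_task(tasks_module, config):
    # getattr replaces the TaskFactory if/elif chain; an unknown task_name
    # fails loudly with an AttributeError naming the missing class
    task_cls = getattr(tasks_module, config['task']['task_name'])
    return task_cls(**config)

# tiny stand-in for gym_ai2thor.tasks to show the lookup working
class PickupTask:
    def __init__(self, **kwargs):
        self.kwargs = kwargs

tasks = types.SimpleNamespace(PickupTask=PickupTask)
task = create_task(tasks, {'task': {'task_name': 'PickupTask'}})
assert isinstance(task, PickupTask)
```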
    return super(NaturalLanguageNavigateToObjectTask, self).reset()


class NaturalLanguagePickUpObjectTask(NaturalLanguageBaseTask):
I would restructure this to have this class inherit from both NaturalLanguageBaseTask and PickupTask. We can discuss this in slack
algorithms/a3c/test.py
Outdated
env.seed(args.seed + rank)

model = ActorCritic(env.observation_space.shape[0], env.action_space.n, args.frame_dim)
if env.task.task_has_language_instructions:
Totally different issue. What happens if the task is not natural language? task.has_language_instructions will raise a ValueError
gym_ai2thor/envs/ai2thor_env.py
Outdated
closest_openable = None
for obj in visible_objects:
    # look for closest closed receptacle to open it
    if obj['openable'] and obj['distance'] < distance and not obj['isopen'] and \
            obj['objectType'] in self.objects['openables']:
            obj['objectType'] in self.allowed_objects['openables']:
        closest_openable = obj
        distance = closest_openable['distance']
    if closest_openable:
This is wrong. If there are two objects to close and the furthest one is first in visible_objects iterable you will close both the furthest and the closest. Moreover, we are executing the close action as many times as remaining objects in visible_objects after the closest_openable. Unindenting this if block will solve the issue. The same applies to "CloseObject" action
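A sketch of the fix being requested: the loop only selects the closest candidate, and acting on it happens exactly once, after (i.e. unindented out of) the loop. Function name and the object dicts are hypothetical stand-ins:

```python
def find_closest_closed_openable(visible_objects, allowed_openables):
    closest_openable = None
    distance = float('inf')
    for obj in visible_objects:
        if (obj['openable'] and obj['distance'] < distance
                and not obj['isopen'] and obj['objectType'] in allowed_openables):
            closest_openable = obj
            distance = obj['distance']
    # the caller acts on closest_openable once, OUTSIDE the loop
    return closest_openable

objs = [
    {'openable': True, 'isopen': False, 'distance': 3.0, 'objectType': 'Fridge'},
    {'openable': True, 'isopen': False, 'distance': 1.0, 'objectType': 'Microwave'},
]
assert find_closest_closed_openable(objs, {'Fridge', 'Microwave'})['objectType'] == 'Microwave'
```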
gym_ai2thor/envs/ai2thor_env.py
Outdated
    interaction_obj = closest_pickupable
    self.event = self.controller.step(
        dict(action=action_str, objectId=interaction_obj['objectId']))
elif action_str == 'OpenObject':
    self.event.metadata['lastObjectPickedUp'] = interaction_obj
elif action_str.startswith('Open'):
    closest_openable = None
    for obj in visible_objects:
        # look for closest closed receptacle to open it
        if obj['openable'] and obj['distance'] < distance and not obj['isopen'] and \
I'd like to make this more understandable. Can we take the whole condition in the if to a variable called is_closest_receptacle and then just do if is_closest_receptacle? Then the comment above this line won't be needed anymore.
Analogously we can do the same for is_closest_pickupable and is_closest_opened_receptacle, if you know what I mean.
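The refactor in code form (a sketch with the suggested variable name; the object dicts are hypothetical):

```python
def find_closest_closed_receptacle(visible_objects, allowed_openables):
    closest_openable = None
    distance = float('inf')
    for obj in visible_objects:
        # the named boolean replaces the explanatory comment above the if
        is_closest_closed_receptacle = (
            obj['openable'] and not obj['isopen']
            and obj['distance'] < distance
            and obj['objectType'] in allowed_openables)
        if is_closest_closed_receptacle:
            closest_openable = obj
            distance = obj['distance']
    return closest_openable
```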
""" | ||
def __init__(self, seed=None, config_file='config_files/config_example.json', config_dict=None): | ||
def __init__(self, config_file='config_files/default_config.json', config_dict=None): |
There is no seed parameter, although it is used in main.py?
@@ -66,106 +73,185 @@ def __init__(self, seed=None, config_file='config_files/config_example.json', co
if not self.config['pickup_put_interaction']:
    self.action_names = tuple([action_name for action_name in self.action_names if 'Pickup'
                               not in action_name and 'Put' not in action_name])
if self.config.get('put_interaction'):
Can we simplify all these gets in one single for loop?
for action_name in self.action_names:
    if (action_name.startswith('Look') and not self.config.get('lookupdown_actions', True)) or \
            (action_name.startswith('Move') and not self.config.get('moveupdown_actions', True)):
        self.action_names.remove(action_name)
self.action_names = tuple(self.action_names)
And so on. It might not be 100% correct but you get the idea
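A corrected, runnable version of that idea (the config keys are the ones assumed in the sketch above): filter with a comprehension instead of calling remove() on the sequence being iterated, which skips elements:

```python
def filter_actions(action_names, config):
    def enabled(action_name):
        # each family of actions is kept unless its config flag disables it
        if action_name.startswith('Look'):
            return config.get('lookupdown_actions', True)
        if action_name.startswith('Move'):
            return config.get('moveupdown_actions', True)
        return True
    return tuple(a for a in action_names if enabled(a))

actions = ('MoveAhead', 'LookUp', 'LookDown', 'RotateLeft')
assert filter_actions(actions, {'lookupdown_actions': False}) == ('MoveAhead', 'RotateLeft')
```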
…ean vars so if statement looks easy to read, set last vars to None
…urposes), writer problems, printing latest_sys_args, args.env, 3 objects config
.interact()