A3C LSTM GA for language grounding #7
base: master
Conversation
…with bounding boxes and visualising them and checking if we are close and focusing on an object. Added sample event metadata json for easy viewing
Added scene id and current object type params. Cleared up some stuff. Fixed bug in total reward.
…f colorBounds. This avoided "the colored mug changing color as you place it and it not being in color_to_object_id matrix" bug. Put everything into main
Added interaction param to env which removes actions which use objects. Removed some code.
…crowave task. Saved files by episode number rather than steps.
TODO add more general dense reward for groups of objects
128 resolution
Fixed 2 bugs (time embedding index failure and action output) and allowed colour channels and higher resolution. Added grayscale option to environment. Renamed a3c_lst_ga model file. Had to make lots of things a double though. Stopped creating environment in main file. Renamed main file
Created environment guid and saved plots into folder identified by that. Added todos. Changed A3C to use 128x128 grayscale
Made asynchronous train the same.
Don't spend your life looking at numbers. Automate your experiments
…w model to deal with it. For 3 mugs: top_left_screen_x, top_left_screen_y, width, height, x, y, z, 3d distance, class. Removed legend in one plot
Append all last actions to list. Added done break condition to wrapper random walk. Made bounding crosshair function one line. Added todos
task == 6 and fixed up crosshair check (centre crosshair has to be in bounding box centre for object). Added success return value to avoid calculating done twice (can still do better refactoring). Show coloured preprocess to 128 resolution (still works in 0-1 range)
Print average of p_losses and v_losses and made two new charts showing these. Print last reward on done. interaction=False. MOVEMENT_REWARD_DISCOUNT up one order.
…object type. Added better test of task to get microwave or cup and 2 embeddings. Unsqueeze and expand in model (will check in future if is right with sentences). Many todos. CUDA device to 0 but no effect of course.
Stopped rendering depth and class to get 5-10 or so FPS faster
Fixed bug with checking to checkpoint at every step. Many todos
Checkpoint every 100k
… timing and benchmarking. print_all and print for rank = 0. Some todos, commented out eids
Adapted from RL-adventure repo. First running version, although results look odd.
There was a bug in the channel order returned from the environment by the step function. Rainbow DQN code also adapted to use the fix
Fixed conflicts and 16 files.
…-experiment-ids.txt, model.py, check_latest_plots). Moved sample event metadata to gym_ai2thor folder
algorithms/a3c/model.py
Outdated
Similar to the calculate_lstm_input_size_for_A3C function except that there are only
3 conv layers and there is variation among the kernel_size, stride, the number of channels
and there is no padding. Therefore these are hardcoded. Check A3C_LSTM_GA class for these numbers.
Also, returns tuple representing (width, height, num_output_filters) instead of size
As in its equivalent for A3C, it returns a tuple bla
A3C's one returns the size not the tuple
height = (height - 4) // 2 + 1
height = (height - 4) // 2 + 1

return width, height, 64
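The hardcoded lines above are instances of the standard conv output-size formula. A small sketch (helper name hypothetical) that makes the numbers checkable:

```python
def conv2d_output_size(size, kernel_size, stride, padding=0):
    # Standard formula: floor((size + 2*padding - kernel_size) / stride) + 1
    return (size + 2 * padding - kernel_size) // stride + 1

# With kernel_size=4, stride=2 and no padding this reproduces the
# hardcoded `(height - 4) // 2 + 1` above:
assert conv2d_output_size(128, 4, 2) == (128 - 4) // 2 + 1
```

Applying it once per conv layer to both width and height yields the (width, height, num_output_filters) tuple the docstring describes.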
def normalized_columns_initializer(weights, std=1.0):
Why do I need column normalization? Add an explanation here
Also elaborate more on this particular type of normalization being used as opposed to other ones
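For discussion, this is the variant of the initializer used in common A3C implementations (e.g. pytorch-a3c), annotated with the usual rationale — a sketch, not necessarily byte-for-byte what this file does:

```python
import torch

def normalized_columns_initializer(weights, std=1.0):
    # Draw Gaussian noise, then rescale each row (one row per output unit of
    # an nn.Linear weight) so its L2 norm is exactly `std`. Every output unit
    # then starts at the same scale regardless of fan-in, which keeps the
    # initial policy logits and value output small and uniform -- the usual
    # reason this is preferred over plain Gaussian init for the heads.
    out = torch.randn(weights.size())
    out *= std / torch.sqrt(out.pow(2).sum(1, keepdim=True))
    return out
```

(The name says "columns" because of the TensorFlow weight-layout convention; with PyTorch's (out_features, in_features) layout the normalized axis is the rows.)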
    return out


def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
I see that you are initializing layers differently, but what are the magic numbers like 6? Can you make a docstring for the function?
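On the magic numbers: in the common A3C reference implementations this function is Glorot/Xavier uniform init, where the 6 comes from the bound sqrt(6 / (fan_in + fan_out)). A docstring'd sketch of that version (an assumption that this file follows the same code):

```python
import numpy as np
import torch.nn as nn

def weights_init(m):
    """Glorot/Xavier uniform init for Conv and Linear layers.

    The bound sqrt(6 / (fan_in + fan_out)) is the Glorot uniform bound:
    uniform noise on [-b, b] has variance b**2 / 3, so this choice keeps
    the variance of activations roughly constant from layer to layer.
    """
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        weight_shape = list(m.weight.data.size())
        fan_in = np.prod(weight_shape[1:4])                      # in_channels * kH * kW
        fan_out = np.prod(weight_shape[2:4]) * weight_shape[0]   # kH * kW * out_channels
        w_bound = np.sqrt(6. / (fan_in + fan_out))
        m.weight.data.uniform_(-w_bound, w_bound)
        m.bias.data.fill_(0)
    elif classname.find('Linear') != -1:
        weight_shape = list(m.weight.data.size())
        fan_in, fan_out = weight_shape[1], weight_shape[0]
        w_bound = np.sqrt(6. / (fan_in + fan_out))
        m.weight.data.uniform_(-w_bound, w_bound)
        m.bias.data.fill_(0)
```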
algorithms/a3c/model.py
Outdated
# Gated-Attention layers
self.attn_linear = nn.Linear(self.gru_hidden_size, 64)

# Time embedding layer, helps in stabilizing value prediction
Could you explain more to me about the time embedding layer and why it stabilizes value?
I don't actually know "exactly" how but let's work it out here:
If an episode is 1000 steps long, the step number from 0 to 999 gets passed in as tx, and the extra self.time_emb_dim dimensions are input to the critic and actor to help use time to your advantage (e.g. rush when you're running out of time), or purely just to know that the minimum reward you can get in 1000 steps with movement reward -0.001 is -1.0. Therefore in sparse reward settings the agent can work out that if there are only 3 steps left and no cup is in sight, its value prediction will be accurate towards the end?
How about something like that?
I don't see exactly how it can help the actor though. Should you act drastically differently with 30 steps left than with 300? Maybe if you have to choose between going to the closer cup or not.
Yusun helped me figure this out. Value prediction isn't given time input so for the same image state, if there is 1 step left in episode or 900, the value head can't learn to predict less value for the 1st case. Added this comment:
# Time embedding layer, helps in stabilizing value prediction e.g. if only 1 step is left
# in episode not much reward can be earned vs the same image state with 500 steps left
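A minimal sketch (all names and sizes hypothetical) of the mechanism being discussed: the step index tx is embedded and concatenated into the features the value head sees, so the same image state with 1 step left and with 500 steps left can get different value predictions:

```python
import torch
import torch.nn as nn

class TimedValueHead(nn.Module):
    def __init__(self, feat_dim=256, max_episode_length=1000, time_emb_dim=32):
        super().__init__()
        # one learned vector per step index tx
        self.time_emb = nn.Embedding(max_episode_length, time_emb_dim)
        self.critic = nn.Linear(feat_dim + time_emb_dim, 1)

    def forward(self, features, tx):
        # concatenating the time embedding lets the critic lower its value
        # estimate when few steps remain, even for an identical image state
        t = self.time_emb(tx)  # (batch, time_emb_dim)
        return self.critic(torch.cat([features, t], dim=1))
```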
algorithms/a3c/model.py
Outdated
x_attention = x_attention.unsqueeze(2).unsqueeze(3)
x_attention = x_attention.expand(1, self.num_output_filters, self.output_height,
                                 self.output_width)
assert x_image_rep.size() == x_attention.size()
No asserts. Check and raise if it doesn't work
Hmmmm, I really like the conciseness of this assert, and I can add a comment? Check-and-raise is lame to have in a forward function: 2 lines, and then you have ugly raise ValueErrors in your beautifully clear forward function.
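For comparison, the check-and-raise version under debate could be hoisted into a one-call helper so the forward function stays clean (helper name hypothetical; works on anything with a .size() method, e.g. torch tensors):

```python
def check_same_size(a, b):
    # Unlike `assert`, this check is not stripped when running `python -O`,
    # and the error message reports both shapes.
    if a.size() != b.size():
        raise ValueError('shape mismatch: {} vs {}'.format(a.size(), b.size()))
```

In forward it then stays one line: check_same_size(x_image_rep, x_attention).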
…her comments/todos in model.py, my_optim.py and train.py. Removed square image talk. In README.md mentioned needing to install ViZDoom for ViZDoom command. 2 lines between functions
…de clearer. Split A3C_LSTM_GA docstring in two and explained language grounding. self.num_output_filters = 64. Explained GA expand notation and depth slices. Removed learning during training from README.md
…isode only in done, put 4 appends before done (why was it otherwise in the first place?) and removed 2nd "if done". Fixed ai2thor_examples.py multithreading example fps printing. Removed printing of 3d euclidean
algorithms/a3c/test.py
Outdated
env.seed(args.seed + rank)

model = ActorCritic(env.observation_space.shape[0], env.action_space.n, args.frame_dim)
if env.task.task_has_language_instructions:
Could we have a convention of tasks with language starting with "Language" or something like that? So we would check if env.task.__name__.startswith("Language"):
That would look better IMO
I do have NaturalLanguage at the start of all of them and NL at the start of the config files. But a boolean is just easier and clearer, no? Booleans are MUCH clearer than startswith stuff which could be wrong. That way requires us to stick to the standard.
and isn't sticking to a standard a good thing? Rather than having to add 1 extra param
It's added automatically in the NaturalLanguageBaseTask. Zero work, super clear, very readable, very DRY, and doesn't require us to forcibly stick to a standard as we create 100+ tasks.
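A sketch of the point being made (class bodies elided, names abbreviated): the flag lives once in the base classes, so every subclass inherits it for free and callers read a boolean rather than parsing class names:

```python
class BaseTask:
    # default: no language instructions
    task_has_language_instructions = False

class NaturalLanguageBaseTask(BaseTask):
    # set once here; every NaturalLanguage* subclass inherits it automatically
    task_has_language_instructions = True

class NaturalLanguagePickUpObjectTask(NaturalLanguageBaseTask):
    pass

assert NaturalLanguagePickUpObjectTask().task_has_language_instructions
assert not BaseTask().task_has_language_instructions
```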
…dded eid info to README, default reward is 1 for 5 objects. Removed printing of num random actions at init
…cified object types) and example to pdb_interactive py file
…m code: No more import star for vizdoom. Style changes, docs and avoid repetition
…tObjectOpened, lastObjectPickedUp, lastObjectPut, lastObjectPutReceptacle. (need to test). Added todos, time embedding explanation
from gym_ai2thor.utils import InvalidTaskParams
from gym_ai2thor.task_utils import get_word_to_idx, check_if_focus_and_close_enough_to_object_type


class TaskFactory:
Let's get rid of this class entirely by calling getattr(gym_ai2thor.tasks, config['task']['task_name'])(**config) from AI2ThorEnv
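A runnable sketch of the getattr suggestion (the tasks module is passed in here to keep the example self-contained; the real call would import gym_ai2thor.tasks):

```python
import types

def create_task(tasks_module, config):
    # getattr replaces the TaskFactory if/elif chain; an unknown task_name
    # fails loudly with an AttributeError naming the missing class
    task_cls = getattr(tasks_module, config['task']['task_name'])
    return task_cls(**config)

# tiny stand-in for gym_ai2thor.tasks to show the lookup working
class PickupTask:
    def __init__(self, **kwargs):
        self.kwargs = kwargs

tasks = types.SimpleNamespace(PickupTask=PickupTask)
task = create_task(tasks, {'task': {'task_name': 'PickupTask'}})
assert isinstance(task, PickupTask)
```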
    return super(NaturalLanguageNavigateToObjectTask, self).reset()


class NaturalLanguagePickUpObjectTask(NaturalLanguageBaseTask):
I would restructure this to have this class inherit from both NaturalLanguageBaseTask and PickupTask. We can discuss this in slack
algorithms/a3c/test.py
Outdated
env.seed(args.seed + rank)

model = ActorCritic(env.observation_space.shape[0], env.action_space.n, args.frame_dim)
if env.task.task_has_language_instructions:
Totally different issue. What happens if the task is not natural language? task.has_language_instructions will raise a ValueError
gym_ai2thor/envs/ai2thor_env.py
Outdated
closest_openable = None
for obj in visible_objects:
    # look for closest closed receptacle to open it
    if obj['openable'] and obj['distance'] < distance and not obj['isopen'] and \
            obj['objectType'] in self.objects['openables']:
            obj['objectType'] in self.allowed_objects['openables']:
        closest_openable = obj
        distance = closest_openable['distance']
    if closest_openable:
This is wrong. If there are two objects to close and the furthest one is first in visible_objects iterable you will close both the furthest and the closest. Moreover, we are executing the close action as many times as remaining objects in visible_objects after the closest_openable. Unindenting this if block will solve the issue. The same applies to "CloseObject" action
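A sketch of the fix being requested: the loop only selects the closest candidate, and acting on it happens exactly once, after (i.e. unindented out of) the loop. Function name and the object dicts are hypothetical stand-ins:

```python
def find_closest_closed_openable(visible_objects, allowed_openables):
    closest_openable = None
    distance = float('inf')
    for obj in visible_objects:
        if (obj['openable'] and obj['distance'] < distance
                and not obj['isopen'] and obj['objectType'] in allowed_openables):
            closest_openable = obj
            distance = obj['distance']
    # the caller acts on closest_openable once, OUTSIDE the loop
    return closest_openable

objs = [
    {'openable': True, 'isopen': False, 'distance': 3.0, 'objectType': 'Fridge'},
    {'openable': True, 'isopen': False, 'distance': 1.0, 'objectType': 'Microwave'},
]
assert find_closest_closed_openable(objs, {'Fridge', 'Microwave'})['objectType'] == 'Microwave'
```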
gym_ai2thor/envs/ai2thor_env.py
Outdated
    interaction_obj = closest_pickupable
    self.event = self.controller.step(
        dict(action=action_str, objectId=interaction_obj['objectId']))
elif action_str == 'OpenObject':
    self.event.metadata['lastObjectPickedUp'] = interaction_obj
elif action_str.startswith('Open'):
    closest_openable = None
    for obj in visible_objects:
        # look for closest closed receptacle to open it
        if obj['openable'] and obj['distance'] < distance and not obj['isopen'] and \
I'd like to make this more understandable. Can we take the whole condition in the if to a variable called is_closest_receptacle and then just do if is_closest_receptacle? Then the comment above this line won't be needed anymore.
Analogously we can do the same for is_closest_pickupable and is_closest_opened_receptacle, if you know what I mean.
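The refactor in code form (a sketch with the suggested variable name; the object dicts are hypothetical):

```python
def find_closest_closed_receptacle(visible_objects, allowed_openables):
    closest_openable = None
    distance = float('inf')
    for obj in visible_objects:
        # the named boolean replaces the explanatory comment above the if
        is_closest_closed_receptacle = (
            obj['openable'] and not obj['isopen']
            and obj['distance'] < distance
            and obj['objectType'] in allowed_openables)
        if is_closest_closed_receptacle:
            closest_openable = obj
            distance = obj['distance']
    return closest_openable
```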
""" | ||
def __init__(self, seed=None, config_file='config_files/config_example.json', config_dict=None): | ||
def __init__(self, config_file='config_files/default_config.json', config_dict=None): |
There is no seed parameter, although it is used in main.py?
@@ -66,106 +73,185 @@ def __init__(self, seed=None, config_file='config_files/config_example.json', co
if not self.config['pickup_put_interaction']:
    self.action_names = tuple([action_name for action_name in self.action_names if 'Pickup'
                               not in action_name and 'Put' not in action_name])
if self.config.get('put_interaction'):
Can we simplify all these gets in one single for loop?
for action_name in self.action_names:
    if (action_name.startswith('Look') and not self.config.get('lookupdown_actions', True)) or \
            (action_name.startswith('Move') and not self.config.get('moveupdown_actions', True)):
        self.action_names.remove(action_name)
self.action_names = tuple(self.action_names)
And so on. It might not be 100% correct but you get the idea
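A corrected, runnable version of that idea (the config keys are the ones assumed in the sketch above): filter with a comprehension instead of calling remove() on the sequence being iterated, which skips elements:

```python
def filter_actions(action_names, config):
    def enabled(action_name):
        # each family of actions is kept unless its config flag disables it
        if action_name.startswith('Look'):
            return config.get('lookupdown_actions', True)
        if action_name.startswith('Move'):
            return config.get('moveupdown_actions', True)
        return True
    return tuple(a for a in action_names if enabled(a))

actions = ('MoveAhead', 'LookUp', 'LookDown', 'RotateLeft')
assert filter_actions(actions, {'lookupdown_actions': False}) == ('MoveAhead', 'RotateLeft')
```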
…ean vars so if statement looks easy to read, set last vars to None
…urposes), writer problems, printing latest_sys_args, args.env, 3 objects config
.interact()