The Deep Interval Temporal Relationship Learner (D-ITR-L) is an architecture for identifying and learning temporal features as they are expressed in videos of human-led sequential tasks.
We release the PyTorch implementation of D-ITR-L here.
The code is built with the following libraries:
- Python 3.6 or later
- PyTorch 1.7 or later
- PyTorch Geometric
- scikit-learn
- OpenCV 4.5 or later
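As a quick environment check (a sketch only; the import names checked, e.g. `cv2` for OpenCV, are the conventional ones and not taken from this repository), the following reports which of the required libraries are importable:

```shell
# Verify the Python version and report which dependencies are importable.
# torch, torch_geometric, sklearn, and cv2 are the usual import names
# for PyTorch, PyTorch Geometric, scikit-learn, and OpenCV.
python3 - <<'EOF'
import sys
assert sys.version_info >= (3, 6), "Python 3.6 or later is required"
for name in ("torch", "torch_geometric", "sklearn", "cv2"):
    try:
        __import__(name)
        print(name, "found")
    except ImportError:
        print(name, "missing")
EOF
```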
This work has been designed to interface with three datasets: one policy learning dataset (Block Stacking) and two activity recognition datasets (Furniture Construction and Recipe Following). We provide frame-by-frame versions of the datasets here; the source videos can be downloaded at the links in the following table.
Dataset | Source | Frames |
---|---|---|
Block Stacking (Fixed Timing) | link | link |
Block Stacking (Variable Timing) | link | link |
IKEA Furniture Assembly | link | link |
Crepe Recipe Following (Sub-Actions) | -- | link |
Crepe Recipe Following (Full-Recipes) | link | link |
For reproducibility we provide several trained models. When training the backbone models we generally leveraged pre-trained features from architectures exposed to the ImageNet dataset. PyTorch provides internal models for the VGG-16 and Wide ResNet architectures, but the other two backbone models investigated in this work (Temporal Shift Module and I3D) rely on external pre-trained models. We provide links to where those models can be downloaded.
Backbone Model | Source |
---|---|
Temporal Shift Module | link |
I3D | link |
We provide zip files containing the trained spatial and temporal features of models investigated in this work.
Dataset | Trained Models |
---|---|
Block Stacking (Fixed Timing) | link |
Block Stacking (Variable Timing) | link |
IKEA Furniture Assembly | link |
Crepe Recipe Following (Sub-Actions) | link |
Crepe Recipe Following (Full-Recipes) | link |
The implementation of this model is divided into three stages:
- Training of the backbone model to identify significant spatial features present in the dataset
- Training of the temporal inference architectures to learn temporal representations from the identified spatial features
- Evaluation of the trained models
All executables can be run with the `--help` flag to list the legal parameters. You will likely need to update the directory paths listed on lines 18-20 of `parameter_parser.py` to point to the directory where your datasets are located and to the pre-trained source files used when fine-tuning the I3D and TSM models. When running the code for the first time, it should be run with the `--gen` flag to generate the modified files used by the system.
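For example (a sketch only; the argument layout is assumed from the training commands shown later in this README, while `execute_backbone.py` and the `--help`/`--gen` flags are from this repository):

```shell
# List the legal parameters for the backbone training script.
python3.6 execute_backbone.py --help

# First run: generate the modified dataset files used by the system.
python3.6 execute_backbone.py tsm --application ikea --gen
```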
The backbone models can be trained by running the `execute_backbone.py` file with a specific backbone model identifier and application name. The available options are listed when the script is run with the `--help` flag. The following code shows an example of training the Temporal Shift Module (tsm) on the IKEA furniture construction dataset (ikea). The code performs a grid search, generating models that use the provided backbone model and dataset over different bottleneck sizes. Models will be generated at bottleneck sizes of 8, 16, 32, and 64, which can later be investigated so that the best model can be used when conducting temporal inference. In the example, the `--repeat` flag is used to fine-tune the model several times, sampling different architectures by leveraging deep learning's inherent stochasticity.
```shell
# Running the program
python3.6 execute_backbone.py <backbone_model_id> --application <application_name> --repeat <number_of_repetitions>

# Example Execution
python3.6 execute_backbone.py tsm --application ikea --repeat 3
```
Trained models will be placed in folders of the form `saved_models_<bottleneck_size>/c_<backbone_model_id>_backbone_<repetition_identifier>`. An example of the directories generated is presented below.
```
# example directory structure
saved_models_8/c_tsm_backbone_0
saved_models_8/c_tsm_backbone_1
saved_models_8/c_tsm_backbone_2
saved_models_16/c_tsm_backbone_0
...
saved_models_64/c_tsm_backbone_2
```
After training, the models can be evaluated using the following code. The application type is either 'c' for a classification task or 'pl' for policy learning; training the backbone model should always be done as a classification task.
```shell
# Running the program
python3.6 analysis/model_analyze.py <application_type> <model_directory>

# Example Execution
python3.6 analysis/model_analyze.py c saved_models_8/c_vgg_backbone_0
```
After a model has been selected to perform spatial feature extraction for the particular application, it should be moved to a new directory titled `base_models_<application_name>`. Given our example, we would set up our directories as follows: `base_models_ikea/c_vgg_backbone_0`. This directory name should be updated in the `parameter_parser.py` file: line 107 captures the TSM directory name used for the 'ikea' application, and both it and the bottleneck size value should be updated appropriately.
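The move described above can be performed as follows (a sketch using the example directory names; adjust the bottleneck size and backbone identifier to the model you actually selected):

```shell
# Create the per-application base-model directory and move the
# selected backbone into it.
mkdir -p base_models_ikea
mv saved_models_8/c_vgg_backbone_0 base_models_ikea/
```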
Once the backbone model has been established, it is time to train the inference models. In our work we investigated four approaches:
- no temporal inference, using a linear model (linear)
- a recurrent neural network: the long short-term memory cell (lstm)
- a convolution-over-time based approach using the Temporal Convolutional Network (tcn)
- the Deep Interval Temporal Relationship Network (ditrl)
Training of these models is accomplished through the following command:
```shell
# Running the program
python3.6 execute.py <application_type> <backbone_model_name> <inference_approach> --application <application_name>

# Example Execution
python3.6 execute.py c tsm ditrl --application ikea
```
This code uses the trained features of the fixed backbone model to identify feature presence in the input video. This information is then passed to one of the temporal inference approaches. When conducting inference using the temporal model the architecture will generate intermediary files (IADs) in the directory where the dataset is located in order to expedite learning. If using D-ITR-L for inference then graph files will be saved in the same location. Be forewarned that these files can be quite large.
The trained models are placed in a directory titled `saved_models_<application>/<application_type>_<inference_approach>_<run_id>`. This directory can be interrogated using the same `model_analyze` code as before.
The execution code for policy learning is similar to that used for classification, the exception being the use of the 'pl' application type.
```shell
# Example Execution
python3.6 execute.py pl tsm ditrl --application block_construction_timed
```
Some of the code in this work was adapted from other GitHub sources, namely the PyTorch implementations of I3D, the Temporal Shift Module, and the Temporal Convolutional Network.