A repository for render-and-compare machine learning pose estimation using a known CAD model, without the use of depth measurements. A neural network compares a real image with a rendering of the object under an initial pose estimate, and iteratively predicts pose updates until the rendering of the object matches the real image.
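The refinement loop can be summarized roughly as in the sketch below. The `renderer` and `network` callables and their signatures are hypothetical placeholders for illustration, not the actual interfaces in this repository.

```python
import torch

def refine_pose(real_image, T_CO_init, renderer, network, n_iters=3):
    """Iterative render-and-compare refinement (hypothetical interface).

    renderer(T_CO) is assumed to return a rendering of the CAD model under
    pose T_CO, and network(...) is assumed to predict a pose update delta_T
    such that T_CO_new = delta_T @ T_CO.
    """
    T_CO = T_CO_init
    for _ in range(n_iters):
        rendered_image = renderer(T_CO)                              # render current estimate
        net_input = torch.cat([real_image, rendered_image], dim=1)   # concatenate along channels
        delta_T = network(net_input, T_CO)                           # predicted pose update
        T_CO = delta_T @ T_CO                                        # apply update
    return T_CO
```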
Based on the work from DeepIM and CosyPose:
- DeepIM: https://arxiv.org/abs/1804.00175
- CosyPose: https://arxiv.org/abs/2008.08465, https://github.com/ylabbe/cosypose
Snippets of code are copied from the CosyPose GitHub repository. Copied functions contain an explicit comment stating the source.
- Install PyTorch with CUDA support from https://pytorch.org/get-started/locally/
- Install the remaining requirements with
pip install -r requirements.txt
- Create an image dataset from https://github.com/olaals/datasets-rgb-pose-estimation
- Create a symbolic link or copy the dataset to
- Create a config file in the configs directory
- An example config file is given in example_config.py (a rough sketch of such a file is shown after these setup steps)
- To train a model, run the following command
python train_model.py configs/example_config.py
- The training and validation loss can be tracked with TensorBoard using
tensorboard --logdir tensorboard
Additional visualizations are stored in logdir
- To test a trained model, run the following command
python test_model.py configs/example_config.py
The results are stored in logdir
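For orientation, the sketch below shows roughly what a config file could contain. Every field name here is hypothetical; consult configs/example_config.py for the actual options and their names.

```python
# Hypothetical sketch only -- the real options live in configs/example_config.py.
from types import SimpleNamespace

config = SimpleNamespace(
    dataset_dir="data/my-dataset",   # hypothetical: path or symlink to the generated dataset
    rotation_repr="6D",              # hypothetical: "6D" or "9D" rotation representation
    batch_size=32,                   # hypothetical training hyperparameters
    learning_rate=1e-4,
    num_epochs=100,
    logdir="logdir/example_run",     # hypothetical: where logs and visualizations are written
)
```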
The exact training process depends on the configuration set in the config files in the configs directory, but the overall pipeline is shown below
The general pipeline includes
- A renderer that produces two images of the same object, where the initial pose guess of the object is slightly off.
- These images are concatenated and used as the input to a convolutional neural network.
- The CNN estimates either a 6D or 9D representation of rotation, a pixel translation in the x and y directions, and a depth parameter vz.
- The output of the CNN is passed to a rotation representation function, which computes a valid rotation matrix.
- The pixel translation output of the CNN is converted to translation in Euclidean space.
- Together, the rotation matrix and translation form a transformation matrix delta_T, which updates the current estimate of T_CO with T_CO_new = delta_T*T_CO (a sketch of this update step is given after this list).
- A loss function loss(T_CO_new, T_CO_gt) computes a scalar that quantifies the deviation between T_CO_new and the ground truth T_CO_gt.
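A minimal sketch of the update step described above, assuming a 6D rotation representation mapped to a rotation matrix via Gram-Schmidt orthogonalization (the continuous representation of Zhou et al.) and a DeepIM-style mapping from (vx, vy, vz) to a metric translation. The function names and the exact (vx, vy, vz) convention are assumptions for illustration; the repository's own implementation is selected through the config.

```python
import torch
import torch.nn.functional as F

def rotation_matrix_from_6d(rot_6d):
    # Map a 6D rotation representation (B, 6) to a valid rotation matrix (B, 3, 3)
    # via Gram-Schmidt orthogonalization.
    a1, a2 = rot_6d[..., :3], rot_6d[..., 3:]
    b1 = F.normalize(a1, dim=-1)
    b2 = F.normalize(a2 - (b1 * a2).sum(-1, keepdim=True) * b1, dim=-1)
    b3 = torch.cross(b1, b2, dim=-1)
    return torch.stack([b1, b2, b3], dim=-1)

def apply_pose_update(T_CO, rot_6d, vxvyvz, K):
    # T_CO: (B, 4, 4) current pose estimate, rot_6d: (B, 6), vxvyvz: (B, 3),
    # K: (B, 3, 3) camera intrinsics.
    R_delta = rotation_matrix_from_6d(rot_6d)

    # Convert the pixel translation (vx, vy) and depth parameter vz to a metric
    # translation using the camera intrinsics. This convention is an assumption.
    vx, vy, vz = vxvyvz[:, 0], vxvyvz[:, 1], vxvyvz[:, 2]
    fx, fy = K[:, 0, 0], K[:, 1, 1]
    t_src = T_CO[:, :3, 3]
    z_new = t_src[:, 2] * torch.exp(vz)
    x_new = (vx / fx + t_src[:, 0] / t_src[:, 2]) * z_new
    y_new = (vy / fy + t_src[:, 1] / t_src[:, 2]) * z_new
    t_new = torch.stack([x_new, y_new, z_new], dim=-1)

    # Build delta_T so that T_CO_new = delta_T @ T_CO has rotation R_delta @ R_CO
    # and translation t_new.
    delta_T = torch.eye(4, dtype=T_CO.dtype, device=T_CO.device).repeat(T_CO.shape[0], 1, 1)
    delta_T[:, :3, :3] = R_delta
    delta_T[:, :3, 3] = t_new - torch.einsum('bij,bj->bi', R_delta, t_src)
    return delta_T @ T_CO
```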
The code in the repository uses shorthand notation for the transformation matrix describing the rotation and translation between frames. The image below shows the shorthand notations used, where T_CO is of particular importance.
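As a concrete example of the notation, and assuming the common convention that T_CO maps homogeneous points from the object frame O to the camera frame C:

```python
import numpy as np

# A point on the object, given in object coordinates (homogeneous).
p_O = np.array([0.1, 0.0, 0.0, 1.0])

# T_CO: pose of the object expressed in the camera frame
# (here the object is placed 0.5 m in front of the camera, unrotated).
T_CO = np.eye(4)
T_CO[:3, 3] = [0.0, 0.0, 0.5]

# The same point expressed in camera coordinates.
p_C = T_CO @ p_O
```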