# Vision Transformers for Dense Prediction

This repository contains code and models for our paper:

Vision Transformers for Dense Prediction

René Ranftl, Alexey Bochkovskiy, Vladlen Koltun

## Changelog

- [March 2021] Initial release of inference code and models

## Setup

1. Download the model weights and place them in the `weights` folder:

   Monodepth:

   Segmentation:
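   For example, once a weight file is downloaded (the file name below is hypothetical; use the actual file you downloaded):

       mkdir -p weights
       mv ~/Downloads/dpt_hybrid.pt weights/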

2. Set up dependencies:

       conda install pytorch torchvision opencv
       pip install timm

   The code was tested with Python 3.7, PyTorch 1.8.0, OpenCV 4.5.1, and timm 0.4.5.
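   To compare your environment against those versions, a quick sanity check (assuming all three packages import cleanly):

       python -c "import torch, cv2, timm; print(torch.__version__, cv2.__version__, timm.__version__)"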

## Usage

1. Place one or more input images in the folder `input`.

2. Run a monocular depth estimation model:

       python run_monodepth.py

   Or run a semantic segmentation model:

       python run_segmentation.py

3. The results are written to the folders `output_monodepth` and `output_segmentation`, respectively.

You can use the flag `-t` to switch between different models. Possible options are `dpt_hybrid` (default) and `dpt_large`.
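For example, to run monocular depth estimation with the large model:

    python run_monodepth.py -t dpt_large

For programmatic use, the sketch below shows the gist of the inference loop. It is a minimal sketch, not the script itself: it assumes `DPTDepthModel` from the repository's `dpt.models` module, a hypothetical weights path, and a plain resize in place of the network's own preprocessing transform; see `run_monodepth.py` for the exact pipeline.

    import cv2
    import torch

    # Assumed import: the repository defines its models under dpt/.
    from dpt.models import DPTDepthModel

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Hypothetical weights path; point this at the file downloaded in Setup.
    model = DPTDepthModel(path="weights/dpt_hybrid.pt")
    model.to(device).eval()

    # Read an image as RGB in [0, 1]. The real script applies the
    # network's own resize/normalization transform instead.
    img = cv2.cvtColor(cv2.imread("input/example.jpg"), cv2.COLOR_BGR2RGB) / 255.0
    img = cv2.resize(img, (384, 384))  # nominal DPT input resolution
    sample = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).float().to(device)

    with torch.no_grad():
        prediction = model.forward(sample)  # relative inverse depth map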

## Citation

Please cite our paper if you use this code or any of the models:

    @article{Ranftl2021,
        author    = {Ren\'{e} Ranftl and Alexey Bochkovskiy and Vladlen Koltun},
        title     = {Vision Transformers for Dense Prediction},
        journal   = {arXiv preprint},
        year      = {2021},
    }

## License

MIT License