This repository contains a Torch implementation for both the DeepMask and SharpMask object proposal algorithms.
DeepMask is trained with two objectives: given an image patch, one branch of the model outputs a class-agnostic segmentation mask, while the other branch outputs how likely the patch is to contain an object. At test time, DeepMask is applied densely to an image and generates a set of object masks, each with a corresponding objectness score. These masks densely cover the objects in an image and can be used as a first step for object detection and other tasks in computer vision.
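One way to write a two-objective loss of this kind is sketched below. It follows the formulation in the DeepMask paper cited further down, not anything read from this repository's code, so treat the notation as an assumption: $x_k$ is an input patch, $y_k \in \{\pm 1\}$ says whether the patch contains a centered object, $m_k^{ij} \in \{\pm 1\}$ is the ground-truth mask label at output pixel $(i,j)$, $f_{\mathrm{segm}}$ and $f_{\mathrm{score}}$ are the two branches, $w^o \times h^o$ is the mask output size, and $\lambda$ weights the score term.

```latex
\mathcal{L}(\theta) = \sum_k \left(
  \frac{1 + y_k}{2\, w^o h^o} \sum_{i,j}
    \log\!\left(1 + e^{-m_k^{ij}\, f_{\mathrm{segm}}^{ij}(x_k)}\right)
  + \lambda \log\!\left(1 + e^{-y_k\, f_{\mathrm{score}}(x_k)}\right)
\right)
```

The factor $(1 + y_k)/2$ is zero for negative patches, so in this formulation the mask term only contributes when the patch actually contains an object, while the score term is applied to every patch.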
SharpMask is an extension of DeepMask which generates higher-fidelity masks using an additional top-down refinement step. The idea is to first generate a coarse mask encoding in a feedforward pass, then refine this mask encoding in a top-down pass using features at successively lower layers. This results in masks that better adhere to object boundaries.
If you use DeepMask/SharpMask in your research, please cite the relevant papers:
@inproceedings{DeepMask,
title = {Learning to Segment Object Candidates},
author = {Pedro O. Pinheiro and Ronan Collobert and Piotr Dollár},
booktitle = {NIPS},
year = {2015}
}
@inproceedings{SharpMask,
title = {Learning to Refine Object Segments},
author = {Pedro O. Pinheiro and Tsung-Yi Lin and Ronan Collobert and Piotr Dollár},
booktitle = {ECCV},
year = {2016}
}
Note: the version of DeepMask implemented here is the updated version reported in the SharpMask paper. DeepMask takes on average 0.5s per COCO image; SharpMask takes 0.8s. Runtime roughly doubles for the "zoom" versions of the models.
Requirements:
- Mac OS X or Linux
- NVIDIA GPU with compute capability 3.5+
- Torch with packages: COCO API, image, tds, cjson, nnx, optim, cutorch, cunn, cudnn
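Before installing anything, it can help to check which command-line tools are already available. The following is a minimal sketch: the helper name `missing_tools` is hypothetical, and the tool names in the usage line (`th`, `nvidia-smi`, `git`, `wget`) are assumptions about a typical Torch and CUDA setup, not a list taken from this repository.

```shell
#!/usr/bin/env bash
# Print the name of every command in the argument list that is not on PATH.
# Prints nothing when all of them are found.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage (uncomment to run):
# missing_tools th nvidia-smi git wget
```

This only checks for executables; it does not verify the GPU compute capability or that the individual Torch packages are installed.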
To run pretrained DeepMask/SharpMask models to generate object proposals, follow these steps:
- Clone this repository into $DEEPMASK:

      DEEPMASK=/desired/absolute/path/to/deepmask/ # set absolute path as desired
      git clone git@github.com:facebookresearch/deepmask.git $DEEPMASK
- Download pre-trained DeepMask and SharpMask models:

      mkdir -p $DEEPMASK/pretrained/deepmask; cd $DEEPMASK/pretrained/deepmask
      wget https://s3.amazonaws.com/deepmask/models/deepmask/model.t7
      mkdir -p $DEEPMASK/pretrained/sharpmask; cd $DEEPMASK/pretrained/sharpmask
      wget https://s3.amazonaws.com/deepmask/models/sharpmask/model.t7
- Run computeProposals.lua with a given model and optional target image (specified via the -img option):

      # apply to a default sample image (data/testImage.jpg)
      cd $DEEPMASK
      th computeProposals.lua $DEEPMASK/pretrained/deepmask # run DeepMask
      th computeProposals.lua $DEEPMASK/pretrained/sharpmask # run SharpMask
      th computeProposals.lua $DEEPMASK/pretrained/sharpmask -img /path/to/image.jpg
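To apply a model to several images, the `-img` invocation above can be wrapped in a loop. This is a sketch under stated assumptions: the function name `run_on_images` and the `DRY_RUN` variable are hypothetical conveniences, and only `.jpg` files directly inside the given directory are considered.

```shell
#!/usr/bin/env bash
# Run computeProposals.lua once per JPEG in a directory.
# With DRY_RUN set to a non-empty value, the commands are printed instead of executed.
run_on_images() {
  local model_dir=$1 image_dir=$2 img
  for img in "$image_dir"/*.jpg; do
    [ -e "$img" ] || continue   # the glob matched nothing
    ${DRY_RUN:+echo} th computeProposals.lua "$model_dir" -img "$img"
  done
}

# Usage, from inside $DEEPMASK (uncomment to run):
# DRY_RUN=1 run_on_images "$DEEPMASK/pretrained/sharpmask" /path/to/images
# run_on_images "$DEEPMASK/pretrained/sharpmask" /path/to/images
```

Check where the script writes its result before batching: if every run writes to the same output file, each image will overwrite the previous one and the result needs to be moved aside inside the loop.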
To train your own DeepMask/SharpMask models, follow these steps:
- If you have not done so already, clone this repository into $DEEPMASK:

      DEEPMASK=/desired/absolute/path/to/deepmask/ # set absolute path as desired
      git clone git@github.com:facebookresearch/deepmask.git $DEEPMASK
- Download the Torch ResNet-50 model pretrained on ImageNet:

      mkdir -p $DEEPMASK/pretrained; cd $DEEPMASK/pretrained
      wget https://s3.amazonaws.com/deepmask/models/resnet-50.t7
- Download and extract the COCO images and annotations:

      mkdir -p $DEEPMASK/data; cd $DEEPMASK/data
      wget http://msvocds.blob.core.windows.net/annotations-1-0-3/instances_train-val2014.zip
      wget http://msvocds.blob.core.windows.net/coco2014/train2014.zip
      wget http://msvocds.blob.core.windows.net/coco2014/val2014.zip
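The commands above only download the archives. One possible way to do the extraction step is sketched below; the helper name `extract_archives` is hypothetical, and it assumes `unzip` is installed and that unpacking each archive in place yields the directory layout the training scripts expect, which is worth verifying against the scripts' options.

```shell
#!/usr/bin/env bash
# Extract every .zip archive found in a directory into that same directory.
extract_archives() {
  local dir=$1 archive
  for archive in "$dir"/*.zip; do
    [ -e "$archive" ] || continue          # no archives present
    unzip -q -o "$archive" -d "$dir"       # -q quiet, -o overwrite without prompting
  done
}

# Usage (uncomment to run):
# extract_archives "$DEEPMASK/data"
```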
To train, launch the train.lua script. It has several options; to list them, use the --help flag.
- To train DeepMask:

      th train.lua
- To train SharpMask (requires pre-trained DeepMask model):

      th train.lua -dm /path/to/trained/deepmask/
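Because SharpMask training depends on an existing DeepMask model, a small guard can catch a wrong path before a long training run starts. This is a sketch only: the function name `train_sharpmask` and the `DRY_RUN` variable are hypothetical, and the check only confirms that the directory exists, not that it holds a usable model.

```shell
#!/usr/bin/env bash
# Launch SharpMask training only if the DeepMask model directory exists.
# With DRY_RUN set to a non-empty value, the command is printed instead of executed.
train_sharpmask() {
  local dm_dir=$1
  if [ ! -d "$dm_dir" ]; then
    echo "DeepMask model directory not found: $dm_dir" >&2
    return 1
  fi
  ${DRY_RUN:+echo} th train.lua -dm "$dm_dir"
}

# Usage, from inside $DEEPMASK (uncomment to run):
# train_sharpmask /path/to/trained/deepmask/
```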
There are two ways to evaluate a model on the COCO dataset.
- evalPerPatch.lua evaluates only the mask generation step. The per-patch evaluation only uses image patches that contain roughly centered objects. Its usage is as follows:

      th evalPerPatch.lua /path/to/trained/deepmask-or-sharpmask/
- evalPerImage.lua evaluates the full model on COCO images, as reported in the papers. By default, it evaluates performance on the first 5K COCO validation images (run th evalPerImage.lua --help to see the options):

      th evalPerImage.lua /path/to/trained/deepmask-or-sharpmask/
You can download pre-computed proposals (1000 per image) on the COCO and PASCAL VOC datasets, for both segmentation and bounding box proposals. We use the COCO JSON format for the proposals. The proposals are divided into chunks of 500 images each (that is, each JSON contains 1000 proposals per image for 500 images). All proposals correspond to the "zoom" setting in the paper (DeepMaskZoom and SharpMaskZoom) which tend to be most effective for object detection.
DeepMask:
- COCO Boxes: [train | val | test-dev | test-full]
- COCO Segments: [train | val | test-dev | test-full]
- PASCAL Boxes: [train+val+test-2007 | train+val+test-2012]
- PASCAL Segments: [train+val+test-2007 | train+val+test-2012]

SharpMask:
- COCO Boxes: [train | val | test-dev | test-full]
- COCO Segments: [train | val | test-dev | test-full]
- PASCAL Boxes: [train+val+test-2007 | train+val+test-2012]
- PASCAL Segments: [train+val+test-2007 | train+val+test-2012]
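The chunking described above implies a simple relation between the number of images, the number of JSON files, and the total number of proposals. The sketch below works through that arithmetic; the function name `chunk_stats` is hypothetical, its defaults (500 images per chunk, 1000 proposals per image) come from the description above, and the 5000-image input is an illustrative figure rather than the size of any particular split.

```shell
#!/usr/bin/env bash
# Given an image count, report how many chunk files and proposals to expect.
# Arguments: images [images_per_chunk=500] [proposals_per_image=1000]
chunk_stats() {
  local images=$1 per_chunk=${2:-500} per_image=${3:-1000}
  local chunks=$(( (images + per_chunk - 1) / per_chunk ))   # round up
  echo "chunks=$chunks proposals=$(( images * per_image ))"
}

chunk_stats 5000   # prints: chunks=10 proposals=5000000
```

The chunk count is rounded up, so an image count that is not a multiple of 500 gets one final, smaller chunk. Comparing the expected count with the number of downloaded files is a quick way to spot an incomplete download.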