Add zero-shot-object-detection w/ OwlViT #392

Merged
merged 16 commits into main from add-owlvit on Nov 20, 2023
Conversation

xenova (Collaborator) commented Nov 15, 2023

Example usage:

Code adapted from the transformers docs.

Example 1

(showing the same results as the Python library, using the unquantized model)

import { pipeline } from '@xenova/transformers';

let detector = await pipeline('zero-shot-object-detection', 'Xenova/owlvit-base-patch32', { quantized: false });

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/astronaut.png';
let candidate_labels = ['human face', 'rocket', 'nasa badge', 'star-spangled banner'];
let output = await detector(url, candidate_labels);
See output
// [
//   {
//     score: 0.3585130274295807,
//     label: 'human face',
//     box: { xmin: 180, ymin: 71, xmax: 271, ymax: 178 }
//   },
//   {
//     score: 0.28625914454460144,
//     label: 'nasa badge',
//     box: { xmin: 129, ymin: 348, xmax: 206, ymax: 428 }
//   },
//   {
//     score: 0.2107662707567215,
//     label: 'rocket',
//     box: { xmin: 351, ymin: -1, xmax: 468, ymax: 288 }
//   },
//   {
//     score: 0.13869591057300568,
//     label: 'star-spangled banner',
//     box: { xmin: 1, ymin: 0, xmax: 105, ymax: 509 }
//   },
//   {
//     score: 0.1277477741241455,
//     label: 'nasa badge',
//     box: { xmin: 277, ymin: 339, xmax: 326, ymax: 380 }
//   },
//   {
//     score: 0.12643635272979736,
//     label: 'rocket',
//     box: { xmin: 358, ymin: 64, xmax: 424, ymax: 280 }
//   }
// ]

[image: predicted boxes and labels drawn on astronaut.png]

Example 2

(different labels + using the quantized model, the default)

import { pipeline } from '@xenova/transformers';

let detector = await pipeline('zero-shot-object-detection', 'Xenova/owlvit-base-patch32');

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/astronaut.png';
let candidate_labels = ['human face', 'rocket', 'helmet', 'american flag'];
let output = await detector(url, candidate_labels);
See output
// [
//   {
//     score: 0.24392342567443848,
//     label: 'human face',
//     box: { xmin: 180, ymin: 67, xmax: 274, ymax: 175 }
//   },
//   {
//     score: 0.15129457414150238,
//     label: 'american flag',
//     box: { xmin: 0, ymin: 4, xmax: 106, ymax: 513 }
//   },
//   {
//     score: 0.13649864494800568,
//     label: 'helmet',
//     box: { xmin: 277, ymin: 337, xmax: 511, ymax: 511 }
//   },
//   {
//     score: 0.10262022167444229,
//     label: 'rocket',
//     box: { xmin: 352, ymin: -1, xmax: 463, ymax: 287 }
//   }
// ]

[image: predicted boxes and labels drawn on astronaut.png]

Example 3

(different image, and passing pipeline parameters: topk caps the number of returned detections, and threshold sets the minimum score)

import { pipeline } from '@xenova/transformers';

let detector = await pipeline('zero-shot-object-detection', 'Xenova/owlvit-base-patch32');

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/beach.png';
let candidate_labels = ['hat', 'book', 'sunglasses', 'camera'];
let output = await detector(url, candidate_labels, { topk: 4, threshold: 0.05 });
See output
// [
//   {
//     score: 0.1606510728597641,
//     label: 'sunglasses',
//     box: { xmin: 347, ymin: 229, xmax: 429, ymax: 264 }
//   },
//   {
//     score: 0.08935828506946564,
//     label: 'hat',
//     box: { xmin: 38, ymin: 174, xmax: 258, ymax: 364 }
//   },
//   {
//     score: 0.08530698716640472,
//     label: 'camera',
//     box: { xmin: 187, ymin: 350, xmax: 260, ymax: 411 }
//   },
//   {
//     score: 0.08349756896495819,
//     label: 'book',
//     box: { xmin: 261, ymin: 280, xmax: 494, ymax: 425 }
//   }
// ]

[image: predicted boxes and labels drawn on beach.png]
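
For reference, here is a minimal browser-side sketch (not part of this PR) of one way to render the returned pixel-coordinate boxes; it assumes an <img> element already displaying the same image at its natural size, and the output array from the example above:

// Draw each detection onto a canvas copy of the image.
const img = document.querySelector('img');
const canvas = document.createElement('canvas');
canvas.width = img.naturalWidth;
canvas.height = img.naturalHeight;
const ctx = canvas.getContext('2d');
ctx.drawImage(img, 0, 0);
ctx.strokeStyle = 'red';
ctx.fillStyle = 'red';
ctx.font = '16px sans-serif';
for (const { box, label, score } of output) {
  const { xmin, ymin, xmax, ymax } = box;
  ctx.strokeRect(xmin, ymin, xmax - xmin, ymax - ymin);
  // Keep the label readable even when the box touches the top edge.
  ctx.fillText(`${label}: ${score.toFixed(2)}`, xmin, Math.max(ymin - 4, 12));
}
document.body.appendChild(canvas);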

xenova merged commit 7cf8a2c into main on Nov 20, 2023 (4 checks passed)
tobiascornille commented

@xenova Why do the boxes from the object-detection pipeline have values between 0 and 1, while the boxes from this pipeline are in absolute pixels? I wanted to adapt the static template on HF, but now I first need to get the image height and width somehow.

xenova (Collaborator, Author) commented Dec 6, 2023

@tobiascornille You can use the percentage option to choose: set it to false to return pixel values, or to true to return percentage values. See the docs for more information.
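
A minimal sketch of the option in use, assuming percentage behaves as described in the comment above (everything else follows the earlier examples):

import { pipeline } from '@xenova/transformers';

let detector = await pipeline('zero-shot-object-detection', 'Xenova/owlvit-base-patch32');

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/astronaut.png';
let candidate_labels = ['human face', 'rocket'];

// Assumption per the comment above: with percentage set to true, box
// coordinates come back as fractions of the image size rather than pixels.
let output = await detector(url, candidate_labels, { percentage: true });
// box values now lie in [0, 1] (illustrative), e.g.
// box: { xmin: 0.28, ymin: 0.11, xmax: 0.42, ymax: 0.28 }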

xenova deleted the add-owlvit branch on December 6, 2023