Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for dinov2 with registers #1110

Merged
merged 1 commit into from
Dec 25, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/snippets/6_supported-models.snippet
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@
1. **Depth Pro** (from Apple) released with the paper [Depth Pro: Sharp Monocular Metric Depth in Less Than a Second](https://arxiv.org/abs/2410.02073) by Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, Vladlen Koltun.
1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2)** (from Meta AI) released with the paper [DINOv2: Learning Robust Visual Features without Supervision](https://arxiv.org/abs/2304.07193) by Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski.
1. **[DINOv2 with Registers](https://huggingface.co/docs/transformers/model_doc/dinov2_with_registers)** (from Meta AI) released with the paper [DINOv2 with Registers](https://arxiv.org/abs/2309.16588) by Timothée Darcet, Maxime Oquab, Julien Mairal, Piotr Bojanowski.
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) and a German version of DistilBERT.
1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (from NAVER), released together with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
Expand Down
22 changes: 22 additions & 0 deletions src/models.js
Original file line number Diff line number Diff line change
Expand Up @@ -5389,6 +5389,26 @@ export class Dinov2ForImageClassification extends Dinov2PreTrainedModel {
}
//////////////////////////////////////////////////

//////////////////////////////////////////////////
export class Dinov2WithRegistersPreTrainedModel extends PreTrainedModel { }

/**
* The bare Dinov2WithRegisters Model transformer outputting raw hidden-states without any specific head on top.
*/
export class Dinov2WithRegistersModel extends Dinov2WithRegistersPreTrainedModel { }

/**
* Dinov2WithRegisters Model transformer with an image classification head on top (a linear layer on top of the final hidden state of the [CLS] token) e.g. for ImageNet.
*/
export class Dinov2WithRegistersForImageClassification extends Dinov2WithRegistersPreTrainedModel {
/**
* @param {any} model_inputs
*/
async _call(model_inputs) {
return new SequenceClassifierOutput(await super._call(model_inputs));
}
}
//////////////////////////////////////////////////

//////////////////////////////////////////////////
export class YolosPreTrainedModel extends PreTrainedModel { }
Expand Down Expand Up @@ -7018,6 +7038,7 @@ const MODEL_MAPPING_NAMES_ENCODER_ONLY = new Map([
['convnext', ['ConvNextModel', ConvNextModel]],
['convnextv2', ['ConvNextV2Model', ConvNextV2Model]],
['dinov2', ['Dinov2Model', Dinov2Model]],
['dinov2_with_registers', ['Dinov2WithRegistersModel', Dinov2WithRegistersModel]],
['resnet', ['ResNetModel', ResNetModel]],
['swin', ['SwinModel', SwinModel]],
['swin2sr', ['Swin2SRModel', Swin2SRModel]],
Expand Down Expand Up @@ -7263,6 +7284,7 @@ const MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES = new Map([
['convnext', ['ConvNextForImageClassification', ConvNextForImageClassification]],
['convnextv2', ['ConvNextV2ForImageClassification', ConvNextV2ForImageClassification]],
['dinov2', ['Dinov2ForImageClassification', Dinov2ForImageClassification]],
['dinov2_with_registers', ['Dinov2WithRegistersForImageClassification', Dinov2WithRegistersForImageClassification]],
['resnet', ['ResNetForImageClassification', ResNetForImageClassification]],
['swin', ['SwinForImageClassification', SwinForImageClassification]],
['segformer', ['SegformerForImageClassification', SegformerForImageClassification]],
Expand Down
Loading