Feature request

In the recent GroundingDino PR (#26087), one thing that was missing was cross-attention masking when running multi-head attention between text and image features. The assumption was that, for inference, the text would typically be a fixed set of labels, so cross-attention masking would not be needed. However, for a more robust and training-friendly implementation, we should support a text cross-attention mask.
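For illustration, here is a minimal sketch of the idea, not GroundingDino's actual internals: the function name, tensor shapes, and mask convention below are assumptions for this example. Padded text positions get a large negative bias before the softmax, so image queries cannot attend to them.

```python
# Sketch only: hypothetical shapes and names, not the real GroundingDino code.
import torch
import torch.nn.functional as F

def cross_attention_with_text_mask(image_queries, text_keys, text_values, text_attention_mask):
    # image_queries:       (batch, num_image_tokens, dim)
    # text_keys / values:  (batch, num_text_tokens, dim)
    # text_attention_mask: (batch, num_text_tokens), 1 for real tokens, 0 for padding
    dim = image_queries.shape[-1]
    scores = torch.matmul(image_queries, text_keys.transpose(-1, -2)) / dim**0.5
    # Broadcast the mask over the query dimension and push padded positions to -inf,
    # so they receive zero attention weight after the softmax.
    keep = text_attention_mask[:, None, :].to(torch.bool)
    scores = scores.masked_fill(~keep, torch.finfo(scores.dtype).min)
    attn = F.softmax(scores, dim=-1)
    return torch.matmul(attn, text_values)
```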
Motivation
This was an actual request in the model's PR from people using the transformers implementation in their research; see #26087 (comment).
Your contribution
A PR should be opened soon 😄