
SAM-HQ model addition #35147

Draft · wants to merge 27 commits into main

Conversation

@sushmanthreddy sushmanthreddy commented Dec 8, 2024

Closes #31137

Pull Request Title: Add HQ-SAM Functionality to Transformers Library

Model Overview

HQ-SAM (Segment Anything in High Quality) is an enhanced version of the Segment Anything Model (SAM), addressing limitations in mask quality for intricate structures and challenging segmentation tasks. The model refines SAM’s predictions using a High-Quality Output Token and Global-Local Feature Fusion while preserving SAM’s efficiency and zero-shot generalization capabilities.

According to the original implementation, HQ-SAM significantly improves mask boundaries and reduces segmentation errors by introducing minimal additional parameters (<0.5%) and computational overhead. The model is designed to maintain compatibility with SAM’s existing prompt-based design and mask decoder architecture.
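
Since this PR is still a draft, the final API is not settled. For context, here is a minimal usage sketch assuming hypothetical `SamHQModel`/`SamHQProcessor` classes and a placeholder checkpoint id, mirroring how the existing SAM model is used in transformers:

```python
# Hypothetical usage sketch -- SamHQModel / SamHQProcessor and the checkpoint id
# are placeholders, not the final API of this draft PR. The flow mirrors the
# existing SAM API in transformers.
import requests
import torch
from PIL import Image

from transformers import SamHQModel, SamHQProcessor  # hypothetical class names

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SamHQModel.from_pretrained("<org>/sam-hq-vit-base").to(device)  # placeholder repo id
processor = SamHQProcessor.from_pretrained("<org>/sam-hq-vit-base")

img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
input_points = [[[450, 600]]]  # one 2D point prompt on the image

inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

# Upscale the low-resolution mask logits back to the original image size.
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu()
)
scores = outputs.iou_scores
```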

Repository and Weights

The HQ-SAM implementation and pre-trained weights are available in the following repository:
https://github.com/SysCV/sam-hq

HQ-SAM provides three pre-trained weight variants:

  • sam_hq_vit_b – Small vision encoder.
  • sam_hq_vit_l – Medium vision encoder.
  • sam_hq_vit_h – Large vision encoder.

The main difference between these variants is the size of the Vision Transformer (ViT) encoder, while the prompt encoder and mask decoder remain unchanged.
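
For comparison, the reference repository keeps the original `segment_anything` package interface, so the checkpoints above can be loaded through `sam_model_registry`. A sketch, assuming a locally downloaded checkpoint file; the `hq_token_only` flag is an assumption based on the fork's demo code:

```python
# Sketch of loading the reference checkpoints with the SysCV/sam-hq codebase,
# which reuses the original `segment_anything` package interface.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

image = np.array(Image.open("example.jpg").convert("RGB"))  # placeholder image path

# Registry keys match the variant names: "vit_b", "vit_l", "vit_h".
sam = sam_model_registry["vit_b"](checkpoint="sam_hq_vit_b.pth")  # local checkpoint path
sam.to("cuda")

predictor = SamPredictor(sam)
predictor.set_image(image)
masks, scores, logits = predictor.predict(
    point_coords=np.array([[450, 600]]),
    point_labels=np.array([1]),  # 1 marks a foreground point
    multimask_output=False,
    hq_token_only=True,  # sam-hq-specific flag (assumption from the reference repo's demo)
)
```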

Functionality

For each input prompt (e.g., bounding boxes, 2D points, or coarse masks), HQ-SAM predicts high-quality binary masks that improve segmentation precision (see the prompt sketch after this list). Improvements include:

  • More accurate boundaries.
  • Correction of coarse masks and segmentation errors.
  • Enhanced detail preservation for thin structures and complex object geometries.
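
To make the prompt formats concrete, here is a sketch of point and box prompts together, reusing the hypothetical `model`, `processor`, `raw_image`, and `device` from the earlier sketch and following the input conventions of the existing `SamProcessor`:

```python
# Prompt-type sketch -- hypothetical SamHQProcessor, same input conventions
# as the existing SamProcessor.
# Shapes: input_points is (batch, point_batch, num_points, 2),
#         input_boxes  is (batch, num_boxes, 4).
input_points = [[[450, 600]]]           # one foreground 2D point
input_boxes = [[[75, 275, 1725, 850]]]  # one box as [x_min, y_min, x_max, y_max]

inputs = processor(
    raw_image,
    input_points=input_points,
    input_boxes=input_boxes,
    return_tensors="pt",
).to(device)
outputs = model(**inputs, multimask_output=False)  # one refined mask per prompt
```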

Reviewers: @molbap

@sushmanthreddy sushmanthreddy marked this pull request as draft December 8, 2024 09:15
@sushmanthreddy sushmanthreddy marked this pull request as ready for review December 20, 2024 00:48
@sushmanthreddy sushmanthreddy marked this pull request as draft December 22, 2024 09:36
@sushmanthreddy (Contributor, Author):

Still a work in progress: the implementation is being converted from standalone model files to a modular file, since there is a lot of code that can be reused from the SAM model.
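
For reference, the modular-transformers pattern lets the new model inherit directly from the SAM classes, with the library's converter script generating the standalone modeling file. A minimal hypothetical sketch of what `modular_sam_hq.py` could look like (class names and structure are placeholders, not the final PR code):

```python
# modular_sam_hq.py -- hypothetical sketch of the modular-transformers pattern.
# The converter script would generate a standalone modeling_sam_hq.py from this.
from transformers.models.sam.modeling_sam import SamMaskDecoder, SamModel


class SamHQMaskDecoder(SamMaskDecoder):
    """Placeholder: would add the High-Quality Output Token and the
    global-local feature-fusion layers on top of SAM's mask decoder."""


class SamHQModel(SamModel):
    """Placeholder: reuses SAM's vision encoder and prompt encoder unchanged,
    swapping in the HQ mask decoder."""
```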

Development

Successfully merging this pull request may close: SAM-HQ implementation in transformers (#31137).