Add ONNX export for DinoV2 models #1580
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Although validation passes for all the facebook models on the Hub, I'm running into a few issues when actually running the models. They appear to stem from this line, since bicubic interpolation isn't supported for this operation in onnxruntime. Changing it to "bilinear" seems to work (and the output doesn't differ much).
I've updated the converter to use dummy values from the image preprocessor, so that the branch which interpolates the positional embeddings is triggered. As mentioned above, however, this doesn't work with the current state of that function (fails with …). @fxmarty, I suppose we can fix this with a model patcher?
@xenova Is bicubic/linear not a config option? If it is not, I am afraid we do need to patch. What is the issue exactly? Is the model not loadable in ORT?
Unfortunately not. This seems to be hard-coded in the model code that interpolates the positional embeddings to the correct shape.
It seems that the required bicubic interpolation is simply not supported.
Here's the full error log:
(error copied from depth_anything export, but it uses dinov2 as a backend) For all the depth_anything and dinov2 models I've exported and released on the HF hub, I had to manually override the code in the following ways:
Link to models I've converted with these fixes:
I followed the same path independently, and can confirm that bilinear instead of bicubic interpolation for the position encodings results in unnoticeable visual differences in the generated depth map.
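To give a rough feel for why swapping bicubic for bilinear tends to change little, here is a dependency-free sketch that resizes a small, smooth 2D grid with both kernels and reports the largest difference. Everything in it is an assumption for illustration: the function names (`linear_weights`, `cubic_weights`, `resize_1d`, `resize_2d`), the synthetic 16x16 grid, the half-pixel sampling, and the cubic coefficient `a = -0.75`. It is not the transformers or onnxruntime code, and it says nothing definitive about real DinoV2 position embeddings.

```python
import math


def linear_weights(t):
    """Weights for the neighbours at offsets -1, 0, 1, 2 (linear kernel)."""
    return [0.0, 1.0 - t, t, 0.0]


def cubic_weights(t, a=-0.75):
    """Cubic-convolution weights for the neighbours at offsets -1, 0, 1, 2.

    The coefficient a = -0.75 is an assumed choice for this sketch.
    """
    def kernel(x):
        x = abs(x)
        if x <= 1.0:
            return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
        if x < 2.0:
            return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
        return 0.0

    return [kernel(t + 1.0), kernel(t), kernel(1.0 - t), kernel(2.0 - t)]


def resize_1d(values, new_len, weights_fn):
    """Resample a 1D sequence to new_len using half-pixel centres."""
    n = len(values)
    scale = n / new_len
    out = []
    for i in range(new_len):
        src = (i + 0.5) * scale - 0.5      # source coordinate of output sample i
        base = math.floor(src)
        t = src - base                     # fractional offset in [0, 1)
        acc = 0.0
        for offset, weight in zip((-1, 0, 1, 2), weights_fn(t)):
            idx = min(max(base + offset, 0), n - 1)  # clamp at the borders
            acc += weight * values[idx]
        out.append(acc)
    return out


def resize_2d(grid, new_h, new_w, weights_fn):
    """Separable resize: first along each row, then along each column."""
    rows = [resize_1d(row, new_w, weights_fn) for row in grid]
    cols = [resize_1d(list(col), new_h, weights_fn) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# Synthetic stand-in for one channel of a position-embedding grid.
grid = [[math.sin(0.3 * x) + math.cos(0.2 * y) for x in range(16)]
        for y in range(16)]

bilinear = resize_2d(grid, 24, 24, linear_weights)
bicubic = resize_2d(grid, 24, 24, cubic_weights)

max_diff = max(abs(p - q)
               for row_p, row_q in zip(bilinear, bicubic)
               for p, q in zip(row_p, row_q))
print(f"max |bilinear - bicubic| = {max_diff:.4f}")
```

On a smooth grid like this one, the two results stay close relative to the range of the values, which is consistent with the observations in this thread. Whether the gap is acceptable for a given model still has to be checked on its real outputs.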
Closed in favor of #2001
What does this PR do?
As title says :)
Fixes # (issue)
Before submitting