
[fix] Allow ORTQuantizer over models with subfolder ONNX files #2094

Merged · 3 commits · Nov 18, 2024

Conversation

@tomaarsen (Member) commented Nov 11, 2024

Hello!

Pull Request overview

  • Allow ORTQuantizer over models with subfolder ONNX files

Details

Currently, if you call ORTQuantizer on a model that was loaded from a subfolder, it breaks:

from optimum.onnxruntime import ORTModelForFeatureExtraction, ORTQuantizer

# config.json lives in the repository root, while model.onnx sits in the onnx/ subfolder
model = ORTModelForFeatureExtraction.from_pretrained(
    "sentence-transformers-testing/all-MiniLM-L6-v2",
    subfolder="onnx",
    file_name="model.onnx",
)
quantizer = ORTQuantizer.from_pretrained(model)  # raises ValueError, see below
print(quantizer)
Traceback (most recent call last):
  File "...\optimum\demo_ort_quantizer.py", line 8, in <module>
    quantizer = ORTQuantizer.from_pretrained(model)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\optimum\onnxruntime\quantization.py", line 156, in from_pretrained
    return cls(path)
           ^^^^^^^^^
  File "...\optimum\onnxruntime\quantization.py", line 102, in __init__
    self.config = AutoConfig.from_pretrained(self.onnx_model_path.parent)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\site-packages\transformers\models\auto\configuration_auto.py", line 1049, in from_pretrained
    raise ValueError(
ValueError: Unrecognized model in ...\.cache\huggingface\hub\models--sentence-transformers--all-MiniLM-L6-v2\snapshots\fa97f6e7cb1a59073dff9e6b13e2715cf7475ac9\onnx. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, dac, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deformable_detr, deit, depth_anything, deta, detr, dinat, dinov2, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, falcon_mamba, fastspeech2_conformer, flaubert, flava, fnet, focalnet, fsmt, funnel, fuyu, gemma, gemma2, git, glm, glpn, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, granite, granitemoe, graphormer, grounding-dino, groupvit, hiera, hubert, ibert, idefics, idefics2, idefics3, imagegpt, informer, instructblip, instructblipvideo, jamba, jetmoe, jukebox, kosmos-2, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, llava, llava_next, llava_next_video, llava_onevision, longformer, longt5, luke, lxmert, m2m_100, mamba, mamba2, marian, markuplm, mask2former, maskformer, maskformer-swin, mbart, mctct, mega, megatron-bert, mgp-str, mimi, mistral, mixtral, mllama, mobilebert, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, moshi, mpnet, mpt, mra, mt5, musicgen, musicgen_melody, mvp, nat, nemotron, nezha, nllb-moe, nougat, nystromformer, olmo, olmoe, omdet-turbo, oneformer, open-llama, openai-gpt, opt, owlv2, owlvit, paligemma, patchtsmixer, patchtst, pegasus, pegasus_x, perceiver, persimmon, phi, phi3, phimoe, pix2struct, pixtral, plbart, poolformer, pop2piano, prophetnet, pvt, pvt_v2, qdqbert, qwen2, qwen2_audio, qwen2_audio_encoder, qwen2_moe, qwen2_vl, rag, realm, recurrent_gemma, reformer, regnet, rembert, resnet, retribert, roberta, roberta-prelayernorm, roc_bert, roformer, rt_detr, rt_detr_resnet, rwkv, sam, seamless_m4t, seamless_m4t_v2, segformer, seggpt, sew, sew-d, siglip, siglip_vision_model, speech-encoder-decoder, speech_to_text, speech_to_text_2, speecht5, splinter, squeezebert, stablelm, starcoder2, superpoint, swiftformer, swin, swin2sr, swinv2, switch_transformers, t5, table-transformer, tapas, time_series_transformer, timesformer, timm_backbone, trajectory_transformer, transfo-xl, trocr, tvlt, tvp, udop, umt5, unispeech, unispeech-sat, univnet, upernet, van, video_llava, videomae, vilt, vipllava, vision-encoder-decoder, vision-text-dual-encoder, visual_bert, vit, vit_hybrid, vit_mae, vit_msn, vitdet, vitmatte, vits, vivit, wav2vec2, wav2vec2-bert, wav2vec2-conformer, wavlm, whisper, xclip, xglm, xlm, xlm-prophetnet, xlm-roberta, xlm-roberta-xl, xlnet, xmod, yolos, yoso, zamba, zoedepth, onnx_model

There are two issues at play:

  1. The config (although already loaded on the ORTModel) is not passed along to the ORTQuantizer in from_pretrained.
  2. If AutoConfig.from_pretrained fails, it often raises a ValueError rather than an OSError (a sketch of how both points could be handled follows this list).
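
To make these concrete, here is a minimal Python sketch of how both points could be handled. This is an illustration only, not the change merged in this PR; the attribute names model.model_path and model.config, and the exact fallback behaviour, are assumptions for the sketch.

from pathlib import Path
from typing import Optional

from transformers import AutoConfig, PretrainedConfig


class PatchedORTQuantizer:
    """Minimal sketch, not the merged fix: tolerate subfolder layouts."""

    def __init__(self, onnx_model_path: Path, config: Optional[PretrainedConfig] = None):
        self.onnx_model_path = onnx_model_path
        if config is not None:
            # Issue 1: reuse the config the ORTModel already carries instead of
            # re-reading it from the directory containing the ONNX file.
            self.config = config
        else:
            try:
                self.config = AutoConfig.from_pretrained(self.onnx_model_path.parent)
            except (OSError, ValueError):
                # Issue 2: AutoConfig can raise ValueError (not only OSError)
                # when the folder, e.g. onnx/, holds no usable config.json.
                self.config = None

    @classmethod
    def from_pretrained(cls, model):
        # `model.model_path` and `model.config` are assumed attribute names
        # for this sketch; the real ORTModel API may differ.
        return cls(Path(model.model_path), config=getattr(model, "config", None))

With something along these lines, ORTQuantizer.from_pretrained(model) in the snippet above reuses the in-memory config and no longer requires a config.json next to model.onnx.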

An underlying issue is that #2044 added/strengthened the "ONNX in subfolders" support by allowing the config to live in the repository root while the model sits in a subfolder; the ORTQuantizer wasn't updated to reflect that config.json isn't necessarily adjacent to model.onnx.
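
For reference, the repository layout that #2044 enables looks roughly like this (illustrative; it matches how the snippet above loads the model):

repo/
├── config.json          <- model config stays in the repository root
└── onnx/
    └── model.onnx       <- loaded with subfolder="onnx", file_name="model.onnx"

ORTModel resolves config.json from the root, but ORTQuantizer.__init__ only looked in self.onnx_model_path.parent, i.e. the onnx/ folder, hence the ValueError above.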

FYI, this is breaking https://sbert.net/docs/package_reference/util.html#sentence_transformers.backend.export_dynamic_quantized_onnx_model in some cases.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@echarlaix as you added/strengthened the "ONNX in subfolders" support & that's how I encountered this.

  • Tom Aarsen

@echarlaix (Collaborator) left a comment

LGTM, thanks for the fix @tomaarsen!

@echarlaix merged commit 400bb82 into huggingface:main on Nov 18, 2024
22 of 27 checks passed