
[fix] Allow ORTQuantizer over models with subfolder ONNX files #2094

Merged · 3 commits · Nov 18, 2024

Conversation

@tomaarsen (Member) commented Nov 11, 2024

Hello!

Pull Request overview

  • Allow ORTQuantizer over models with subfolder ONNX files

Details

Currently, if you call ORTQuantizer on a model that was loaded from a subfolder, it breaks:

from optimum.onnxruntime import ORTModelForFeatureExtraction, ORTQuantizer

# config.json lives in the repository root, while model.onnx sits in the onnx/ subfolder
model = ORTModelForFeatureExtraction.from_pretrained(
    "sentence-transformers-testing/all-MiniLM-L6-v2",
    subfolder="onnx",
    file_name="model.onnx",
)
quantizer = ORTQuantizer.from_pretrained(model)  # raises ValueError, see below
print(quantizer)
Traceback (most recent call last):
  File "...\optimum\demo_ort_quantizer.py", line 8, in <module>
    quantizer = ORTQuantizer.from_pretrained(model)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\optimum\onnxruntime\quantization.py", line 156, in from_pretrained
    return cls(path)
           ^^^^^^^^^
  File "...\optimum\onnxruntime\quantization.py", line 102, in __init__
    self.config = AutoConfig.from_pretrained(self.onnx_model_path.parent)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\site-packages\transformers\models\auto\configuration_auto.py", line 1049, in from_pretrained
    raise ValueError(
ValueError: Unrecognized model in ...\.cache\huggingface\hub\models--sentence-transformers--all-MiniLM-L6-v2\snapshots\fa97f6e7cb1a59073dff9e6b13e2715cf7475ac9\onnx. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, dac, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deformable_detr, deit, depth_anything, deta, detr, dinat, dinov2, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, falcon_mamba, fastspeech2_conformer, flaubert, flava, fnet, focalnet, fsmt, funnel, fuyu, gemma, gemma2, git, glm, glpn, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, granite, granitemoe, graphormer, grounding-dino, groupvit, hiera, hubert, ibert, idefics, idefics2, idefics3, imagegpt, informer, instructblip, instructblipvideo, jamba, jetmoe, jukebox, kosmos-2, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, llava, llava_next, llava_next_video, llava_onevision, longformer, longt5, luke, lxmert, m2m_100, mamba, mamba2, marian, markuplm, mask2former, maskformer, maskformer-swin, mbart, mctct, mega, megatron-bert, mgp-str, mimi, mistral, mixtral, mllama, mobilebert, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, moshi, mpnet, mpt, mra, mt5, musicgen, musicgen_melody, mvp, nat, nemotron, nezha, nllb-moe, nougat, nystromformer, olmo, olmoe, omdet-turbo, oneformer, open-llama, openai-gpt, opt, owlv2, owlvit, paligemma, patchtsmixer, patchtst, pegasus, pegasus_x, perceiver, persimmon, phi, phi3, phimoe, pix2struct, pixtral, plbart, poolformer, pop2piano, prophetnet, pvt, pvt_v2, qdqbert, qwen2, qwen2_audio, qwen2_audio_encoder, qwen2_moe, qwen2_vl, rag, realm, recurrent_gemma, reformer, regnet, rembert, resnet, retribert, roberta, roberta-prelayernorm, roc_bert, roformer, rt_detr, rt_detr_resnet, rwkv, sam, seamless_m4t, seamless_m4t_v2, segformer, seggpt, sew, sew-d, siglip, siglip_vision_model, speech-encoder-decoder, speech_to_text, speech_to_text_2, speecht5, splinter, squeezebert, stablelm, starcoder2, superpoint, swiftformer, swin, swin2sr, swinv2, switch_transformers, t5, table-transformer, tapas, time_series_transformer, timesformer, timm_backbone, trajectory_transformer, transfo-xl, trocr, tvlt, tvp, udop, umt5, unispeech, unispeech-sat, univnet, upernet, van, video_llava, videomae, vilt, vipllava, vision-encoder-decoder, vision-text-dual-encoder, visual_bert, vit, vit_hybrid, vit_mae, vit_msn, vitdet, vitmatte, vits, vivit, wav2vec2, wav2vec2-bert, wav2vec2-conformer, wavlm, whisper, xclip, xglm, xlm, xlm-prophetnet, xlm-roberta, xlm-roberta-xl, xlnet, xmod, yolos, yoso, zamba, zoedepth, onnx_model

There are two issues at play:

  1. The config (although already loaded on the ORTModel) is not passed along to the ORTQuantizer in from_pretrained.
  2. If AutoConfig.from_pretrained fails, it often raises a ValueError rather than an OSError (a sketch of how both points could be handled follows this list).
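
To make these concrete, here is a minimal Python sketch of how both points could be handled. This is an illustration only, not the change merged in this PR; the attribute names model.model_path and model.config, and the exact fallback behaviour, are assumptions for the sketch.

from pathlib import Path
from typing import Optional

from transformers import AutoConfig, PretrainedConfig


class PatchedORTQuantizer:
    """Minimal sketch, not the merged fix: tolerate subfolder layouts."""

    def __init__(self, onnx_model_path: Path, config: Optional[PretrainedConfig] = None):
        self.onnx_model_path = onnx_model_path
        if config is not None:
            # Issue 1: reuse the config the ORTModel already carries instead of
            # re-reading it from the directory containing the ONNX file.
            self.config = config
        else:
            try:
                self.config = AutoConfig.from_pretrained(self.onnx_model_path.parent)
            except (OSError, ValueError):
                # Issue 2: AutoConfig can raise ValueError (not only OSError)
                # when the folder, e.g. onnx/, holds no usable config.json.
                self.config = None

    @classmethod
    def from_pretrained(cls, model):
        # `model.model_path` and `model.config` are assumed attribute names
        # for this sketch; the real ORTModel API may differ.
        return cls(Path(model.model_path), config=getattr(model, "config", None))

With something along these lines, ORTQuantizer.from_pretrained(model) in the snippet above reuses the in-memory config and no longer requires a config.json next to model.onnx.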

An underlying issue is that #2044 added/strengthened the "ONNX in subfolders" support by allowing the config to live in the repository root while the model sits in a subfolder; the ORTQuantizer wasn't updated to reflect that config.json isn't necessarily adjacent to model.onnx.
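
For reference, the repository layout that #2044 enables looks roughly like this (illustrative; it matches how the snippet above loads the model):

repo/
├── config.json          <- model config stays in the repository root
└── onnx/
    └── model.onnx       <- loaded with subfolder="onnx", file_name="model.onnx"

ORTModel resolves config.json from the root, but ORTQuantizer.__init__ only looked in self.onnx_model_path.parent, i.e. the onnx/ folder, hence the ValueError above.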

FYI, this is breaking https://sbert.net/docs/package_reference/util.html#sentence_transformers.backend.export_dynamic_quantized_onnx_model in some cases.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@echarlaix as you added/strengthened the "ONNX in subfolders" support & that's how I encountered this.

  • Tom Aarsen

@echarlaix (Collaborator) left a comment

LGTM, thanks for the fix @tomaarsen!

@echarlaix merged commit 400bb82 into huggingface:main on Nov 18, 2024
22 of 27 checks passed