Fix ORT CI #1875
Conversation
One test is still failing during generation with an old model.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
ORT tests are passing locally; the issue with broadcasting was solved.
Do you know where this could come from?
@@ -356,62 +356,45 @@ def quantize(
)

quantizer_factory = QDQQuantizer if use_qdq else ONNXQuantizer
# TODO: maybe this logic can be moved to a method in the configuration class (get_ort_quantizer_kwargs())
But the config should not be aware of the ORTQuantizer class, right?
Yes, the quant config already contains everything and can infer which quantizer will use its kwargs (from format and is_static).
optimum/optimum/onnxruntime/quantization.py
Line 309 in f300865
use_qdq = quantization_config.is_static and quantization_config.format == QuantFormat.QDQ
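For illustration, a minimal sketch of the idea discussed above: the config itself selects the quantizer class from format and is_static, using the same condition as the use_qdq line referenced here. The class shape and method name below are assumptions for illustration only; the TODO's get_ort_quantizer_kwargs() would presumably build the quantizer kwargs along the same lines. Not the merged implementation.

# Hypothetical sketch, not the merged implementation: the quantization
# config infers the ORT quantizer class itself, so ORTQuantizer.quantize()
# no longer needs to know about QDQQuantizer / ONNXQuantizer directly.
from onnxruntime.quantization import QuantFormat
from onnxruntime.quantization.onnx_quantizer import ONNXQuantizer
from onnxruntime.quantization.qdq_quantizer import QDQQuantizer

class QuantizationConfig:
    def __init__(self, is_static: bool, format: QuantFormat):
        self.is_static = is_static
        self.format = format

    def get_ort_quantizer_class(self):
        # Static QDQ-format quantization uses the QDQ quantizer; anything
        # else falls back to the generic ONNX quantizer (the same condition
        # as the use_qdq line above).
        use_qdq = self.is_static and self.format == QuantFormat.QDQ
        return QDQQuantizer if use_qdq else ONNXQuantizer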
Co-authored-by: Michael Benayoun <[email protected]>
…nto fix-ort-ci
LGTM
Co-authored-by: Ella Charlaix <[email protected]>
pytest -n auto -m "not run_in_series" --durations=0 -vs onnxruntime
pytest -m "run_in_series" --durations=0 onnxruntime
I changed the order here and started seeing new errors on Windows related to input dtype that I hadn't seen before.
I just noticed that, depending on the OS, errors propagate into the workflow differently:
- on Linux-based runners (Ubuntu), this runs the first command and exits with a non-zero code if it fails;
- on Windows-based runners, this runs the first command and then the second, whether the first succeeds or fails, and only checks the exit code of the last one.
Instances:
- this is a Windows run that failed on the first command, ran the second anyway, and reported a success
- this is an Ubuntu run (from the same event) that failed on the first and exited directly
This is probably due to a difference between bash and PowerShell (see the workflow sketch after this comment).
@echarlaix @michaelbenayoun @JingyaHuang @mht-sharma @regisss @fxmarty
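One way to get the same fail-fast behavior on both runners, sketched below under the assumption that forcing bash is acceptable on the Windows runners: GitHub Actions invokes bash steps with the -e (errexit) flag, so the first failing command aborts the step, while the default PowerShell wrapper only surfaces the exit code of the last command. The step name is illustrative.

- name: Run ONNX Runtime tests
  # Sketch: force bash on every OS so the first failing pytest command
  # fails the step. GitHub Actions runs bash steps with -e (errexit),
  # whereas the default Windows shell (PowerShell) only propagates the
  # exit code of the last command in the script.
  shell: bash
  run: |
    pytest -n auto -m "not run_in_series" --durations=0 -vs onnxruntime
    pytest -m "run_in_series" --durations=0 onnxruntime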
- name: Free Disk Space (Ubuntu)
  if: matrix.os == 'ubuntu-20.04'
  uses: jlumbroso/free-disk-space@main
  with:
    tool-cache: false
    swap-storage: false
    large-packages: false
Why this change?
I asked in #1875.
It takes half the time (2-3 min -> 1 min) and shows how much space was freed at each step. Either way, we won't need it once we start using intel-cpu runners.
nice
)
outputs = model.generate(**tokens, num_beams=1, do_sample=False, min_new_tokens=30, max_new_tokens=30)
self.assertTrue(torch.allclose(outputs_onnx, outputs))
outputs = model.generate(**tokens, num_beams=1, do_sample=False, min_new_tokens=10, max_new_tokens=10)
Why reduce the number of new tokens?
I was trying to figure out where the failure was coming from and forgot to reset it to 30. Will do that in the Windows PR.
What does this PR do?
Fixes ORT CI failures caused by the onnxruntime 1.18 release.
Before submitting
Who can review?