Fix ORT CI #1875
```diff
@@ -4,9 +4,9 @@ name: ONNX Runtime / Python - Test
 on:
   push:
-    branches: [ main ]
+    branches: [main]
   pull_request:
-    branches: [ main ]
+    branches: [main]

 concurrency:
   group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
@@ -22,62 +22,34 @@ jobs:
     runs-on: ${{ matrix.os }}
     steps:
-      - uses: actions/checkout@v2
-
-      - name: Free disk space
-        if: matrix.os == 'ubuntu-20.04'
-        run: |
-          df -h
-          sudo apt-get update
-          sudo apt-get purge -y '^apache.*'
-          sudo apt-get purge -y '^imagemagick.*'
-          sudo apt-get purge -y '^dotnet.*'
-          sudo apt-get purge -y '^aspnetcore.*'
-          sudo apt-get purge -y 'php.*'
-          sudo apt-get purge -y '^temurin.*'
-          sudo apt-get purge -y '^mysql.*'
-          sudo apt-get purge -y '^java.*'
-          sudo apt-get purge -y '^openjdk.*'
-          sudo apt-get purge -y microsoft-edge-stable google-cloud-cli azure-cli google-chrome-stable firefox powershell mono-devel
-          df -h
-          sudo apt-get autoremove -y >/dev/null 2>&1
-          sudo apt-get clean
-          df -h
-          echo "https://github.com/actions/virtual-environments/issues/709"
-          sudo rm -rf "$AGENT_TOOLSDIRECTORY"
-          df -h
-          echo "remove big /usr/local"
-          sudo rm -rf "/usr/local/share/boost"
-          sudo rm -rf /usr/local/lib/android >/dev/null 2>&1
-          df -h
-          echo "remove /usr/share leftovers"
-          sudo rm -rf /usr/share/dotnet/sdk > /dev/null 2>&1
-          sudo rm -rf /usr/share/dotnet/shared > /dev/null 2>&1
-          sudo rm -rf /usr/share/swift > /dev/null 2>&1
-          df -h
-          echo "remove other leftovers"
-          sudo rm -rf /var/lib/mysql > /dev/null 2>&1
-          sudo rm -rf /home/runner/.dotnet > /dev/null 2>&1
-          sudo rm -rf /home/runneradmin/.dotnet > /dev/null 2>&1
-          sudo rm -rf /etc/skel/.dotnet > /dev/null 2>&1
-          sudo rm -rf /usr/local/.ghcup > /dev/null 2>&1
-          sudo rm -rf /usr/local/aws-cli > /dev/null 2>&1
-          sudo rm -rf /usr/local/lib/node_modules > /dev/null 2>&1
-          sudo rm -rf /usr/lib/heroku > /dev/null 2>&1
-          sudo rm -rf /usr/local/share/chromium > /dev/null 2>&1
-          df -h
-      - name: Setup Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v2
-        with:
-          python-version: ${{ matrix.python-version }}
-
-      - name: Install dependencies
-        run: |
-          pip install .[tests,onnxruntime]
-      - name: Test with pytest
-        working-directory: tests
-        run: |
-          pytest -n auto -m "not run_in_series" --durations=0 -vs onnxruntime
-          pytest -m "run_in_series" --durations=0 onnxruntime
+      - name: Free Disk Space (Ubuntu)
+        if: matrix.os == 'ubuntu-20.04'
+        uses: jlumbroso/free-disk-space@main
+        with:
+          tool-cache: false
+          swap-storage: false
+          large-packages: false
+
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Setup Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install dependencies
+        run: |
+          pip install --upgrade pip
+          pip install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
+          pip install .[tests,onnxruntime]
+
+      - name: Test with pytest (in series)
+        working-directory: tests
+        run: |
+          pytest onnxruntime -m "run_in_series" --durations=0 -vvvv -s
+
+      - name: Test with pytest (in parallel)
+        working-directory: tests
+        run: |
+          pytest onnxruntime -m "not run_in_series" --durations=0 -vvvv -s -n auto
```

> **Review comment on lines +25 to +31:** Why this change?
>
> **Reply:** I asked in #1875.
>
> **Reply:** nice
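The split into "in series" and "in parallel" steps relies on pytest marker expressions: tests tagged `run_in_series` are selected with `-m "run_in_series"` and deselected from the `-n auto` parallel pass with `-m "not run_in_series"`. A minimal runnable sketch of that selection mechanism, assuming `pytest` is installed (the test module below is hypothetical, not from this PR):

```python
# Sketch: how `-m "not run_in_series"` deselects marked tests.
# The test module here is a throwaway example; only the marker
# mechanics mirror the workflow steps above.
import os
import subprocess
import sys
import tempfile
import textwrap

TEST_SRC = textwrap.dedent("""
    import pytest

    @pytest.mark.run_in_series
    def test_serial():
        assert True

    def test_parallel():
        assert True
""")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_markers.py")
    with open(path, "w") as f:
        f.write(TEST_SRC)
    # Register the marker on the command line (-o markers=...) to avoid
    # unknown-marker warnings, then run only the unmarked tests.
    result = subprocess.run(
        [sys.executable, "-m", "pytest", path, "-q",
         "-o", "markers=run_in_series: tests that must not run in parallel",
         "-m", "not run_in_series"],
        capture_output=True, text=True,
    )
    print(result.stdout)  # summary reports 1 passed, 1 deselected
```

Running the marked tests separately (as the "in series" step does) is the mirror image: `-m "run_in_series"` without `-n auto`.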
```diff
@@ -356,62 +356,45 @@ def quantize(
         )

         quantizer_factory = QDQQuantizer if use_qdq else ONNXQuantizer
+        # TODO: maybe this logic can be moved to a method in the configuration class (get_ort_quantizer_kwargs())
+        # that returns the dictionary of arguments to pass to the quantizer factory depending on the ort version
+        quantizer_kwargs = {
+            "model": onnx_model,
+            "static": quantization_config.is_static,
+            "per_channel": quantization_config.per_channel,
+            "mode": quantization_config.mode,
+            "weight_qType": quantization_config.weights_dtype,
+            "input_qType": quantization_config.activations_dtype,
+            "tensors_range": calibration_tensors_range,
+            "reduce_range": quantization_config.reduce_range,
+            "nodes_to_quantize": quantization_config.nodes_to_quantize,
+            "nodes_to_exclude": quantization_config.nodes_to_exclude,
+            "op_types_to_quantize": [
+                operator.value if isinstance(operator, ORTQuantizableOperator) else operator
+                for operator in quantization_config.operators_to_quantize
+            ],
+            "extra_options": {
+                "WeightSymmetric": quantization_config.weights_symmetric,
+                "ActivationSymmetric": quantization_config.activations_symmetric,
+                "EnableSubgraph": has_subgraphs,
+                "ForceSymmetric": quantization_config.activations_symmetric and quantization_config.weights_symmetric,
+                "AddQDQPairToWeight": quantization_config.qdq_add_pair_to_weight,
+                "DedicatedQDQPair": quantization_config.qdq_dedicated_pair,
+                "QDQOpTypePerChannelSupportToAxis": quantization_config.qdq_op_type_per_channel_support_to_axis,
+            },
+        }
+
+        if use_qdq:
+            quantizer_kwargs.pop("mode")
+            if parse(ort_version) >= Version("1.18.0"):
+                # The argument `static` has been removed from the qdq quantizer factory in ORT 1.18
+                quantizer_kwargs.pop("static")
+
-        if parse(ort_version) >= Version("1.13.0"):
-            # The argument `input_qType` has been changed into `activation_qType` from ORT 1.13
-            quantizer = quantizer_factory(
-                model=onnx_model,
-                static=quantization_config.is_static,
-                per_channel=quantization_config.per_channel,
-                mode=quantization_config.mode,
-                weight_qType=quantization_config.weights_dtype,
-                activation_qType=quantization_config.activations_dtype,
-                tensors_range=calibration_tensors_range,
-                reduce_range=quantization_config.reduce_range,
-                nodes_to_quantize=quantization_config.nodes_to_quantize,
-                nodes_to_exclude=quantization_config.nodes_to_exclude,
-                op_types_to_quantize=[
-                    operator.value if isinstance(operator, ORTQuantizableOperator) else operator
-                    for operator in quantization_config.operators_to_quantize
-                ],
-                extra_options={
-                    "WeightSymmetric": quantization_config.weights_symmetric,
-                    "ActivationSymmetric": quantization_config.activations_symmetric,
-                    "EnableSubgraph": has_subgraphs,
-                    "ForceSymmetric": quantization_config.activations_symmetric
-                    and quantization_config.weights_symmetric,
-                    "AddQDQPairToWeight": quantization_config.qdq_add_pair_to_weight,
-                    "DedicatedQDQPair": quantization_config.qdq_dedicated_pair,
-                    "QDQOpTypePerChannelSupportToAxis": quantization_config.qdq_op_type_per_channel_support_to_axis,
-                },
-            )
-        else:
-            quantizer = quantizer_factory(
-                model=onnx_model,
-                static=quantization_config.is_static,
-                per_channel=quantization_config.per_channel,
-                mode=quantization_config.mode,
-                weight_qType=quantization_config.weights_dtype,
-                input_qType=quantization_config.activations_dtype,
-                tensors_range=calibration_tensors_range,
-                reduce_range=quantization_config.reduce_range,
-                nodes_to_quantize=quantization_config.nodes_to_quantize,
-                nodes_to_exclude=quantization_config.nodes_to_exclude,
-                op_types_to_quantize=[
-                    operator.value if isinstance(operator, ORTQuantizableOperator) else operator
-                    for operator in quantization_config.operators_to_quantize
-                ],
-                extra_options={
-                    "WeightSymmetric": quantization_config.weights_symmetric,
-                    "ActivationSymmetric": quantization_config.activations_symmetric,
-                    "EnableSubgraph": False,
-                    "ForceSymmetric": quantization_config.activations_symmetric
-                    and quantization_config.weights_symmetric,
-                    "AddQDQPairToWeight": quantization_config.qdq_add_pair_to_weight,
-                    "DedicatedQDQPair": quantization_config.qdq_dedicated_pair,
-                    "QDQOpTypePerChannelSupportToAxis": quantization_config.qdq_op_type_per_channel_support_to_axis,
-                },
-            )
+        if parse(ort_version) >= Version("1.13.0"):
+            # The argument `input_qType` has been changed into `activation_qType` in ORT 1.13
+            quantizer_kwargs["activation_qType"] = quantizer_kwargs.pop("input_qType")
+
+        quantizer = quantizer_factory(**quantizer_kwargs)

         LOGGER.info("Quantizing model...")
         quantizer.quantize_model()
```

> **Review comment on the `get_ort_quantizer_kwargs()` TODO:** But the config should not be aware of the ORTQuantizer class, right?
>
> **Reply:** Yes, the quant config already contains everything and can infer which quantizer will use its kwargs (from format and is_static; see `optimum/onnxruntime/quantization.py`, line 309 at f300865).
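The refactor above replaces two near-duplicate constructor calls with a single kwargs dict that is pruned and renamed per installed ORT version. A condensed, runnable sketch of that pattern (the values are toys, not the real config; only the pop/rename logic mirrors the diff):

```python
# Condensed sketch of the version-gated kwargs pattern from the diff above.
# Toy values stand in for the real quantization config; the pops and the
# rename are the same version gates the PR introduces.
from packaging.version import Version, parse

def build_quantizer_kwargs(ort_version: str, use_qdq: bool) -> dict:
    kwargs = {
        "model": "onnx_model",   # placeholder for the loaded ONNX model
        "static": True,
        "mode": "QLinearOps",
        "input_qType": "uint8",
    }
    if use_qdq:
        kwargs.pop("mode")       # the QDQ quantizer takes no `mode` argument
        if parse(ort_version) >= Version("1.18.0"):
            kwargs.pop("static")  # `static` was removed from the QDQ quantizer in ORT 1.18
    if parse(ort_version) >= Version("1.13.0"):
        # `input_qType` was renamed to `activation_qType` in ORT 1.13
        kwargs["activation_qType"] = kwargs.pop("input_qType")
    return kwargs

print(sorted(build_quantizer_kwargs("1.18.0", use_qdq=True)))
# ['activation_qType', 'model']
```

The benefit over the old if/else is that each version difference is expressed once, as a single mutation, instead of duplicating the entire argument list per branch.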
```diff
@@ -2274,21 +2274,25 @@ class ORTModelForCausalLMIntegrationTest(ORTModelTestMixin):
     SPEEDUP_CACHE = 1.1

     @parameterized.expand([(False,), (True,)])
     @pytest.mark.run_in_series
     def test_inference_old_onnx_model(self, use_cache):
-        model_id = "optimum/gpt2"
-        tokenizer = get_preprocessor(model_id)
+        tokenizer = get_preprocessor("gpt2")
         model = AutoModelForCausalLM.from_pretrained("gpt2")
-        text = "This is a sample output"
-        tokens = tokenizer(text, return_tensors="pt")
-        onnx_model = ORTModelForCausalLM.from_pretrained(model_id, use_cache=use_cache, use_io_binding=use_cache)
+        onnx_model = ORTModelForCausalLM.from_pretrained("optimum/gpt2", use_cache=use_cache, use_io_binding=use_cache)

         self.assertEqual(onnx_model.use_cache, use_cache)
         self.assertEqual(onnx_model.model_path.name, ONNX_DECODER_WITH_PAST_NAME if use_cache else ONNX_DECODER_NAME)
-        outputs_onnx = onnx_model.generate(
-            **tokens, num_beams=1, do_sample=False, min_new_tokens=30, max_new_tokens=30
-        )
-        outputs = model.generate(**tokens, num_beams=1, do_sample=False, min_new_tokens=30, max_new_tokens=30)
-        self.assertTrue(torch.allclose(outputs_onnx, outputs))
+
+        text = "The capital of France is"
+        tokens = tokenizer(text, return_tensors="pt")
+
+        onnx_outputs = onnx_model.generate(
+            **tokens, num_beams=1, do_sample=False, min_new_tokens=10, max_new_tokens=10
+        )
+        outputs = model.generate(**tokens, num_beams=1, do_sample=False, min_new_tokens=10, max_new_tokens=10)
+
+        onnx_text_outputs = tokenizer.decode(onnx_outputs[0], skip_special_tokens=True)
+        text_outputs = tokenizer.decode(outputs[0], skip_special_tokens=True)
+        self.assertEqual(onnx_text_outputs, text_outputs)

     def test_load_model_from_hub_onnx(self):
         model = ORTModelForCausalLM.from_pretrained("fxmarty/onnx-tiny-random-gpt2-without-merge")
```

> **Review comment:** Why reducing the number of new tokens?
>
> **Reply:** I was trying to figure out where the failure was coming from and forgot to reset it to 30. Will do that in the Windows PR.
```diff
@@ -3596,6 +3600,7 @@ def _get_onnx_model_dir(self, model_id, model_arch, test_name):
         return onnx_model_dir

+    @pytest.mark.run_in_series
     def test_inference_old_onnx_model(self):
         model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small")
```
> **Review comment:** I changed the order here and started seeing new errors on Windows related to input dtype that I hadn't seen before.
> I also just noticed that, depending on the OS, errors propagate into the workflow differently:
> on Linux-based runners (Ubuntu), a multi-command `run:` step runs the first command and exits with a non-zero code if it fails;
> on Windows-based runners, the step runs the first command and then the second whether or not the first succeeds, and only the exit code of the last command is checked.
> instances:
> This is probably due to the difference between bash and PowerShell.
> @echarlaix @michaelbenayoun @JingyaHuang @mht-sharma @regisss @fxmarty
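The behavior described in that comment can be reproduced locally: GitHub Actions invokes bash with `-e` for Linux `run:` steps, so the step aborts at the first failing command, while PowerShell's default stepping surfaces only the last command's exit code. A small sketch under that assumption, using plain `bash -c` (no `-e`) to stand in for the PowerShell-like behavior (requires `bash` on PATH):

```python
# Reproduces the exit-code propagation difference described above.
# Requires `bash` on PATH; plain `bash -c` stands in for PowerShell's
# keep-going behavior, it is not actual PowerShell.
import subprocess

script = "false\necho after-failure"

# With -e (Linux runner behavior): aborts at `false`, exits non-zero,
# the second command never runs.
strict = subprocess.run(["bash", "-ec", script], capture_output=True, text=True)

# Without -e (PowerShell-like stepping): keeps going after the failure
# and reports only the last command's exit code, which is 0 here.
lenient = subprocess.run(["bash", "-c", script], capture_output=True, text=True)

print("strict:", strict.returncode, repr(strict.stdout))    # strict: 1 ''
print("lenient:", lenient.returncode, repr(lenient.stdout))  # lenient: 0 'after-failure\n'
```

This is why a failing first `pytest` invocation inside a two-command `run:` step can silently pass on Windows runners while failing the job on Ubuntu.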