[ORT Training] Some important updates of ONNX Runtime training APIs #1335

JingyaHuang · 2023-09-01T16:02:14Z

What does this PR do?

Update ORTTrainer to be compatible with transformers 40ea9ab2a1ad99f12e71c3a26215ad33df082ef9.
[Deprecation] Deprecate evaluation and prediction with ONNX Runtime.
Refactor the examples and tests, and re-enable the CI, which now runs with tiny functional models.

[Deprecation notes]

The Optimum team decided to deprecate the evaluation and prediction using ONNX Runtime for the reasons below:

After the deprecation, evaluation and prediction of the trained model remain possible in PyTorch through ORTTrainer. If you want to do inference with ORT, both evaluation and prediction can be done through the ORTModel classes in the library.
It reduces the workload of maintaining ORT inference in the trainer, which breaks easily as the ORT inference APIs evolve. The feature also has no clear usage that would justify continuing to maintain it.
It eases the maintenance of ORTTrainer and the training examples.
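
For illustration, the ORTModel route mentioned above might look like the minimal sketch below for a text-classification model. It assumes the trained checkpoint was saved to a local directory (the `model_dir` argument and both function names are hypothetical, not part of this PR), and that `optimum[onnxruntime]` and `transformers` are installed. `export=True` asks Optimum to convert the PyTorch checkpoint to ONNX when loading.

```python
def argmax_labels(logits):
    """Return the index of the largest logit in each row (pure Python)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def predict_with_ort(model_dir: str, texts: list[str]) -> list[int]:
    """Run inference with ONNX Runtime through an ORTModel, outside ORTTrainer.

    `model_dir` is a hypothetical path to a checkpoint saved after training.
    """
    # Imported lazily so this module can be loaded without optimum installed.
    from transformers import AutoTokenizer
    from optimum.onnxruntime import ORTModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    # export=True converts the PyTorch weights to ONNX on the fly.
    model = ORTModelForSequenceClassification.from_pretrained(model_dir, export=True)

    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    logits = model(**inputs).logits
    return argmax_labels(logits.tolist())
```

Calling `predict_with_ort` with a checkpoint directory and a list of strings would return one predicted class index per input; metrics can then be computed on those predictions independently of ORTTrainer.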

[Other subjects to discuss]

Automate the update and tests of ORT training examples

optimum/onnxruntime/trainer_seq2seq.py

fxmarty · 2023-09-16T08:33:17Z

#1327 should already be fixed


JingyaHuang · 2023-09-18T09:35:26Z

yeah @fxmarty I forgot to update the description, thanks a lot for the fix.

fxmarty

LGTM, hopefully adding position_ids inputs did not break ORTTrainer?

regisss

LGTM!

JingyaHuang · 2023-10-18T14:15:39Z

All tests passed:

====================================================== test session starts =======================================================
platform linux -- Python 3.10.0, pytest-7.4.2, pluggy-1.0.0 -- /home/onnxruntimedev/miniconda3/bin/python
cachedir: .pytest_cache
rootdir: /workspace
configfile: pyproject.toml
collected 13 items                                                                                                               

onnxruntime/nightly_test_trainer.py::ORTTrainerIntegrationTest::test_trainer_fp16_0_distilbert_text_classification PASSED  [  7%]
onnxruntime/nightly_test_trainer.py::ORTTrainerIntegrationTest::test_trainer_fp16_1_gpt2_text_generation PASSED            [ 15%]
onnxruntime/nightly_test_trainer.py::ORTTrainerIntegrationTest::test_trainer_fp16_2_t5_text2text_generation PASSED         [ 23%]
onnxruntime/nightly_test_trainer.py::ORTTrainerIntegrationTest::test_trainer_fp32_0_distilbert_text_classification PASSED  [ 30%]
onnxruntime/nightly_test_trainer.py::ORTTrainerIntegrationTest::test_trainer_fp32_1_gpt2_text_generation PASSED            [ 38%]
onnxruntime/nightly_test_trainer.py::ORTTrainerIntegrationTest::test_trainer_fp32_2_t5_text2text_generation PASSED         [ 46%]
onnxruntime/nightly_test_trainer.py::ORTTrainerIntegrationTest::test_trainer_fp32_with_label_smoothing_0_distilbert_text_classification PASSED [ 53%]
onnxruntime/nightly_test_trainer.py::ORTTrainerIntegrationTest::test_trainer_fp32_with_label_smoothing_1_gpt2_text_generation PASSED [ 61%]
onnxruntime/nightly_test_trainer.py::ORTTrainerIntegrationTest::test_trainer_fp32_with_label_smoothing_2_t5_text2text_generation PASSED [ 69%]
onnxruntime/nightly_test_trainer.py::ORTTrainerIntegrationDeepSpeedTest::test_trainer_fp16_ds_stage1_0_distilbert_text_classification PASSED [ 76%]
onnxruntime/nightly_test_trainer.py::ORTTrainerIntegrationDeepSpeedTest::test_trainer_fp16_ds_stage2_0_distilbert_text_classification PASSED [ 84%]
onnxruntime/nightly_test_trainer.py::ORTTrainerOptimizerChoiceTest::test_ort_fused_adam PASSED                             [ 92%]
onnxruntime/nightly_test_trainer.py::ORTTrainerExampleTest::test_trainer_glue SKIPPED (skip for now, server socket error)  [100%]

One example test is left skipped for now; similar tests still need to be added for the other tasks.

JingyaHuang added 5 commits September 1, 2023 14:57

update trainer

21cbdf4

update args

a334083

update to main

2e526c4

Merge branch 'main' into update-ort-trainer-to-4.32

b2357a9

update to 4.33

ea3c68f

JingyaHuang changed the title ~~Update ort trainer to 4.32.1~~ Update ORTTrainer to 4.33.1 Sep 15, 2023

JingyaHuang and others added 3 commits September 15, 2023 10:29

fix style

df9d8af

make style

15afde4

fix when testing

fec39e3

JingyaHuang marked this pull request as ready for review September 15, 2023 15:43

JingyaHuang requested review from fxmarty and regisss September 15, 2023 15:43

JingyaHuang mentioned this pull request Sep 15, 2023

add llama example #1382

Merged

fxmarty reviewed Sep 16, 2023


optimum/onnxruntime/trainer_seq2seq.py (outdated, resolved)

Update optimum/onnxruntime/trainer_seq2seq.py

da18b6e

Co-authored-by: fxmarty <[email protected]>

JingyaHuang requested a review from fxmarty September 18, 2023 09:35

fxmarty approved these changes Sep 18, 2023


regisss approved these changes Sep 18, 2023


JingyaHuang changed the title ~~Update ORTTrainer to 4.33.1~~ Some important updates of ONNx Runtime training APIs Oct 12, 2023

JingyaHuang changed the title ~~Some important updates of ONNx Runtime training APIs~~ [ORT Training] Some important updates of ONNx Runtime training APIs Oct 12, 2023

JingyaHuang changed the title ~~[ORT Training] Some important updates of ONNx Runtime training APIs~~ [ORT Training] Some important updates of ONNX Runtime training APIs Oct 12, 2023

JingyaHuang added 4 commits October 12, 2023 14:11

Merge branch 'main' into update-ort-trainer-to-4.32

a6eef26

deprecate ort inf

762292e

deprectae ort inf for seq2seq

5a8624e

update trainer and its args to main

a6c53e0

JingyaHuang added training and removed training labels Oct 14, 2023

JingyaHuang added the gpu-test trigger GPU tests label Oct 16, 2023

try CI permission

363f8db

JingyaHuang removed gpu-test trigger GPU tests training labels Oct 17, 2023

Merge branch 'main' into update-ort-trainer-to-4.32

a19269e

JingyaHuang added 3 commits October 18, 2023 14:57

update tests

4a51534

update examples

031ad63

withdraw CI change

075d132

JingyaHuang merged commit 85e6fff into huggingface:main Oct 18, 2023
49 of 52 checks passed

JingyaHuang deleted the update-ort-trainer-to-4.32 branch October 18, 2023 21:51

JingyaHuang mentioned this pull request Oct 18, 2023

ORT training support stage3 #1439

Closed

3 tasks

AdamLouly mentioned this pull request Oct 23, 2023

Bloom failing after setting up max_position_embeddings on Trainer.py #1477

Closed

4 tasks
