
Adding optimum option for PredictEngine #492

Merged
9 commits merged into michaelfeil:main on Dec 10, 2024

Conversation

@wwymak (Contributor) commented Dec 8, 2024

Related Issue

Resolves #488

Checklist

  • I have read the CONTRIBUTING guidelines.
  • I have added tests to cover my changes.
  • I have updated the documentation (docs folder) accordingly.

Additional Notes

Add any other context about the PR here.



@greptile-apps bot (Contributor) left a comment


PR Summary

This PR adds ONNX/Optimum support for text classification models, enabling optimized inference through the PredictEngine with ONNX runtime integration.

  • Added new OptimumClassifier class in /libs/infinity_emb/infinity_emb/transformer/classifier/optimum.py with ONNX runtime support and model optimization capabilities
  • Implemented device-aware quantization preferences in OptimumClassifier for CPU and OpenVINO providers
  • Added comprehensive test suite in /test_optimum_classifier.py comparing against HuggingFace pipeline outputs
  • Extended PredictEngine enum in utils.py to include Optimum as a new inference engine option
  • Disabled IO binding in the ONNX runtime for better compatibility via model.use_io_binding = False (a rough sketch of these pieces follows this list)
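
For orientation, here is a rough sketch of the mechanics the bullets above describe, written directly against the optimum.onnxruntime and transformers APIs; the actual OptimumClassifier in infinity_emb wraps this differently, and the model id below is only a placeholder.

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "SamLowe/roberta-base-go_emotions"  # placeholder model id

# Export the model to ONNX and pick an execution provider. The PR's
# OptimumClassifier selects quantization/provider preferences per device
# (CPU vs. OpenVINO); this sketch hard-codes the CPU provider.
model = ORTModelForSequenceClassification.from_pretrained(
    model_id,
    export=True,
    provider="CPUExecutionProvider",
)
model.use_io_binding = False  # disabled in the PR for better compatibility

tokenizer = AutoTokenizer.from_pretrained(model_id)
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier(["This movie was great!"], top_k=None))

With top_k=None the pipeline returns a list of label/score dicts per input, which matches the shape noted in one of the review comments below.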


3 file(s) reviewed, 4 comment(s)

Comment on lines 10 to 14
engine_args=EngineArgs(
model_name_or_path=model_name,
device="cuda" if torch.cuda.is_available() else "cpu",
) # type: ignore
)

syntax: Indentation is inconsistent in EngineArgs constructor
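
For reference, a consistently indented version of the quoted excerpt; the call that encloses the engine_args= keyword is not part of the excerpt, and model_name is assumed to be defined earlier in the test file.

    engine_args=EngineArgs(
        model_name_or_path=model_name,
        device="cuda" if torch.cuda.is_available() else "cpu",
    )  # type: ignore
)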

import copy
import os

import numpy as np

style: numpy is imported but never used in this file
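
The import block with the unused numpy dropped, as suggested (copy and os are kept, since the excerpt does not show whether the rest of the module uses them):

import copy
import os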

Comment on lines +82 to +84
def encode_post(self, classes) -> dict[str, float]:
"""runs post encoding such as normalization"""
return classes

logic: incorrect type hint - method returns list[list[dict]] based on test file, not dict[str, float]
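
A sketch of the corrected annotation the comment asks for; the inner element type (label/score dicts, as produced by the HuggingFace pipeline) is inferred from the comparison test rather than spelled out in the excerpt.

def encode_post(self, classes) -> list[list[dict]]:
    """runs post encoding such as normalization"""
    return classes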

Comment on lines 27 to 28
if CHECK_TRANSFORMERS.is_available:
from transformers import AutoConfig, AutoTokenizer, pipeline # type: ignore[import-untyped]

style: AutoConfig is imported but never used
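
Likewise, the guarded import with the unused AutoConfig removed (assuming no other code in the module references it):

if CHECK_TRANSFORMERS.is_available:
    from transformers import AutoTokenizer, pipeline  # type: ignore[import-untyped]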

wendy mak added 2 commits December 8, 2024 18:55
@michaelfeil (Owner) commented Dec 8, 2024

Thanks for opening this & opening an issue before! Looks good so far.

There is a “make precommit” command in the Makefile that should do all lint / fix / testing :)

@codecov-commenter commented Dec 8, 2024

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 84.21053% with 6 lines in your changes missing coverage. Please review.

Project coverage is 79.58%. Comparing base (be48378) to head (6df694e).

Files with missing lines                                 Patch %   Lines
...emb/infinity_emb/transformer/classifier/optimum.py    88.23%    4 Missing ⚠️
...ibs/infinity_emb/infinity_emb/transformer/utils.py    50.00%    2 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #492      +/-   ##
==========================================
+ Coverage   79.53%   79.58%   +0.05%     
==========================================
  Files          41       42       +1     
  Lines        3430     3468      +38     
==========================================
+ Hits         2728     2760      +32     
- Misses        702      708       +6     

☔ View full report in Codecov by Sentry.

@michaelfeil michaelfeil merged commit c335df8 into michaelfeil:main Dec 10, 2024
36 checks passed
@michaelfeil (Owner):
Thanks for all the work!
