Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Remove python version specific usage within extensions #1270

Closed
wants to merge 2 commits into from

Conversation

janeyx99
Copy link
Contributor

@janeyx99 janeyx99 commented Nov 12, 2024

This PR will be landed in two stages in
#1276
#1277


Replace PYBIND usage with torch.ops.load_library in order to allow ao to build 1 wheel for multiple python versions.
This requires pytorch/pytorch#138088

Why is this cool?

You no longer have to build an ao wheel for every python version, saving time, resources, and pypi package space (though this last one might not be a concern).

This proves that there is a way to use torch CPP extensions without dragging an unnecessary python dependency, which was not the case before.

One disclaimer:

This does mean that there will be no more torchao._C python module, which may be BC breaking. There are ways to not BC break (by creating a _C.py and re-exposing all the functions) so I'm not too concerned about that. Though some reviewer should let me know if this is crucial.

Test Plan

Yes, it looks like there is still much CI to fix for ao, but I wanted to be confident that the core change worked before sinking the time into this. How did I gain confidence in this change? Locally, I built ao for python 3.10 using

pip wheel -e .

This produced torchao-0.7.0+gitac340bb4-cp38-abi3-linux_x86_64.whl

Then, I installed this wheel within other conda environments (with fresh torch) for python 3.9 through python 3.12 using pip install torchao-0.7.0+gitac340bb4-cp38-abi3-linux_x86_64.whl To verify this wheel was good, I ran python test/test_ops.py to check that all the tests passed, which they do.

I also made sure that the python setup.py develop path was not broken. How? I ran pip install -e . on the code in this PR and ensured that python test/test_ops.py passed in both 3.9 and 3.10.

What do I want review on?

a. Is it okay that there is no longer a python module for torchao._C? A quick GH search shows 0 uses https://github.com/search?q=torchao._C&type=code in public code.
b. Are there other ways to install ao (other than pip install and pip wheel) that I need to test? Or other uses of the custom ops outside of test/test_ops.py that I need to verify?
c. Do we still want the old behavior to exist? Is there any reason we'd still want to build wheels per python?
d. I assumed py38 was the lowest version supported by ao, but I later learned that pytorch has moved to 3.9 so that should probably be 3.9 now. Does ao follow pytorch in terms of python support?

Based on the answers to the above qs, I will clean up this PR accordingly.

Copy link

pytorch-bot bot commented Nov 12, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1270

Note: Links to docs will display an error until the docs builds have been completed.

❌ 15 New Failures

As of commit ac340bb with merge base addcb24 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 12, 2024
@janeyx99 janeyx99 marked this pull request as ready for review November 12, 2024 19:56
@drisspg
Copy link
Contributor

drisspg commented Nov 12, 2024

Mark has more experience here, We would also want to update the wheel build flow so that it produces py3 wheels not once that specify the version?

As well I think integration into the Nova workflows so that we are pushing the right artifacts is important

@msaroufim
Copy link
Member

a. Is it okay that there is no longer a python module for torchao._C?

I don't think we need it but would need tests to work for sure

b. Are there other ways to install ao (other than pip install and pip wheel) that I need to test? Or other uses of the custom ops outside of test/test_ops.py that I need to verify?

Nope that's it, granted you'll need to see what this does in CI though specifically the build_linux_wheel job which will build us for many different cuda and python versions. Also will need to sanity check that torchao custom extensions work when you download it on a machine with varying python versions

c. Do we still want the old behavior to exist? Is there any reason we'd still want to build wheels per python?

It's mostly just a function of wanting custom ops working, a python agnostic version would be better

d. I assumed py38 was the lowest version supported by ao, but I later learned that pytorch has moved to 3.9 so that should probably be 3.9 now. Does ao follow pytorch in terms of python support?

we just drop support when python versions are at EOL https://devguide.python.org/versions/

@@ -114,6 +114,7 @@ def get_extensions():
extension(
"torchao._C",
sources,
py_limited_api=True,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice to have it, but imo not necessary. At least by default, linker will ignore the dependencies, unless one uses some symbols from that library (unless library is linked with -Wno-as-needed)

Copy link
Contributor Author

@janeyx99 janeyx99 Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary only to get python to recognize the extension + therefore the wheel as python agnostic. e.g., for pypi packaging and naming purposes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also for our cppextension to know not to drag in libtorch_python

Comment on lines +25 to +29
from importlib.util import find_spec
from pathlib import Path
spec = find_spec("torchao")
assert spec is not None, "torchao python module spec is unexpectedly None"
SO_PATH = Path(spec.origin).parent / "_C.abi3.so"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easiest solution are usually the best one :)

Suggested change
from importlib.util import find_spec
from pathlib import Path
spec = find_spec("torchao")
assert spec is not None, "torchao python module spec is unexpectedly None"
SO_PATH = Path(spec.origin).parent / "_C.abi3.so"
SO_PATH = Path(__file__).parent / "_C.abi3.so"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work in the pip wheel -e . followed by the pip install *.whl case!

In that case the _C.abi3.so will live in site-packages (or wherever the user's install is) so we need a more encompassing way to access the location here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you share an example? And perhaps it talks about some gaps in setup.py as _C should have been installed in the same folder as __init__.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh this should indeed work in the pip wheel/pip install case, because the init.py file and the _C.abi3.so are in the same folder under site-packages:
image

It's funny because I actually started off with what you had lol so dang I did some extra work :'/

Do you know where the logic lives to always place the _C under the same folder as init.py? I don't see mention of it in the ao setup.py, so is this a Python guarantee for custom extensions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know where the logic lives to always place the _C under the same folder as init.py?

@janeyx99 I think it's a setuptools thing

ao/setup.py

Lines 114 to 116 in 01dc7da

extension(
"torchao._C",
sources,

So it automatically understands that it should put _C.so under torchao folder. I played around with meson-python a bit for another project, and they don't do the same thing (meson-python will create torchao._C.so instead)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah thanks for the info @gau-nernst!

@janeyx99
Copy link
Contributor Author

Alright thanks for your comments! Sounds like ao would like a single wheel more so I'll make the CI do that as my next step in the PR (amidst getting all the other tests to pass :p)

@msaroufim Just FYI I've done the sanity checks and verified that the custom ops worked + detailed it in the test plan above (that was honestly the most tedious part of this task lol).

Copy link

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • py_limited_api is a nice touch, but strictly speaking is not necessary, , unless cpp_extensions links torch_python with -Wno-as-needed or something like that
  • Python have a weird tendency to call their extensions .so files even on platforms where they are not (and AOTI follows in that anti-threns), but shouldn't it be .dll on Windows and .dylib on MacOS (There are probably already some utils in cpp extensions to decorate library names with platform appropriate prefixes and suffixes )

yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024
* initial test

* Pad casual mask with zeroes and set decoder max_seq_len to the max sequence length so their shapes are both set to the max_seq_len

* Fix control bug for image inputs

* Clear image input after submitting a chat

* Include empty assistant message for chat

* Pipe image input from CLI

---------

Co-authored-by: Jack-Khuu <[email protected]>
Co-authored-by: vmpuri <[email protected]>
@janeyx99
Copy link
Contributor Author

Closing in favor of #1276 and #1277

@janeyx99 janeyx99 closed this Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants