Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pip_parse() does not correctly read *.pth files? #2071

Open
EricCousineau-TRI opened this issue Jul 17, 2024 · 12 comments
Open

pip_parse() does not correctly read *.pth files? #2071

EricCousineau-TRI opened this issue Jul 17, 2024 · 12 comments

Comments

@EricCousineau-TRI
Copy link

EricCousineau-TRI commented Jul 17, 2024

🚀 feature request

Relevant Rules

package_annotation() from pip.bzl

Description

From docs, as of time of writing, you can add BUILD content, or add / remove files, but doesn't seem like you can change py_library(deps, imports).

For example, rerun.io is an awesome-looking package.
But it doesn't work when using using bazel + rules_python @ 0.34.0:
https://github.com/EricCousineau-TRI/repro/tree/e084a7434286cf426a71350f79755c7cbab6d6eb/bazel/rules_python_rerun
Seems like either the wheel doesn't declare things s.t. rules_python adds ./site-packages/rerun_sdk to the Python path (it only adds ./site-packages).

Describe the solution you'd like

For now, would be nice to have something like

pip_parse(
    ...
    annotations = {
        "rerun-sdk": package_annotation(
            extra_imports = ["site-packages/rerun_sdk"],
        ),
    },
)

Describe alternatives you've considered

For now, will try package_annotation(..., additive_build_content), and include that whenever I use pip("rerun-sdk").

EDIT: This workaround worked: EricCousineau-TRI/repro@96e1336

@EricCousineau-TRI EricCousineau-TRI changed the title Add option to inject additonal imports and deps via package_annotation Add option to inject additonal imports and deps via package_annotation ? Jul 17, 2024
@aignas
Copy link
Collaborator

aignas commented Jul 18, 2024

imports is not supported, but deps is supported via the whl patching mechanism. You should just patch the whl METADATA file that defines the dependencies and then our code generation will do the right thing. Why do you need to modify imports?

@EricCousineau-TRI
Copy link
Author

EricCousineau-TRI commented Jul 18, 2024

in this case, it's missing site-packages/rerun_sdk in the PYTHONPATH:
https://github.com/EricCousineau-TRI/repro/tree/e084a7434286cf426a71350f79755c7cbab6d6eb/bazel/rules_python_rerun
it's not an inter-package dependency, but rather a missing dir on path within the whl itself.

perhaps it's because there's a .pth file that rules_python is not parsing? from above linked repro

$ cd $(bazel info output_base)
$ cat external/pip_deps_rerun_sdk/site-packages/rerun_sdk.pth
rerun_sdk

from docs https://docs.python.org/3/library/site.html
it seems like the .pth file should effectively change sys.path (or in this case, imports)?

@EricCousineau-TRI
Copy link
Author

EricCousineau-TRI commented Jul 18, 2024

(if this seems like a valid thing, I can file a new issue edit this issue and reword the title)

@aignas
Copy link
Collaborator

aignas commented Jul 19, 2024

Thanks for the repro and the logs, this makes it easier to understand. The rules_python not handling the .pth file could be something that is the problem here.

I think ideally the solution would be to investigate how to make rules_python respect the .pth files, but I am not sure how to do it yet. Supporting imports or deps via the annotations does not seem like the right solution here, because it increases the API surface and still forces all of the users to apply workarounds.

@EricCousineau-TRI EricCousineau-TRI changed the title Add option to inject additonal imports and deps via package_annotation ? pip_parse() does not correctly read *.pth files? Jul 19, 2024
@EricCousineau-TRI
Copy link
Author

EricCousineau-TRI commented Jul 19, 2024

sweet, welcome

I think simplest way is to assume it's non-executable *.pth (maybe check for import sys as simple check) and just have code populate imports:

# This makes this directory a top-level in the python import
# search path for anything that depends on this.
imports = ["site-packages"],

would probably need something like parsed_pth_files as argument to that function, and then the repository rule has to cat those contents accordingly.

@EricCousineau-TRI
Copy link
Author

very loose code: main...EricCousineau-TRI:rules_python:issue-2071

@aignas
Copy link
Collaborator

aignas commented Jul 20, 2024 via email

@groodt
Copy link
Collaborator

groodt commented Aug 24, 2024

Closing this issue as a duplicate of #2156

Any PR to workaround .pth is welcome, but the PR I've linked above is a more holistic issue to capture the overall friction that rules_python causes for some packages.

@groodt groodt closed this as completed Aug 24, 2024
@vonschultz
Copy link
Contributor

I think the main idea of just reading the pth file in starlark and then appending the imports would be a nice way to do it.

With Python 3.12, distutils starts making trouble all over the place. The solution is to pull in setuptools as a dependency and make sure the distutils-precedence.pth from setuptools runs. This .pth file is

import os; var = 'SETUPTOOLS_USE_DISTUTILS'; enabled = os.environ.get(var, 'local') == 'local'; enabled and __import__('_distutils_hack').add_shim(); 

With this in mind, I don't think that reading the pth file in starlark and appending the imports would work. It might work for some packages, but not for setuptools, and that's going to be one of the more important ones to support in any non-trivial Python context.

At our company, we're using a workaround where we create a sitecustomize.py file where we loop through sys.path and run site.addsitedir(directory) if the directory looks like it belongs to setuptools. One might imagine having rules_python helping the user to create a useful sitecustomize.py file, possibly even creating it automatically and putting it on the Python path. If we're solving this with #2156 we might not need to, though.

@aignas
Copy link
Collaborator

aignas commented Dec 26, 2024

I think the issue is that there can be only one sitecustomize.py file as noted in #2169 (comment)

Is setuptools used in building wheels in pip.parse in this case or somewhere else?

@vonschultz
Copy link
Contributor

vonschultz commented Dec 27, 2024

In this case, setuptools isn't really used as such. The problem is distutils. You might have code that runs import tensorflow, and that triggers

import distutils as _distutils

https://github.com/tensorflow/tensorflow/blob/5bc9d26649cca274750ad3625bd93422617eed4b/tensorflow/api_template.__init__.py#L30

In Python 3.11 and earlier, this is an import from the standard library, so we're not really involving setuptools — except that distutils has for a long time recommended using the setuptools package instead. Setuptools has integrated a complete copy of distutils and is no longer dependent on the standard library, and once we're in Python 3.12, that's the only distutils there is. See PEP632.

setuptools makes its distutils module available when the distutils-precedence.pth file in the setuptools wheel runs. The way rules_python currently works, though, the pth file ends up in the wrong directory and won't be loaded unless you manually invoke site.addsitedir(directory).

@aignas
Copy link
Collaborator

aignas commented Jan 1, 2025

I am somewhat thinking that setuptools should be treted in rules_python in a special way.

I have no other ideas on how to fix this - maybe we coud have a special BUILD.bazel file for that package and reuse the same setuptools repository everywhere. If the user tries to include it via pip in bzlmod, we could easily handle that. Doing that in workspace may be too much work.

Potentially related - #2370

@aignas aignas reopened this Jan 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants