CPU only version of molfeat #65

Open
1 task done
kkovary opened this issue Jun 29, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@kkovary
Contributor

kkovary commented Jun 29, 2023

Is there an existing issue for this?

  • I have searched the existing issues and found nothing

Bug description

I've been trying to build some applications around molfeat. Installing it on different systems, such as GitHub Actions runners without GPUs, old or lightweight servers, or other testing environments, is currently really difficult because molfeat depends on a GPU-reliant build of PyTorch.

How to reproduce the bug

No response

Error messages and logs

# Error messages and logs here please

Environment

Current environment
#- Molfeat version (e.g., 0.1.0):
#- PyTorch Version (e.g., 1.10.0):
#- RDKit version (e.g., 2022.09.5): 
#- scikit-learn version (e.g.,  1.2.1): 
#- OS (e.g., Linux):
#- How you installed Molfeat (`conda`, `pip`, source):

Additional context

No response

@kkovary kkovary added the bug Something isn't working label Jun 29, 2023
@maclandrol
Member

maclandrol commented Jun 29, 2023

Hello @kkovary, installing molfeat on a system without a GPU should work (see our CI, for example). Can you share the steps you are using?

@kkovary
Contributor Author

kkovary commented Jun 29, 2023

Hi @maclandrol, thanks for getting back to me. I'm wondering if the issue arises from my team using Poetry to manage dependencies while your team uses mamba.

Currently, our pyproject.toml file looks like this:

[tool.poetry]
name = "chem-transformer"
version = "0.1.0"
description = ""
authors = ["Your Name <[email protected]>"]
readme = "README.md"

[tool.poetry.dependencies]
python = "^3.10"
pydantic = "^1.10.8"
rdkit = "^2023.3.1"
molfeat = "^0.8.9"
datamol = "^0.10.3"
torch = [
     {version = "^1.13.0", markers = "sys_platform == 'macos'", optional = true},
     {version = "^1.13.0", markers = "sys_platform == 'linux'", optional = true},
 ]

[tool.poetry.group.ci.dependencies]
torch = "^1.11.0+cpu"
pytest = "^7.3.1"

[tool.poetry.dependencies.pytest]
version = "^7.3.1"
optional = true

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

This works fine for test environments, but building a Docker image that will run on a CPU-only system fails due to missing CUDA libraries.
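A side note on the markers in the torch entry above: the `sys_platform` marker is compared against Python's `sys.platform`, which reports macOS as `darwin`, so a marker written as `'macos'` would not match on a Mac. The sketch below only illustrates that comparison; `marker_matches` is a made-up helper, not part of Poetry or pip.

```python
import sys

# Common values of sys.platform, the string that the `sys_platform`
# environment marker is compared against.
COMMON_SYS_PLATFORMS = {"linux": "Linux", "darwin": "macOS", "win32": "Windows"}


def marker_matches(expected: str, actual: str = sys.platform) -> bool:
    """Mimic the marker `sys_platform == '<expected>'` (hypothetical helper)."""
    return expected == actual


if __name__ == "__main__":
    print("This interpreter reports sys.platform =", repr(sys.platform))
    # On a Mac, sys.platform is 'darwin', so 'macos' never matches.
    print("'macos' matches a Mac:", marker_matches("macos", actual="darwin"))
    print("'darwin' matches a Mac:", marker_matches("darwin", actual="darwin"))
```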

@maclandrol
Member

Thanks @kkovary, I will investigate it in the morning.

If you can share the exact error so I can try to reproduce (or find the dependency behind the issue), that would be nice. I can only see torch, since you are not installing the extra dependencies, but I will run some tests.

Can you confirm that removing molfeat from the Poetry file above works?

@kkovary
Contributor Author

kkovary commented Jun 29, 2023

Hi @maclandrol, I stripped down the pyproject.toml file to remove the torch workarounds that were included above:

[tool.poetry]
name = "chem-transformer"
version = "0.1.0"
description = ""
authors = ["Your Name <[email protected]>"]
readme = "README.md"

[tool.poetry.dependencies]
python = "^3.10"
pydantic = "^1.10.8"
rdkit = "^2023.3.1"
datamol = "^0.10.3"
molfeat = "^0.8.9"

[tool.poetry.group.ci.dependencies]
pytest = "^7.3.1"

[tool.poetry.dependencies.pytest]
version = "^7.3.1"
optional = true

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

These are the errors that I'm seeing in the GitHub runner that we're using (ubuntu-latest):

 from chem_transformer.datamol_feats import Molecule
/home/runner/.cache/pypoetry/virtualenvs/library-enumerator-qd3v0RhZ-py3.10/lib/python3.10/site-packages/chem_transformer/datamol_feats.py:6: in <module>
    from molfeat.calc import _CALCULATORS, FP_FUNCS, get_calculator
/home/runner/.cache/pypoetry/virtualenvs/library-enumerator-qd3v0RhZ-py3.10/lib/python3.10/site-packages/molfeat/calc/__init__.py:3: in <module>
    from .cats import CATS
/home/runner/.cache/pypoetry/virtualenvs/library-enumerator-qd3v0RhZ-py3.10/lib/python3.10/site-packages/molfeat/calc/cats.py:21: in <module>
    from molfeat.utils.datatype import to_numpy
/home/runner/.cache/pypoetry/virtualenvs/library-enumerator-qd3v0RhZ-py3.10/lib/python3.10/site-packages/molfeat/utils/datatype.py:6: in <module>
    import torch
/home/runner/.cache/pypoetry/virtualenvs/library-enumerator-qd3v0RhZ-py3.10/lib/python3.10/site-packages/torch/__init__.py:228: in <module>
    _load_global_deps()
/home/runner/.cache/pypoetry/virtualenvs/library-enumerator-qd3v0RhZ-py3.10/lib/python3.10/site-packages/torch/__init__.py:189: in _load_global_deps
    _preload_cuda_deps(lib_folder, lib_name)
/home/runner/.cache/pypoetry/virtualenvs/library-enumerator-qd3v0RhZ-py3.10/lib/python3.10/site-packages/torch/__init__.py:154: in _preload_cuda_deps
    raise ValueError(f"{lib_name} not found in the system path {sys.path}")
E   ValueError: libcublas.so.*[0-9] not found in the system path ['/home/runner/work/chem-transformer/chem-transformer/apps', '/home/runner/work/chem-transformer/chem-transformer/apps/library_enumerator', '/opt/hostedtoolcache/Python/3.10.12/x64/lib/python310.zip', '/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10', '/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/lib-dynload', '/home/runner/.cache/pypoetry/virtualenvs/library-enumerator-qd3v0RhZ-py3.10/lib/python3.10/site-packages']
=========================== short test summary info ============================
ERROR tests/test_enumerator.py - ValueError: libcublas.so.*[0-9] not found in the system path ['/home/runner/work/chem-transformer/chem-transformer/apps', '/home/runner/work/chem-transformer/chem-transformer/apps/library_enumerator', '/opt/hostedtoolcache/Python/3.10.12/x64/lib/python310.zip', '/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10', '/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/lib-dynload', '/home/runner/.cache/pypoetry/virtualenvs/library-enumerator-qd3v0RhZ-py3.10/lib/python3.10/site-packages']

From what I can tell, the error is raised when the Python interpreter is unable to find the libcublas.so library in the system path. This library is a part of the CUDA toolkit and is required by PyTorch for GPU-accelerated operations.

The error occurs when PyTorch is being imported in the molfeat package, which is a dependency of the chem_transformer package we're developing. PyTorch tries to preload CUDA dependencies, including libcublas.so, but fails to find it in the system path.
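Because `import torch` itself is what fails here, one way to see which build was installed is to read the package metadata instead of importing it. This is a minimal sketch using only the standard library. It assumes the CPU wheels from download.pytorch.org carry a local version tag such as `+cpu` (as in `2.0.1+cpu`), while wheels from PyPI carry no tag. The function names are made up for illustration.

```python
from importlib import metadata


def torch_build_tag(version: str) -> str:
    """Return the local version tag of a version string, or '' if there is none.

    Example tags (assumption): 'cpu' for a CPU-only wheel, '' for a PyPI wheel.
    """
    _, _, tag = version.partition("+")
    return tag


def installed_torch_version():
    """Return the installed torch version string without importing torch."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_torch_version()
    if version is None:
        print("torch is not installed in this environment")
    else:
        tag = torch_build_tag(version) or "(none)"
        print(f"torch {version}, build tag: {tag}")
```

Running this inside the Poetry virtualenv on the runner would show whether the resolver picked a tagged CPU wheel or the untagged one.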

@rhjohnstone expressed a similar request over Slack here.

@maclandrol
Member

maclandrol commented Jun 29, 2023

Since around torch 1.13, I think, the pip version of PyTorch downloads some CUDA dependencies along with it. Downgrading to an older version of torch could work, but that doesn't seem like a long-term solution. Using conda/mamba would likely fix the issue too.
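To check whether an environment actually picked up such CUDA packages, the installed distributions can be listed by name. This is a minimal standard-library sketch; it assumes the CUDA wheels are published under names starting with `nvidia-`, and `cuda_related` is a hypothetical helper name.

```python
from importlib import metadata


def cuda_related(names):
    """Keep distribution names that look like NVIDIA CUDA wheels.

    Assumption: such wheels use names beginning with 'nvidia-'.
    Names are lower-cased and underscores normalised to dashes first.
    """
    normalised = {n.lower().replace("_", "-") for n in names if n}
    return sorted(n for n in normalised if n.startswith("nvidia-"))


def installed_distribution_names():
    """Return the names of all distributions visible to this interpreter."""
    return [dist.metadata["Name"] for dist in metadata.distributions()]


if __name__ == "__main__":
    found = cuda_related(installed_distribution_names())
    if found:
        print("CUDA-related distributions installed:")
        for name in found:
            print("  ", name)
    else:
        print("No nvidia-* distributions found in this environment")
```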

I looked around a bit, and there seems to be a history between torch and Poetry:

I am not a Poetry user, but can you check if these issues are relevant?

I will try to make torch optional or add optional dependencies for [cpu-only] or [gpu] in the pyproject file. Let me know if that can give you enough flexibility.

@maclandrol
Member

@jstlaurent any input here?

@jstlaurent
Contributor

@maclandrol and @kkovary: My apologies, it's taken me a while to get around to looking at this issue.

I'm not a Poetry expert by a long shot, so unfortunately I wasn't able to find a good solution to your issue, @kkovary.

You can override the torch variant to select the CPU version in your pyproject.toml file, like so:

[tool.poetry.dependencies]
python = "^3.10"
pydantic = "^1.10.8"
rdkit = "^2023.3.1"
datamol = "^0.10.3"
molfeat = "^0.9.2"
torch = { version = "^2.0.0", source="torch-cpu"}

[[tool.poetry.source]]
name = "PyPI"
priority = "primary"

[[tool.poetry.source]]
name = "torch-cpu"
url = "https://download.pytorch.org/whl/cpu"
priority = "supplemental"

But this will always select the CPU variant, and never the GPU enabled one.

Unfortunately, even Poetry's dependency groups can't help us here, because Poetry takes into account all dependencies, including optional ones, when resolving the package to install. So this will also always select the CPU variant:

[tool.poetry.dependencies]
python = "^3.10"
pydantic = "^1.10.8"
rdkit = "^2023.3.1"
datamol = "^0.10.3"
molfeat = "^0.9.2"

[tool.poetry.group.cpu]
optional = true

[tool.poetry.group.cpu.dependencies]
torch = { version = "^2.0.0", source="torch-cpu"}

[[tool.poetry.source]]
name = "PyPI"
priority = "primary"

[[tool.poetry.source]]
name = "torch-cpu"
url = "https://download.pytorch.org/whl/cpu"
priority = "supplemental"

Even if you run poetry install --without cpu, Poetry will conclude that torch==2.0.1+cpu is the package that satisfies every requirement.

My suggestion to you, as unpleasant as it might be, is to maintain two pyproject.toml files: a default one for the GPU build, and another with the explicit CPU-only version for your CPU build.
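To keep the two files from drifting apart, the CPU variant could be generated from the default one instead of being edited by hand. The sketch below is only one possible approach, based on plain text manipulation. It assumes the default file has a `[tool.poetry.dependencies]` table, no existing torch entry, and no `[[tool.poetry.source]]` blocks. `make_cpu_variant` is a made-up name, and the inserted lines mirror the CPU-only example earlier in this comment.

```python
DEPS_HEADER = "[tool.poetry.dependencies]"

# Same torch pin and sources as in the CPU-only example above.
TORCH_LINE = 'torch = { version = "^2.0.0", source = "torch-cpu" }'
SOURCES = """
[[tool.poetry.source]]
name = "PyPI"
priority = "primary"

[[tool.poetry.source]]
name = "torch-cpu"
url = "https://download.pytorch.org/whl/cpu"
priority = "supplemental"
"""


def make_cpu_variant(pyproject_text: str) -> str:
    """Return a CPU-only copy of the given pyproject.toml text."""
    lines = pyproject_text.splitlines()
    if not any(line.strip() == DEPS_HEADER for line in lines):
        raise ValueError(f"{DEPS_HEADER} table not found")

    out = []
    for line in lines:
        out.append(line)
        if line.strip() == DEPS_HEADER:
            # Pin torch to the CPU source right under the table header.
            out.append(TORCH_LINE)
    return "\n".join(out) + "\n" + SOURCES


if __name__ == "__main__":
    default = '[tool.poetry.dependencies]\npython = "^3.10"\nmolfeat = "^0.9.2"\n'
    print(make_cpu_variant(default))
```

Where the generated file is written, and how the CPU build picks it up, is left to the caller.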

@maclandrol: To solve this upstream of Poetry users, we would probably have to manage which version of torch gets pulled in molfeat proper, using extras to have users explicitly pick CPU vs GPU options.

@maclandrol
Member

@jstlaurent I attempted a PyTorch-free version, but it was a bad idea overall, so the cpu-only extra might indeed be the solution.
