Docker build failing #20

Open
francescosarno opened this issue May 3, 2023 · 7 comments

Comments

@francescosarno

francescosarno commented May 3, 2023

I have a problem when running `docker build .`; I get the following error:

```
Dockerfile:59
--------------------
  57 |     RUN SHA=ToUcHMe git clone https://github.com/NVIDIA/apex.git
  58 |     WORKDIR /tmp/unique_for_apex/apex
  59 | >>> RUN /opt/miniconda3/envs/py37/bin/pip3 install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
  60 |     #install pytorch3d 
  61 |     # RUN /opt/miniconda3/envs/py37/bin/pip install pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py37_cu102_pyt171/download.html
--------------------
ERROR: failed to solve: process "/bin/sh -c /opt/miniconda3/envs/py37/bin/pip3 install -v --no-cache-dir --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\" ." did not complete successfully: exit code: 1
```

Do you know how to solve this? It seems to be caused by the apex installation step.

@DecaYale
Owner

DecaYale commented May 6, 2023

Have you solved this? We tested this step and did not run into any issue. Our code can also run with PyTorch's DistributedDataParallel instead of apex; you may just need to modify the code slightly.
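A minimal sketch of the apex-free route, assuming the training script wraps the model for multi-GPU training (this thread doesn't show where): PyTorch's built-in `torch.nn.parallel.DistributedDataParallel` stands in for apex's wrapper. `setup_process_group` and `wrap_model` are hypothetical helper names, and the sketch assumes a single node so that a process's rank doubles as its GPU index. Gloo is used as a fallback backend so it also runs on CPU.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def setup_process_group(rank, world_size):
    """Hypothetical helper: join the default process group via env:// rendezvous."""
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    # NCCL for GPUs; Gloo keeps the sketch runnable on CPU-only machines.
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend, rank=rank, world_size=world_size)


def wrap_model(model, rank):
    """Hypothetical helper: wrap `model` in torch's DDP instead of apex's."""
    if torch.cuda.is_available():
        # Single-node assumption: the global rank is also the local GPU index.
        torch.cuda.set_device(rank)
        return DDP(model.cuda(rank), device_ids=[rank])
    # CPU modules must be wrapped without device_ids.
    return DDP(model)
```

Each process would call `setup_process_group(rank, world_size)` and then `wrap_model(model, rank)` before training; a launcher such as `python -m torch.distributed.launch` (or `torchrun` on PyTorch 1.10+) starts one process per GPU.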

@Kaladin-Syl-WR

Hi. I seem to be getting the same issue. Any idea what the problem might be and what I can do to fix it?

@DecaYale
Owner

DecaYale commented May 9, 2023

This might be caused by recent updates to the apex repository. I suggest commenting out this step and installing apex manually later, or replacing apex with PyTorch's DistributedDataParallel.
If you are only running evaluation, you can also run on a single GPU without apex.
I hope this helps.
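If apex is installed manually inside the running container instead, one way to check whether the build actually produced the extensions that `--cpp_ext` and `--cuda_ext` request is to look for their modules. This sketch assumes those modules are named `apex_C` and `amp_C` (the names apex's `setup.py` uses, as far as I know); `find_spec` only confirms that a module can be located, not that its compiled library loads.

```python
import importlib.util


def apex_status():
    """Report which apex pieces can be located (module names are assumptions)."""
    names = ["apex", "apex_C", "amp_C"]  # Python package, --cpp_ext, --cuda_ext
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in apex_status().items():
        print(f"{name:8s} {'found' if ok else 'missing'}")
```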

@mqtjean

mqtjean commented May 10, 2023

Thank you for your answer. It works for me after commenting out this line in the Dockerfile, then starting my container and running `git clone` and `pip install` for apex inside it.
Note that `apex.amp` is deprecated, so I had to change `from apex import amp` to `from torch.cuda import amp`.
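For the training loop, a rough sketch of what the switch to native mixed precision looks like, assuming the code used apex's `amp.initialize` / `amp.scale_loss` pattern (not confirmed in this thread): `autocast` handles per-op casting and `GradScaler` handles loss scaling. The toy model and the `train_step`/`demo` names are illustrative only. Newer PyTorch releases deprecate the `torch.cuda.amp` entry points in favor of `torch.amp`, so expect a warning there.

```python
import torch
import torch.nn as nn
from torch.cuda import amp  # native AMP, in place of `from apex import amp`


def train_step(model, optimizer, scaler, loss_fn, inputs, targets):
    optimizer.zero_grad()
    # autocast takes over the op-level casting apex's amp.initialize() did.
    with amp.autocast(enabled=scaler.is_enabled()):
        loss = loss_fn(model(inputs), targets)
    # GradScaler replaces `with amp.scale_loss(loss, optimizer) as s: s.backward()`.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()


def demo(steps=50):
    """Fit a toy linear model; AMP is only enabled when a CUDA device is present."""
    torch.manual_seed(0)
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")
    model = nn.Linear(4, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = amp.GradScaler(enabled=use_cuda)
    inputs = torch.randn(64, 4, device=device)
    targets = inputs.sum(dim=1, keepdim=True)
    loss_fn = nn.MSELoss()
    return [train_step(model, optimizer, scaler, loss_fn, inputs, targets)
            for _ in range(steps)]
```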

@brian2lee

> Thank you for your answer. It works for me after commenting out this line in the Dockerfile, then starting my container and running `git clone` and `pip install` for apex inside it. Note that `apex.amp` is deprecated, so I had to change `from apex import amp` to `from torch.cuda import amp`.

I've been facing the same problem. You mentioned changing `from apex import amp` to `from torch.cuda import amp`; which file did you change? I can't find that line in the Dockerfile. Sorry if this is a basic question, I'm quite new to this.
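The change goes in the Python sources rather than the Dockerfile; the traceback in this thread, for instance, points at `model/losses.py`. A small stdlib-only sketch that lists every `apex` import under a directory (the `find_apex_imports` name is hypothetical; run it from the repository root):

```python
import re
import sys
from pathlib import Path

# Matches "import apex", "import apex.amp", "from apex import amp",
# "from apex.parallel import ...", but not e.g. "import apexx".
APEX_IMPORT = re.compile(r"^\s*(?:from\s+apex(?:\.\w+)*\s+import|import\s+apex\b)")


def find_apex_imports(root):
    """Return (path, line_number, line) for every apex import under `root`."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it
        for lineno, line in enumerate(text.splitlines(), start=1):
            if APEX_IMPORT.match(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, lineno, line in find_apex_imports(root):
        print(f"{path}:{lineno}: {line}")
```

Saved as, say, `find_apex.py`, running `python find_apex.py .` prints one `path:line: source` entry per match.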

@Nishanth21D

> Thank you for your answer. It works for me after commenting out this line in the Dockerfile, then starting my container and running `git clone` and `pip install` for apex inside it. Note that `apex.amp` is deprecated, so I had to change `from apex import amp` to `from torch.cuda import amp`.

Hey, I did as you mentioned, but it fails with the following error: `module 'torch.cuda.amp' has no attribute 'float_function'`.

```
Traceback (most recent call last):
  File "/home/RNNPose/tools/eval.py", line 26, in <module>
    from builder import (
  File "/home/RNNPose/builder/rnnpose_builder.py", line 1, in <module>
    from builder import losses_builder
  File "/home/RNNPose/builder/losses_builder.py", line 2, in <module>
    from model import losses
  File "/home/RNNPose/model/losses.py", line 22, in <module>
    class Loss(nn.Module):
  File "/home/RNNPose/model/losses.py", line 65, in Loss
    @amp.float_function
AttributeError: module 'torch.cuda.amp' has no attribute 'float_function'
```

Is there a workaround, or can I just comment it out? Thanks in advance.
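`float_function` is an apex-specific decorator that forces a function to run in FP32 under apex's AMP; `torch.cuda.amp` has no counterpart, hence the `AttributeError`. If mixed precision isn't being used (e.g. single-GPU evaluation), deleting the `@amp.float_function` line in `model/losses.py` is likely enough. Otherwise, a hypothetical drop-in shim that disables autocast and casts tensor arguments to FP32 could look like this; it approximates apex's behaviour rather than porting it:

```python
import functools

import torch
from torch.cuda import amp


def float_function(fn):
    """Hypothetical stand-in for apex's `amp.float_function`.

    Runs `fn` with CUDA autocast disabled and casts floating-point tensor
    arguments to float32. Only top-level args/kwargs are cast; tensors nested
    inside lists or dicts are passed through unchanged.
    """
    def to_fp32(x):
        return x.float() if torch.is_tensor(x) and x.is_floating_point() else x

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        args = tuple(to_fp32(a) for a in args)
        kwargs = {k: to_fp32(v) for k, v in kwargs.items()}
        with amp.autocast(enabled=False):
            return fn(*args, **kwargs)

    return wrapper
```

With the shim defined in or imported into `model/losses.py`, `@amp.float_function` would become `@float_function`. Since `self` is not a tensor, decorated methods pass it through unchanged.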

@mqtjean

mqtjean commented Aug 15, 2024 via email

Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants