Some problems encountered while following the README setup #53

Open
xsd-yh opened this issue Dec 4, 2024 · 3 comments

Comments


xsd-yh commented Dec 4, 2024

Hello, I was following the steps in your README and reached this command in the Accuracy step:
bash scripts/reproduce_test/outdoor_full_auc.sh
Running it in the terminal fails with:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.60 GiB (GPU 0; 7.75 GiB total capacity; 7.04 GiB already allocated; 114.06 MiB free; 7.14 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
It says that PyTorch in my environment has already taken up 5.5 GB of GPU memory. My GPU has 8 GB in total (an RTX 4060). I have tried many ways to fix this without success. Here is my reading of the error message:
CUDA out of memory: the GPU does not have enough free memory for PyTorch to allocate for the computation.
Tried to allocate 1.60 GiB: the failed request was for 1.60 GiB.
GPU 0: the allocation was attempted on the GPU with index 0.
7.75 GiB total capacity: the GPU has 7.75 GiB of memory in total.
7.04 GiB already allocated: 7.04 GiB is already allocated.
114.06 MiB free: 114.06 MiB is currently free.
7.14 GiB reserved in total by PyTorch: PyTorch has reserved 7.14 GiB in total.
How can I solve this? Also, your paper says "We use the AdamW optimizer with an initial learning rate of 4 × 10−3. The network training takes about 15 hours with a batch size of 16 on 8 NVIDIA V100 GPUs". Do I need the same hardware as your lab to reproduce the test results? Is my hardware the reason the step above fails with this error?
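The figures in the error message can be set side by side with a little arithmetic. The sketch below uses only the numbers quoted in the error above; the variable names are illustrative, and the "best case" line assumes that all free and cached memory could be handed to the failing request, which is optimistic.

```python
# Figures copied from the error message in this issue (all in GiB).
GIB_PER_MIB = 1 / 1024

total = 7.75                  # total capacity of GPU 0
allocated = 7.04              # memory held by live tensors
reserved = 7.14               # memory PyTorch has reserved from the driver
free = 114.06 * GIB_PER_MIB   # memory still free on the device
requested = 1.60              # size of the allocation that failed

# Memory PyTorch has reserved but is not using for live tensors.
cached_unused = reserved - allocated

# Optimistic assumption: everything free plus everything cached is usable.
best_case_available = free + cached_unused

# How far the request overshoots even in that optimistic case.
shortfall = requested - best_case_available

print(f"cached but unused:   {cached_unused:.2f} GiB")
print(f"best-case available: {best_case_available:.2f} GiB")
print(f"shortfall:           {shortfall:.2f} GiB")
```

Reserved memory exceeds allocated memory by only about 0.1 GiB, so the "reserved memory is >> allocated memory" condition in the hint does not hold here, and the request overshoots the available memory by roughly 1.4 GiB.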

Contributor

wyf2020 commented Dec 11, 2024

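Hello, it looks like the RTX 4060 does not have enough GPU memory. You do not need a V100 to reproduce the test results. Indoor images have a resolution of 640*480, so 8 GB of GPU memory may be enough, but outdoor images have a resolution of 1152*1152 and may need about 10 GB. If you only have 8 GB, you can use a few tricks to reduce memory usage:

  1. Use the --flash and --half options in the script
  2. Set the PYTORCH_CUDA_ALLOC_CONF environment variable to reduce PyTorch memory allocation fragmentation
  3. Delete intermediate variables in the code that are not needed, or that are only needed for training, to lower peak memory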
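The second suggestion can be sketched as a small Python wrapper that sets PYTORCH_CUDA_ALLOC_CONF before launching the test script. This is only an illustration: the helper names `build_env` and `run_outdoor_test` are hypothetical, the value 128 for `max_split_size_mb` is an assumed starting point rather than a recommendation from the maintainers, and the script path is the one quoted in the question. The --flash and --half options are not shown because the thread does not say where in the script they go.

```python
import os
import subprocess


def build_env(max_split_size_mb=128, base_env=None):
    """Return a copy of the environment with PYTORCH_CUDA_ALLOC_CONF set.

    max_split_size_mb is the allocator option named in the error message;
    128 is an assumed value to experiment with.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PYTORCH_CUDA_ALLOC_CONF"] = f"max_split_size_mb:{max_split_size_mb}"
    return env


def run_outdoor_test(max_split_size_mb=128):
    """Launch the outdoor accuracy script with the allocator setting applied."""
    cmd = ["bash", "scripts/reproduce_test/outdoor_full_auc.sh"]
    # The variable must be in the environment before the child process
    # initializes CUDA, so it is passed at launch time.
    return subprocess.run(cmd, env=build_env(max_split_size_mb), check=True)
```

Calling `run_outdoor_test()` from the repository root would start the script with the setting in place. The setting only addresses fragmentation in the allocator; it does not reduce the memory the model itself needs.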

Author

xsd-yh commented Dec 11, 2024

OK, thanks, I'll give it a try.

Author

xsd-yh commented Dec 11, 2024 via email
