CUDA error: 700 (0x2bc) cudaErrorIllegalAddress : an illegal memory access was encountered #445

Open
9cat opened this issue Dec 11, 2023 · 3 comments

9cat commented Dec 11, 2023

Bladebit Chia Plotter
Version : 3.1.0-dev
Git Commit : e9836f8
Compiled With: gcc 11.4.0

[Global Plotting Config]
Will create 1 plots.
Thread count : 16
Warm start enabled : false
NUMA disabled : false
CPU affinity disabled : false
Farmer public key : xxxxxxxxxxxxxxxxxx367
Pool contract address : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Compression Level : 7
Benchmark mode : disabled
Warning: 16G mode is experimental and still under development.
Please use the --check parameter to validate plots when using this mode.
Direct I/O not supported in 16G mode at the moment. Disabing it.

[Bladebit CUDA Plotter]
Host RAM : 31 GiB
Plot checks : enabled ( 2 )
Plot check threshold: 0.600

Selected cuda device 0 : NVIDIA GeForce GTX 1070
CUDA Compute Capability : 6.1
SM count : 15
Max blocks per SM : 32
Max threads per SM : 2048
Async Engine Count : 2
L2 cache size : 2.00 MB
L2 persist cache max size : 0.00 MB
Stack Size : 1.00 KB
Memory:
Total : 7.92 GB
Free : 7.84 GB

Allocating buffers (this may take a few seconds)...
Kernel RAM required : 4979771088 bytes ( 4749.08 MiB or 4.64 GiB )
Intermediate RAM required : 4529922048 bytes ( 4320.07 MiB or 4.22 GiB )
Host RAM required : 2147483648 bytes ( 2048.00 MiB or 2.00 GiB )
Total Host RAM required : 7127254736 bytes ( 6797.08 MiB or 6.64 GiB )
GPU RAM required : 6314045440 bytes ( 6021.54 MiB or 5.88 GiB )
Allocating buffers...
Done.

Generating plot 1 / 1: d7ce4357f4139ba7acf4c1d2ba211981a8b2da90377004661b5a1226201ab726
Plot temporary file: /nvme/chia/output/plot-k32-c07-2023-12-11-03-01-d7ce4357f4139ba7acf4c1d2ba211981a8b2da90377004661b5a1226201ab726.plot.tmp

Generating F1
Finished F1 in 665.61 seconds.
Table 2 completed in 923.05 seconds with 4294939070 entries.
Table 3 completed in 1197.82 seconds with 4294899125 entries.
Table 4 completed in 1524.01 seconds with 4294813535 entries.
Table 5 completed in 1387.38 seconds with 4294566059 entries.
Table 6 completed in 1138.78 seconds with 4294122506 entries.
Table 7 completed in 831.97 seconds with 4293295409 entries.
Finalizing Table 7
Finalized Table 7 in 352.01 seconds.
Completed Phase 1 in 8020.65 seconds
Marked Table 6 in 49.38 seconds.
Marked Table 5 in 38.19 seconds.
Marked Table 4 in 38.14 seconds.
Marked Table 3 in 36.93 seconds.
Completed Phase 2 in 162.66 seconds
Compressing Table 2 and 3...
Step 1 completed step in 631.68 seconds.
Step 2 completed step in 451.55 seconds.
Completed table 2 in 1083.24 seconds with 3439777426 / 4294899125 entries ( 80.09% ).
Compressing tables 3 and 4...
Step 1 completed step in 481.10 seconds.
CUDA error: 700 (0x2bc) cudaErrorIllegalAddress : an illegal memory access was encountered

*** Panic!!! *** Fatal Error:
CUDA error cudaErrorIllegalAddress : an illegal memory access was encountered.
./bladebit_cuda(_ZN7SysHost14DumpStackTraceEv+0x53)[0x56302ebc6d93]
./bladebit_cuda(_Z9PanicExitv+0xf)[0x56302ed5827f]
./bladebit_cuda(+0xb9e7f)[0x56302eba9e7f]
./bladebit_cuda(_ZN15GpuUploadBuffer11UploadArrayEPKvjjjjPKjP11CUstream_st+0x549)[0x56302ebaa829]
./bladebit_cuda(_ZN15GpuUploadBuffer11UploadArrayEPKvjjjjPKj+0x13)[0x56302ebaaa33]
./bladebit_cuda(_Z22CudaK32PlotPhase3Step2R18CudaK32PlotContext+0x695)[0x56302eb8bc35]
./bladebit_cuda(_Z17CudaK32PlotPhase3R18CudaK32PlotContext+0x1286)[0x56302eb83756]
./bladebit_cuda(_ZN14CudaK32Plotter3RunERK11PlotRequest+0x122f)[0x56302eb7739f]
./bladebit_cuda(main+0xc1f)[0x56302eb6b2df]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f7dd8970d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f7dd8970e40]
./bladebit_cuda(_start+0x25)[0x56302eb6c9a5]
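
The frames in the backtrace above are GCC-style (Itanium C++ ABI) mangled symbols. They can be decoded with the `c++filt` tool or with `abi::__cxa_demangle` from `<cxxabi.h>`. The following is a minimal sketch and not part of bladebit; the helper names `demangle` and `printDemangled` are hypothetical.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <iostream>
#include <string>

// Demangle an Itanium C++ ABI symbol (GCC/Clang). Names that do not start
// with "_Z", or that fail to demangle, are returned unchanged.
std::string demangle(const std::string& symbol) {
    if (symbol.rfind("_Z", 0) != 0) {
        return symbol;  // not a mangled C++ name, e.g. "main"
    }
    int status = 0;
    char* out = abi::__cxa_demangle(symbol.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return symbol;
    }
    std::string result(out);
    std::free(out);  // __cxa_demangle allocates the result with malloc
    return result;
}

// Print the readable form of the symbols taken from the backtrace above.
void printDemangled() {
    const char* frames[] = {
        "_ZN7SysHost14DumpStackTraceEv",
        "_Z9PanicExitv",
        "_ZN15GpuUploadBuffer11UploadArrayEPKvjjjjPKjP11CUstream_st",
        "_Z22CudaK32PlotPhase3Step2R18CudaK32PlotContext",
        "_Z17CudaK32PlotPhase3R18CudaK32PlotContext",
        "_ZN14CudaK32Plotter3RunERK11PlotRequest",
        "main",
    };
    for (const char* frame : frames) {
        std::cout << demangle(frame) << '\n';
    }
}
```

Read this way, the error was reported inside `GpuUploadBuffer::UploadArray`, called from `CudaK32PlotPhase3Step2`. Because CUDA reports many errors asynchronously, the illegal access itself may have happened in an earlier kernel launch rather than in the upload call.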


Running Ubuntu 22.08 with 32 GB of RAM and an NVIDIA GTX 1070 GPU.


ChatGPT's response:

It looks like you've encountered a fatal error in your CUDA program. The error message indicates a CUDA error with the code cudaErrorIllegalAddress: an illegal memory access was encountered. This error typically occurs when the program tries to access memory that it's not allowed to access.

Here are a few steps you can take to troubleshoot and fix the issue:

Check the CUDA Error Code:
The specific CUDA error code is 700 (cudaErrorIllegalAddress). This suggests a memory access violation. You should carefully review the parts of your code that deal with memory management, especially memory allocation, copying, and freeing.

Review CUDA Kernel Code:
If your code includes CUDA kernels, review them for out-of-bounds memory accesses and other memory-related issues. Note that the symbols in the stack trace (_Z22CudaK32PlotPhase3Step2R18CudaK32PlotContext, _Z17CudaK32PlotPhase3R18CudaK32PlotContext, etc.) are mangled names of host-side C++ functions, so the kernels to review are the ones those functions launch.

Check CUDA API Calls:
Review the CUDA API calls, especially memory-related functions such as cudaMemcpy, cudaMalloc, and cudaFree. Make sure that memory allocations are correct and that data is being transferred correctly between the host and the device.

Verify GPU Compatibility:
Ensure that your GPU is compatible with the CUDA version you are using. Some GPUs may have specific requirements or limitations with certain CUDA versions.

Memory Profiling:
Consider using CUDA memory-checking tools, such as NVIDIA Nsight or Compute Sanitizer (the successor to CUDA-MEMCHECK), to identify memory-related issues in your code.

Update CUDA Drivers:
Make sure you have the latest NVIDIA CUDA drivers installed. Outdated drivers can sometimes cause compatibility issues.

Check GPU Resource Limits:
Verify that you are not exceeding the GPU's memory limits. CUDA kernels may fail if they attempt to allocate more memory than is available on the GPU.

Review Recent Changes:
If the error started occurring after recent code changes, review those changes for potential issues.
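
The "Check GPU Resource Limits" suggestion can be tested against the figures in the log above. The sketch below only redoes that arithmetic. It assumes the most conservative reading of "Free : 7.84 GB" (decimal gigabytes), and the constant and function names are hypothetical.

```cpp
#include <cstdint>
#include <iostream>

// Figures copied from the log in this issue.
constexpr std::uint64_t kKernelRam   = 4979771088ULL;  // "Kernel RAM required"
constexpr std::uint64_t kHostRam     = 2147483648ULL;  // "Host RAM required"
constexpr std::uint64_t kTotalHost   = 7127254736ULL;  // "Total Host RAM required"
constexpr std::uint64_t kGpuRequired = 6314045440ULL;  // "GPU RAM required"

// ASSUMPTION: "Free : 7.84 GB" read as decimal gigabytes, the smaller of
// the two possible interpretations (GB vs GiB).
constexpr std::uint64_t kGpuFreeDecimal = 7840000000ULL;

constexpr double kGiB = 1024.0 * 1024.0 * 1024.0;

double toGiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / kGiB;
}

// True if a request of `required` bytes fits into `available` bytes.
bool fits(std::uint64_t required, std::uint64_t available) {
    return required <= available;
}

void printHeadroom() {
    std::cout << "GPU RAM required: " << toGiB(kGpuRequired) << " GiB\n";
    std::cout << "Fits in free GPU RAM: "
              << (fits(kGpuRequired, kGpuFreeDecimal) ? "yes" : "no") << '\n';
    std::cout << "Headroom: " << toGiB(kGpuFreeDecimal - kGpuRequired)
              << " GiB\n";
}
```

Under that assumption the requested GPU buffers fit with over 1 GiB to spare, and the log shows the allocation finishing with "Done.". That makes a plain out-of-memory failure at allocation time look unlikely, but it does not rule out an out-of-bounds access later in the run.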


Any idea how to solve it?

DCTech2k commented Jan 8, 2024

Same issue on Debian 12 in 16G mode, while 128G mode works fine on the same setup.

GPU: GTX 1080 (secondary GPU; the primary is an AMD card using the Mesa driver)
Driver: 545.23.08
CUDA: 12.3

@hedandan1989

Same problem.
OS: Ubuntu 22.04 LTS
Driver: 535
GPU: GTX 1080

brause commented Jul 28, 2024

Same here.
OS: Ubuntu 24.04 LTS
Driver: 535
GPU: GTX 1070

Does anybody know which driver versions this has been tested with?
