Commit

Update release_notes.md
minseokl authored Oct 26, 2023
1 parent 0a17b64 commit bfdbed7
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions release_notes.md
@@ -16,6 +16,7 @@
+ Resolved the occasional runtime error in using multiple H800 GPUs

+ **Known Issues**:
+ If `max_eval_batches` and `batchsize_eval` are set to large values such as 5000 and 12000 respectively, training fails with an illegal memory access error. [The issue](https://github.com/NVIDIA/cccl/issues/293) originates in CUB and is fixed in its latest version. However, that fix is included only in CUDA 12.3, which our NGC container does not use yet. Until the NGC container is updated to that CUDA version, please rebuild HugeCTR with the newest CUB as a workaround, or avoid such large values for `max_eval_batches` and `batchsize_eval`.
+ HugeCTR can lead to a runtime error if client code calls RMM’s `rmm::mr::set_current_device_resource()`, because HugeCTR’s Parquet Data Reader also calls `rmm::mr::set_current_device_resource()`, and the resource it sets becomes visible to other libraries in the same process. Refer to [this issue](https://github.com/NVIDIA-Merlin/HugeCTR/issues/356). As a workaround, if you know that `rmm::mr::set_current_device_resource()` is called outside HugeCTR, set the environment variable `HCTR_RMM_SETTABLE` to 0 to prevent HugeCTR from setting a custom RMM device resource. Be cautious, as this could affect the performance of Parquet reading.
+ HugeCTR uses NCCL to share data between ranks, and NCCL can require shared system memory for IPC and pinned (page-locked) system memory resources.
If you use NCCL inside a container, increase these resources by specifying the following arguments when you start the container:
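For the `HCTR_RMM_SETTABLE` workaround described in the known issues above, the following is a minimal sketch of one way to pass the variable to a training process. It assumes HugeCTR reads the variable from the process environment, so it is set before the process starts; the helper name `env_without_hugectr_rmm_override` and the script name `train.py` are hypothetical and not part of HugeCTR.

```python
import os


def env_without_hugectr_rmm_override(base_env=None):
    """Return a copy of the environment with HCTR_RMM_SETTABLE set to "0".

    Hypothetical helper for illustration only; it is not a HugeCTR API.
    """
    # Copy so that neither os.environ nor the caller's mapping is mutated.
    env = dict(os.environ if base_env is None else base_env)
    # Per the release note, 0 keeps HugeCTR from setting a custom RMM device resource.
    env["HCTR_RMM_SETTABLE"] = "0"
    return env


# Hypothetical usage (train.py is a placeholder for your own script):
# import subprocess
# subprocess.run(["python", "train.py"],
#                env=env_without_hugectr_rmm_override(), check=True)
```

Building a separate environment mapping keeps the setting scoped to the launched process, which may be preferable when only some jobs call `rmm::mr::set_current_device_resource()` outside HugeCTR.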
