Describe the bug
The aio reader code assumes that direct I/O operations must always be aligned to 4096 bytes: https://github.com/NVIDIA-Merlin/HugeCTR/blob/v24.04.00/HugeCTR/src/data_readers/multi_hot/detail/aio_context.cpp#L122
However, the required alignment depends on multiple factors, such as the filesystem type and the base page size of the kernel.
On systems with a 64KiB page size, I/O on the Lustre filesystem will fail if we only align accesses to 4096 bytes. From the Lustre manual (https://doc.lustre.org/lustre_manual.xhtml#performing_directio):
"Applications using the read() and write() calls must supply buffers aligned on a page boundary (usually 4 K). If the alignment is not correct, the call returns -EINVAL."
However, on the same system, a 4KiB alignment might work for another filesystem, for example ext4 on NVMe SSDs.
If we expect that I/O operations will always read very large chunks of the file, then always setting the alignment to the page size should be fine, so we can use sysconf(_SC_PAGESIZE) instead of 4096. If we expect smaller reads, then using a larger-than-necessary alignment could hurt performance, e.g. having to read 64KiB when you only want 1KiB. In this latter case, the code might need to determine the alignment requirements at runtime, or expose knobs for users to override the alignment if their application crashes.
To Reproduce
On a system with 64KiB pages, try loading a file from a Lustre filesystem with HugeCTR. It will fail:
terminate called after throwing an instance of 'std::runtime_error'
  what(): io_getevents returned failed event: Invalid argument
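As a rough illustration of the suggestion above (page-size default plus a user-facing override), here is a minimal sketch. It is not taken from HugeCTR: the environment variable name DIRECT_IO_ALIGNMENT and the helpers direct_io_alignment, align_up, align_down and allocate_aligned are hypothetical names chosen for this example, and the sketch assumes a POSIX system. It does not query the filesystem for its real requirements.

```cpp
#include <stdlib.h>   // posix_memalign, getenv, strtoull, free
#include <unistd.h>   // sysconf

#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Hypothetical override knob for this sketch; not an existing HugeCTR setting.
constexpr const char* kAlignEnvVar = "DIRECT_IO_ALIGNMENT";

inline bool is_power_of_two(unsigned long long x) { return x != 0 && (x & (x - 1)) == 0; }

// Returns the alignment to use for O_DIRECT buffers, offsets and sizes:
// the user override if it is a valid power of two, else the kernel page size.
inline std::size_t direct_io_alignment() {
  if (const char* env = getenv(kAlignEnvVar)) {
    char* end = nullptr;
    errno = 0;
    unsigned long long v = strtoull(env, &end, 10);
    // posix_memalign needs a power of two that is a multiple of sizeof(void*).
    if (errno == 0 && end != env && *end == '\0' && is_power_of_two(v) &&
        v >= sizeof(void*)) {
      return static_cast<std::size_t>(v);
    }
  }
  long page = sysconf(_SC_PAGESIZE);
  return page > 0 ? static_cast<std::size_t>(page) : 4096;  // 4096 only as a last resort
}

// Round a file offset down / a length up to a power-of-two alignment.
inline std::size_t align_down(std::size_t v, std::size_t a) { return v & ~(a - 1); }
inline std::size_t align_up(std::size_t v, std::size_t a) { return (v + a - 1) & ~(a - 1); }

struct FreeDeleter {
  void operator()(void* p) const { free(p); }
};
using AlignedBuffer = std::unique_ptr<void, FreeDeleter>;

// Allocates a buffer whose address and size are both multiples of `alignment`.
inline AlignedBuffer allocate_aligned(std::size_t size, std::size_t alignment) {
  void* p = nullptr;
  if (posix_memalign(&p, alignment, align_up(size, alignment)) != 0) {
    throw std::bad_alloc();
  }
  return AlignedBuffer(p);
}
```

With this sketch, a reader would call direct_io_alignment() once and use the result wherever 4096 is currently hard-coded. On a 64KiB-page system the default becomes 65536, and a user on ext4 who wants smaller reads could set DIRECT_IO_ALIGNMENT=4096.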