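To prefetch data in Rust, use the nightly-only intrinsics `core::intrinsics::prefetch_read_data` and `core::intrinsics::prefetch_write_data`.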
Those two, and their `*_instruction` equivalents, become the LLVM intrinsic `llvm.prefetch`, which I suspect must in turn become one of the x86 `PREFETCHh` instructions, so you could also issue that instruction directly.
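As a rough sketch of calling the instruction directly on stable Rust, the following uses `core::arch::x86_64::_mm_prefetch` with the `_MM_HINT_T0` hint. The helper `prefetch_t0`, the function `sum_with_prefetch`, and the constant `PREFETCH_DISTANCE` are hypothetical names made up for illustration, and the distance of 64 elements is an assumed starting point, not a measured optimum.

```rust
/// Hypothetical helper: issue a PREFETCHT0 hint for the address `p`.
#[cfg(target_arch = "x86_64")]
#[inline(always)]
fn prefetch_t0<T>(p: *const T) {
    use core::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};
    // A prefetch is only a hint: it does not dereference the pointer in the
    // Rust sense and does not fault, even for addresses past the end.
    unsafe { _mm_prefetch::<_MM_HINT_T0>(p as *const i8) };
}

/// Fallback for other architectures: do nothing.
#[cfg(not(target_arch = "x86_64"))]
#[inline(always)]
fn prefetch_t0<T>(_p: *const T) {}

/// Assumed prefetch distance in elements; tune it by measurement.
const PREFETCH_DISTANCE: usize = 64;

/// Sums a slice while hinting the CPU to load data a fixed distance ahead.
pub fn sum_with_prefetch(data: &[u64]) -> u64 {
    let mut total = 0u64;
    for (i, &value) in data.iter().enumerate() {
        // `wrapping_add` is a safe pointer offset; the result is only used
        // as a prefetch address and is never read through.
        prefetch_t0(data.as_ptr().wrapping_add(i + PREFETCH_DISTANCE));
        total = total.wrapping_add(value);
    }
    total
}

fn main() {
    let data: Vec<u64> = (0..1_000_000).collect();
    println!("sum = {}", sum_with_prefetch(&data));
}
```

A sequential scan like this one is usually covered by the hardware prefetcher already, so any gain is more likely to show up with irregular access patterns such as index lookups or pointer chasing.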
Compiled with Clang 14, the benchmark is not memory bound; the bottleneck appears to be branch prediction, and the Rust version behaves the same as the C++ version in that respect. So it's possible this won't teach you what the lab intends. Add prefetching anyway, and see whether the benchmark shows a performance improvement.