Be aware that this reports as memory bound, not core bound (as does the C++ version). Reduce or remove the dependency chain anyway and the benchmark should still show a performance improvement; the C++ version does.
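
For illustration only, here is a minimal C++ sketch of what reducing a dependency chain can look like. It assumes the hot loop is a simple reduction (a sum over an array), which may not match the actual benchmark; the function names `sum_chained` and `sum_split` are hypothetical and not taken from either version discussed here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical baseline: one accumulator, so every add has to wait
// for the result of the previous add (a single long dependency chain).
std::uint64_t sum_chained(const std::vector<std::uint64_t>& v) {
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        acc += v[i];
    }
    return acc;
}

// Hypothetical variant: four independent accumulators, so the CPU can
// keep several adds in flight instead of serializing on one register.
std::uint64_t sum_split(const std::vector<std::uint64_t>& v) {
    std::uint64_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    // Main loop: four elements per iteration, one per accumulator.
    for (; i + 4 <= n; i += 4) {
        a0 += v[i];
        a1 += v[i + 1];
        a2 += v[i + 2];
        a3 += v[i + 3];
    }
    // Tail: the remaining 0 to 3 elements.
    for (; i < n; ++i) {
        a0 += v[i];
    }
    // Combine the partial sums at the end.
    return (a0 + a1) + (a2 + a3);
}
```

Unsigned integers are used so both functions return exactly the same value. With floating point, splitting the accumulator reorders the additions and the result can differ slightly. An optimizing compiler may already unroll or vectorize the single-accumulator loop, so measure both rather than assuming a gain.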