[XPU][OptRed] Revamp `-tritonintelgpu-optimize-reduction-locality` #2800

victor-eds · 2024-11-22T14:43:33Z

The original implementation had two critical issues:

- *Functional*: It did not preserve register order, so it computed a different reduction.
- *Performance*: When converting back to the original tensor type, it did `reshape(convert_layout(res))`. This made the `reshape` operation act as an anchor, so the suboptimal slice layout was propagated.
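The toy Python model below shows why ignoring register order changes the result. It is only an illustration and not the pass's implementation. It assumes a hypothetical layout in which one thread holds a `rows x cols` tile and register `i` stores the logical element `(i % rows, i // rows)`, so rows vary fastest. All function names are made up for the example. Reading the registers as if they were row-major groups elements from different logical rows together, which produces a different reduction.

```python
# Toy model (hypothetical, for illustration only): one thread holds a
# rows x cols tile in registers, and register i stores the logical element
# (row = i % rows, col = i // rows), i.e. rows vary fastest.

def pack_registers(tile):
    """Flatten a logical rows x cols tile into registers, rows varying fastest."""
    rows, cols = len(tile), len(tile[0])
    return [tile[i % rows][i // rows] for i in range(rows * cols)]


def reduce_rows_keeping_order(regs, rows):
    """Sum each logical row, honoring the register order (register i -> row i % rows)."""
    sums = [0] * rows
    for i, value in enumerate(regs):
        sums[i % rows] += value
    return sums


def reduce_rows_ignoring_order(regs, rows):
    """Wrongly assume row-major registers (register i -> row i // cols)."""
    cols = len(regs) // rows
    sums = [0] * rows
    for i, value in enumerate(regs):
        sums[i // cols] += value
    return sums


tile = [[1, 2, 3], [10, 20, 30]]
regs = pack_registers(tile)  # [1, 10, 2, 20, 3, 30]
print(reduce_rows_keeping_order(regs, rows=2))   # [6, 60], matches per-row sums
print(reduce_rows_ignoring_order(regs, rows=2))  # [13, 53], mixes both rows
```

With the register order respected, each row sum matches the plain per-row reduction. Ignoring it still adds up every element exactly once, but partitions them into the wrong groups.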

This was fixed as follows:

- Keep the register order.
- Use `convert_layout(reshape(res))` when converting back to the original type, so the more optimal layout is propagated instead.

See the implementation for further details.


victor-eds requested review from whitneywhtsang, chengjunlu and a team November 22, 2024 14:43

victor-eds self-assigned this Nov 22, 2024

etiotto approved these changes Nov 26, 2024


victor-eds merged commit ee78046 into intel:main Nov 27, 2024
5 checks passed
