Skip to content
This repository has been archived by the owner on May 7, 2024. It is now read-only.

Incorrect store merging causes misaligned stores #289

Open
Shai-Aviv opened this issue Aug 29, 2021 · 7 comments
Open

Incorrect store merging causes misaligned stores #289

Shai-Aviv opened this issue Aug 29, 2021 · 7 comments

Comments

@Shai-Aviv
Copy link

Shai-Aviv commented Aug 29, 2021

Compiling this C code with commit 5964b5c (GCC 11.1.0) and the options -march=rv64gc -O3:

#include <stdint.h>

void zero_two_uint16(uint16_t* ptr) {
    ptr[0] = 0;
    ptr[1] = 0;
}

void zero(uint16_t* ptr) {
    for (int i = 0; i < 16; ++i) {
        zero_two_uint16(ptr);
        ptr += 2;
    }
}

yields the following (incorrect) assembly:

<zero_two_uint16>:
        sh      zero,0(a0)
        sh      zero,2(a0)
        ret

<zero>:
        andi    a5,a0,7
        sd      zero,0(a0)
        sd      zero,8(a0)
        sd      zero,16(a0)
        sd      zero,24(a0)
        sd      zero,32(a0)
        sd      zero,40(a0)
        sd      zero,48(a0)
        sd      zero,56(a0)
        bnez    a5,<.L4>
        ret

<.L4>:
        ret

This causes misaligned stores when the pointer given to zero() is not aligned on a 4-byte boundary. I tried to use the option -fno-store-merging as a workaround to prevent this, but the output was the same (which I suspect is yet another bug).

With GCC 10.2.0, the output seems fine (again with -march=rv64gc -O3):

<zero_two_uint16>:
        sh      zero,0(a0)
        sh      zero,2(a0)
        ret

<zero>:
        andi    a5,a0,7
        bnez    a5,<.L4>
        sd      zero,0(a0)
        sd      zero,8(a0)
        sd      zero,16(a0)
        sd      zero,24(a0)
        sd      zero,32(a0)
        sd      zero,40(a0)
        sd      zero,48(a0)
        sd      zero,56(a0)
        ret

<.L4>:
        sh      zero,0(a0)
        sh      zero,2(a0)
        sh      zero,4(a0)
        sh      zero,6(a0)
        sh      zero,8(a0)
        sh      zero,10(a0)
        sh      zero,12(a0)
        sh      zero,14(a0)
        sh      zero,16(a0)
        sh      zero,18(a0)
        sh      zero,20(a0)
        sh      zero,22(a0)
        sh      zero,24(a0)
        sh      zero,26(a0)
        sh      zero,28(a0)
        sh      zero,30(a0)
        sh      zero,32(a0)
        sh      zero,34(a0)
        sh      zero,36(a0)
        sh      zero,38(a0)
        sh      zero,40(a0)
        sh      zero,42(a0)
        sh      zero,44(a0)
        sh      zero,46(a0)
        sh      zero,48(a0)
        sh      zero,50(a0)
        sh      zero,52(a0)
        sh      zero,54(a0)
        sh      zero,56(a0)
        sh      zero,58(a0)
        sh      zero,60(a0)
        sh      zero,62(a0)
        ret
@kito-cheng
Copy link
Collaborator

Thanks for report that, it seems interesting bug...that's not cause by -fstore-merging, it cause by -ftree-vectorize, so the workaround for this case is apply -fno-tree-vectorize to your CFLAGS.

I need more time to figure out what happen.

@jim-wilson
Copy link
Collaborator

A git bisect on the FSF GCC tree points the blame at

commit f5e18dd
Author: Kewen Lin [email protected]
Date: Tue Nov 3 02:51:47 2020 +0000

pass: Run cleanup passes before SLP [PR96789]
...

This suggests the failure occurs in 168t.fre4. Except that the wrong code doesn't appear until 299r.jump2. I get the right result if I use -fno-slp-vectorize, or if I comment out the pass_fre inside pass_pre_slp_scalar_cleanup in passes.def. Very odd. There might be some internal state that doesn't appear in the dump files that is messed up. Or maybe git bisect is accidentally pointing me at the wrong patch because of complex interactions between the various optimization patches. This needs more investigation.

@jim-wilson
Copy link
Collaborator

I was looking at where the block of half word stores was being optimized away which happens late, but is not where the real problem is. I failed to notice that they weren't half-word stores anymore. The problem occurs in 170t.slp1 which vectorizes the block of half-word stores. This appears to happen because 168t.fre4 optimized away a phi node. So my bisect appears to be correct. I'm not an expert on any of these passes in question, so we should file this upstream. This looks target independent, but is only a problem on targets that don't support unaligned loads/stores, and RISC-V is unfortunately the only major architecture in that category.

@jim-wilson
Copy link
Collaborator

I filed a bug upstream
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102139
with a bit more info and verifying that it is still broken on mainline sources.

@jim-wilson
Copy link
Collaborator

There is a fix on FSF GCC mainline for this now. No fix available for gcc-11 yet.

@IdanHo
Copy link

IdanHo commented Mar 13, 2022

Looks like it was cherry picked into gcc-11, can we get it into the riscv-gcc repo?

@kito-cheng
Copy link
Collaborator

GCC 12 will release soon, I plan bump to GCC 12 once it release, so that problem should gone :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants