-
Notifications
You must be signed in to change notification settings - Fork 211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Fix strict-aliasing errors #638
Conversation
d3a73ee
to
7ceef39
Compare
b4d641c
to
69414e2
Compare
@aqrit how do you think about the new refactoring? I am using memcpy to recast variables now |
The above seems to be working. Is my understanding correct? |
@jserv Yes. Here I have fixed If this PR is ok, I will remove the optimization option and turn this PR into ready for review |
Except for the |
How about rename them as |
It sounds pretty well. |
e14d9b8
to
09b21b8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is not a trivial change, the git commit messages should be amended to be more informative. You might consider to split the proposed changes into several commits.
@jserv I have added the commit message. |
It makes sense to decouple |
Some tests/intrinsics don't behave correctly under optimization. `OPTNONE` and `FORCE_INLINE_OPTNONE` are used to disable these tests and intrinsics.
Casting vector registers to a pointer or accessing the address of a variable and then casting it to a different pointer type would violate strict aliasing rules, potentially causing strict aliasing errors. `sse2neon_recast_u64_f64` and `sse2neon_recast_f64_s64` are introduced to solve this error. closes DLTcollab#635 DLTcollab#639
I think it is ok now |
The OPTNONE changes in DLTcollab#638 introduced 25% performance regression in Cycles render engine running on Apple Silicon: https://projects.blender.org/blender/blender/issues/126408 The reason for this is because as per Clang documentation optnone is incompatible with inline, so none of the functions that is marked as optnone is inlined. The biggest bottleneck for Cycles after that change is _mm_mul_ps. This change makes it so _mm_mul_ps is inlined and is no longer marked as optnone. This solves the immediate performance regression, and the correctness is verified using the sse2neon's test suit on M2 Ultra and M3 Max, with various optimization levels (default, -O2, -O3).
The OPTNONE changes in DLTcollab#638 introduced 25% performance regression in Cycles render engine running on Apple Silicon: https://projects.blender.org/blender/blender/issues/126408 The reason for this is because as per Clang documentation optnone is incompatible with inline, so none of the functions that is marked as optnone is inlined. The biggest bottleneck for Cycles after that change is _mm_mul_ps. This change makes it so _mm_mul_ps is inlined and is no longer marked as optnone. This solves the immediate performance regression, and the correctness is verified using the sse2neon's test suit on M2 Ultra and M3 Max, with various optimization levels (default, -O2, -O3). Additionally, adding -Wstrict-aliasing flag does not introduce new warnings.
Casting vector registers to a pointer or accessing the address of a
variable and then casting it to a different pointer type would
violate strict aliasing rules, potentially causing strict aliasing
errors.
sse2neon_recast_u64_f64
andsse2neon_recast_f64_s64
are introducedto solve this error.
closes #635 #639