Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fix strict-aliasing errors #638

Merged
merged 2 commits into from
Jul 18, 2024
Merged

fix: Fix strict-aliasing errors #638

merged 2 commits into from
Jul 18, 2024

Conversation

howjmay
Copy link
Collaborator

@howjmay howjmay commented Jun 25, 2024

Casting vector registers to a pointer or accessing the address of a
variable and then casting it to a different pointer type would
violate strict aliasing rules, potentially causing strict aliasing
errors.

sse2neon_recast_u64_f64 and sse2neon_recast_f64_s64 are introduced
to solve this error.

closes #635 #639

@howjmay howjmay force-pushed the test-opt branch 2 times, most recently from d3a73ee to 7ceef39 Compare June 25, 2024 07:41
@howjmay howjmay force-pushed the test-opt branch 27 times, most recently from b4d641c to 69414e2 Compare June 28, 2024 09:50
@howjmay
Copy link
Collaborator Author

howjmay commented Jul 9, 2024

@aqrit how do you think about the new refactoring? I am using memcpy to recast variables now

@jserv
Copy link
Member

jserv commented Jul 9, 2024

I still need some time to investigate the cause that makes _mm_dp_pd and _mm_dp_ps failed.

The above seems to be working. Is my understanding correct?

@howjmay
Copy link
Collaborator Author

howjmay commented Jul 9, 2024

@jserv Yes. Here I have fixed _mm_dp_pd and _mm_dp_ps. The whole test cases were optimized in a wrong way, so we need to disable optimization for these two test cases.

If this PR is ok, I will remove the optimization option and turn this PR into ready for review

@jserv
Copy link
Member

jserv commented Jul 9, 2024

If this PR is ok, I will remove the optimization option and turn this PR into ready for review

Except for the recast_f64 and recast_i64 functions (the naming should be amended for consistency), I think the proposed changes are fine."

@howjmay
Copy link
Collaborator Author

howjmay commented Jul 10, 2024

If this PR is ok, I will remove the optimization option and turn this PR into ready for review

Except for the recast_f64 and recast_i64 functions (the naming should be amended for consistency), I think the proposed changes are fine."

How about rename them as sse2neon_recast_u64_f64 and sse2neon_recast_f64_s64?

@jserv
Copy link
Member

jserv commented Jul 15, 2024

How about renaming them as sse2neon_recast_u64_f64 and sse2neon_recast_f64_s64?

It sounds pretty well.

@howjmay howjmay force-pushed the test-opt branch 2 times, most recently from e14d9b8 to 09b21b8 Compare July 18, 2024 06:54
@howjmay howjmay marked this pull request as ready for review July 18, 2024 07:02
@howjmay howjmay requested review from jserv and marktwtn as code owners July 18, 2024 07:02
Copy link
Member

@jserv jserv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is not a trivial change, the git commit messages should be amended to be more informative. You might consider to split the proposed changes into several commits.

@howjmay
Copy link
Collaborator Author

howjmay commented Jul 18, 2024

@jserv I have added the commit message.
However, when it comes to splitting this PR into several small commits, I think I can only separate it into a commit that introducing OPTNONE for some tests/intrinsics and another one handling the strict-aliasing errors.
How do you think about it?

@jserv
Copy link
Member

jserv commented Jul 18, 2024

I think I can only separate it into a commit that introducing OPTNONE for some tests/intrinsics and another one handling the strict-aliasing errors.

It makes sense to decouple FORCE_INLINE_OPTNONE specific commit.

@jserv
Copy link
Member

jserv commented Jul 18, 2024

Is it time to close #635 and #639 ? If so, refine the git commit messages accordingly.

howjmay added 2 commits July 18, 2024 17:50
Some tests/intrinsics don't behave correctly under optimization.
`OPTNONE` and `FORCE_INLINE_OPTNONE` are used to disable these
tests and intrinsics.
Casting vector registers to a pointer or accessing the address of a
variable and then casting it to a different pointer type would
violate strict aliasing rules, potentially causing strict aliasing
errors.

`sse2neon_recast_u64_f64` and `sse2neon_recast_f64_s64` are introduced
to solve this error.

closes DLTcollab#635 DLTcollab#639
@howjmay
Copy link
Collaborator Author

howjmay commented Jul 18, 2024

I think it is ok now

@jserv jserv merged commit 200cd4e into DLTcollab:master Jul 18, 2024
16 checks passed
@howjmay howjmay deleted the test-opt branch July 18, 2024 13:49
sergeyvfx added a commit to sergeyvfx/sse2neon that referenced this pull request Aug 16, 2024
The OPTNONE changes in DLTcollab#638 introduced 25% performance regression in
Cycles render engine running on Apple Silicon:

  https://projects.blender.org/blender/blender/issues/126408

The reason for this is because as per Clang documentation optnone is
incompatible with inline, so none of the functions that is marked as
optnone is inlined. The biggest bottleneck for Cycles after that
change is _mm_mul_ps.

This change makes it so _mm_mul_ps is inlined and is no longer marked
as optnone. This solves the immediate performance regression, and the
correctness is verified using the sse2neon's test suit on M2 Ultra
and M3 Max, with various optimization levels (default, -O2, -O3).
sergeyvfx added a commit to sergeyvfx/sse2neon that referenced this pull request Aug 16, 2024
The OPTNONE changes in DLTcollab#638 introduced 25% performance regression in
Cycles render engine running on Apple Silicon:

  https://projects.blender.org/blender/blender/issues/126408

The reason for this is because as per Clang documentation optnone is
incompatible with inline, so none of the functions that is marked as
optnone is inlined. The biggest bottleneck for Cycles after that
change is _mm_mul_ps.

This change makes it so _mm_mul_ps is inlined and is no longer marked
as optnone. This solves the immediate performance regression, and the
correctness is verified using the sse2neon's test suit on M2 Ultra
and M3 Max, with various optimization levels (default, -O2, -O3).

Additionally, adding -Wstrict-aliasing flag does not introduce new
warnings.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test failures with newer versions of clang/gcc using O1 and O2
3 participants