Even though JEP 438 states that both x64 and AArch64 architectures should benefit from the new Vector API, the performance of simdjson-java on an M1 Mac is currently much worse than that of other parsers.
This may be due to the use of 256-bit vectors. I found a thread which states that:
on AArch64 NEON, the max hardware vector size is 128 bits. So for 256-bits, we are not able to intrinsify to use SIMD directly, which will fall back to Java implementation of those APIs
Running the benchmark with '-XX:+UnlockDiagnosticVMOptions', '-XX:+PrintIntrinsics' produces the following output, which supports this theory:
** not supported: arity=0 op=load vlen=32 etype=byte ismask=no
** not supported: arity=1 op=store vlen=32 etype=byte ismask=no
** not supported: arity=0 op=broadcast vlen=32 etype=byte ismask=0 bcast_mode=0
** not supported: arity=2 op=comp/0 vlen=32 etype=byte ismask=usestore
** not supported: arity=1 op=cast#438/3 vlen2=32 etype2=byte
** not supported: arity=0 op=broadcast vlen=32 etype=byte ismask=0 bcast_mode=0
** not supported: arity=2 op=comp/0 vlen=32 etype=byte ismask=usestore
** not supported: arity=1 op=cast#438/3 vlen2=32 etype2=byte
** not supported: arity=0 op=load vlen=32 etype=byte ismask=no
** not supported: arity=1 op=store vlen=32 etype=byte ismask=no
** not supported: arity=0 op=broadcast vlen=32 etype=byte ismask=0 bcast_mode=0
** not supported: arity=2 op=comp/0 vlen=32 etype=byte ismask=usestore
** not supported: arity=1 op=cast#438/3 vlen2=32 etype2=byte
** not supported: arity=0 op=broadcast vlen=32 etype=byte ismask=0 bcast_mode=0
** not supported: arity=2 op=comp/0 vlen=32 etype=byte ismask=usestore
** not supported: arity=1 op=cast#438/3 vlen2=32 etype2=byte
AArch64 support is obviously not as important as x64, but it may be worthwhile to make the implementation flexible enough to support both architectures. Perhaps the C++ implementation can be used as a reference.
Anyway, great work so far on the Java port, the results on x64 are very impressive!
It would tell us what the preferable vector length is for your machine. Unfortunately, it's challenging to just replace ByteVector.SPECIES_256 with ByteVector.SPECIES_PREFERRED in the library. In several places, we have to perform bitwise operations where it's easier to know the length upfront to avoid, for example, using extra masks to extract a relevant part of a long.
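To illustrate the kind of bitwise code that gets harder without a fixed length, here is a rough sketch (not taken from simdjson-java) of assembling one 64-bit bitmask for a 64-byte block out of per-vector comparison masks. The class and method names are hypothetical. With a known 256-bit width there are exactly two 32-lane masks and a constant shift; in the generic version each chunk has to be masked and shifted by an amount only known at run time.

```java
class MaskCombine {
    // Fixed 256-bit case: two 32-lane masks cover one 64-byte block,
    // so the shift amount is a compile-time constant.
    public static long combine256(long lo, long hi) {
        return (lo & 0xFFFFFFFFL) | (hi << 32);
    }

    // Generic case: the number of lanes per vector is a run-time value,
    // so every chunk needs an extra mask and a computed shift.
    // Assumes chunkMasks.length * lanes == 64.
    public static long combineGeneric(long[] chunkMasks, int lanes) {
        long laneMask = (lanes == 64) ? -1L : (1L << lanes) - 1;
        long result = 0;
        for (int i = 0; i < chunkMasks.length; i++) {
            result |= (chunkMasks[i] & laneMask) << (i * lanes);
        }
        return result;
    }

    public static void main(String[] args) {
        // Four 16-lane masks, as 128-bit vectors would produce for 64 bytes.
        long combined = combineGeneric(new long[] {0xFFFFL, 0L, 0L, 0x8000L}, 16);
        System.out.println(Long.toHexString(combined));
    }
}
```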
In my opinion, to support different vector lengths, we would need to provide dedicated implementations for each length and then, based on ByteVector.SPECIES_PREFERRED, pick the one that is the best for the machine the library is used on.
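The selection step could look roughly like the sketch below. It is only an illustration under stated assumptions: the `IndexerSelector` class and its `Impl` enum are hypothetical and stand in for the dedicated per-length implementations, and the preferred width is passed in as a plain `int` so the example runs without the incubator module.

```java
class IndexerSelector {
    // Hypothetical placeholders for the dedicated per-length implementations.
    enum Impl {
        BITS_128(128), BITS_256(256), BITS_512(512);

        final int bits;

        Impl(int bits) {
            this.bits = bits;
        }
    }

    // Picks the widest implementation that does not exceed the preferred
    // vector width; anything narrower than 128 bits falls back to BITS_128.
    public static Impl select(int preferredBits) {
        Impl best = Impl.BITS_128;
        for (Impl impl : Impl.values()) {
            if (impl.bits <= preferredBits) {
                best = impl;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // In the library this value would come from
        // ByteVector.SPECIES_PREFERRED.vectorBitSize(), which requires
        // running with --add-modules jdk.incubator.vector.
        int preferredBits = 128; // e.g. what AArch64 NEON is expected to report
        System.out.println(select(preferredBits));
    }
}
```

The choice would be made once at start-up, so the hot parsing loops can still be written against a fixed, known vector length.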