-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid boxing in Framing #1247
Avoid boxing in Framing #1247
Conversation
actor/src/main/scala-2.12/org/apache/pekko/util/ByteString.scala
Outdated
Show resolved
Hide resolved
The byte searching can do with SIMD too, but I lack of time to do this:( |
actor/src/main/scala-2.13/org/apache/pekko/util/ByteString.scala
Outdated
Show resolved
Hide resolved
actor/src/main/scala-2.13/org/apache/pekko/util/ByteString.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks
@Roiocam @jxnu-liguobin @jrudolph Would you like to get a look into this? |
else { | ||
var found = -1 | ||
var i = math.max(from, 0) | ||
while (i < length && found == -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we swap with found == -1 && i < length)
which can reduce a bit when we just found:)
if (bytes(startIndex + i) == elem) found = i | ||
i += 1 | ||
} | ||
found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about we extract the startIndex + 1
to the start of the loop and returning a found - startIndex
?
I think we can delegate the I have checked the bytecode, will do this after work. |
Unfortunately, I don't think that would work, as the signature requires Code example: case class MyByteSeq(data: List[Byte]) {
def fastIndexOf(byte: Byte): Int = data.indexOf(byte)
//def indexOf1[B >: Byte](x: B): Int = fastIndexOf(Byte.unbox(x)) // This won't compile
def indexOf2[B >: Byte](x: B): Int = fastIndexOf(Byte.unbox(x.asInstanceOf[Object])) // This can fail at runtime
def indexOf3[B >: Byte](x: B): Int = x match {
case b: Byte => fastIndexOf(b) // I think the pattern match already unboxes it
case _ => ??? // I can't call fastIndexOf here, so I need a duplicated version of the code anyway
}
} |
THANKS, But I checked the bytecode, it was Java.lang.Objrct, not sure why that would fail at Runtime. |
@JD557 do you have any further improvement on this pr ?I would like to merge this and improvement can come up later. |
Not really, feel free to merge this and improve later |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That still feels like a footgun, since it is quite unclear in which situations the new overload will be selected? Should we document in which case the static overload resolution will choose the new method?
Is it somehow possible to avoid the massive code duplication here (probably hard since the whole existing scheme is built around the function of Any.==
)? If not it should be documented.
Ultimately, the main problem of ByteString
has always been the attempt to make it fit seamlessly into the rest of the Scala collections by making it part of the collections type hierarchy (instead of making it fast and useful in the first place and then consider useful conversions/views to the Scala collection types).
I don't have any strong feelings about the naming. I used this scheme because it's the same that's used by ByteIterator (See: pekko/actor/src/main/scala-3/org/apache/pekko/util/ByteIterator.scala Lines 498 to 502 in 3cd0801
If we change the method here (e.g. to However, on that note:
Looks like (Actually, I think the equality on that |
So, I was doing some more tests with a specialized version of As such, I'm not sure how to proceed with this PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
@JD557 I think if you want to change the method name, you can using the name of firstIndexOf. And for the api, because it's using the scala collection's indexOf,so the input B where is actually a Byte, but now can be a any. As for selection, I think the compiler will choose this new method when it know the elem for testing is a type Byte but will not when It doesn't. As for the old index Of, which will always return -1 us the elem is not a Number/can boxed to Byte, so a quick check and delegate to the specified one will not harm too much. Who will using a ByteString to indexOf a any? |
In BoxedRuntime.java public static byte unboxToByte(Object b) {
return b == null ? 0 : ((java.lang.Byte)b).byteValue();
}
public static java.lang.Byte boxToByte(byte b) {
return java.lang.Byte.valueOf(b);
} In Byte.java public boolean equals(Object obj) {
if (obj instanceof Byte) {
return value == ((Byte)obj).byteValue();
}
return false;
} Update: BoxedRuntime.toByte will not works too. I think there are some implicitly conversion, because |
I don't think that Say one asks Ideally we would check if the number is between (There's also the annoying possibility that someone creates their Although I imagine most problems would come from |
Yes, @JD557 , I just checked again, it's using the BALOAD
INVOKESTATIC scala/runtime/BoxesRunTime.boxToByte (B)Ljava/lang/Byte;
ALOAD 1
INVOKESTATIC scala/runtime/BoxesRunTime.equals (Ljava/lang/Object;Ljava/lang/Object;)Z
IFEQ L6 |
@som-snytt Friendly ping , do you know any great way to handle this ,thanks. |
Probably, yes. If the general overload would not exist, the new one would also work for literals like |
@JD557 need another update to make mima happy |
I can try to fix it, but out of curiosity, isn't it OK to have MiMa issues, since this only targets 1.1.x? I was ignoring the issue because I thought 1.1.0 was going to break bincompat with 1.0.x anyway |
Pekko core follows SemVer, so the only acceptable MiMa issues are for internal code (i.e.
No it doesn't, that would be for Pekko 2.x.x. |
@@ -823,7 +874,33 @@ sealed abstract class ByteString | |||
override def indexWhere(p: Byte => Boolean, from: Int): Int = iterator.indexWhere(p, from) | |||
|
|||
// optimized in subclasses | |||
override def indexOf[B >: Byte](elem: B, from: Int): Int = indexOf(elem, from) | |||
override def indexOf[B >: Byte](elem: B, from: Int): Int = super.indexOf(elem, from) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am aware that this is override is a bit weird, but:
- I think the old code was an infinite loop
- Removing this override breaks the MiMa checks
This should never be called anyway, but I think having it like this is a bit safer.
OK, I think the MiMA issues should be fixed now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@jrudolph do you still think this PR change needs more work? |
@sirthias Would you like to give some input about this too? |
The only comment I have is that there would be a potentially much more efficient implementation of |
@sirthias Yes, I opened up an issue for SIMD in #1264 , would be nice to have that after this been merged, I think https://github.com/sirthias/borer must have already done this. |
Yes, there is a (much more complicated) SWAR loop implemented in borer's JSON parser. But the whole thing only makes sense if we can really get down to raw byte access via |
@He-Pin @mdedetrich @Roiocam @samueleresca @raboof @jxnu-liguobin should we get this sorted out before 1.1.0-M1 RC or should we look at this again after the 1.1.0-M1 release? |
Sorry for now response, holidays here. I think we can including this in m1 and try with SWAR after that. |
changes have been made but I will leave the PR open a few days just in case
Merged - thanks @JD557 |
Adds a specialized
indexOf(Byte, Int)
to allByteString
implementations, similar to what is done on theByteIterator
, in order to avoid boxing/unboxing when searching for a byte.This speeds up framing quite a bit.
Benchmark results (Java 11, Scala 2.13, Apple M3 Max):
Old
New
Both (Interleaved)