Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boxed primitives for value classes in structures #2738

Open
Chuckame opened this issue Jul 3, 2024 · 5 comments
Open

Boxed primitives for value classes in structures #2738

Chuckame opened this issue Jul 3, 2024 · 5 comments

Comments

@Chuckame
Copy link
Contributor

Chuckame commented Jul 3, 2024

Describe the bug
Given a serializable data class having a value class wrapping a primitive type, the field is serialized using .encodeSerializableElement(serialDesc, 0, (SerializationStrategy)myData.$serializer.INSTANCE, myData.box-impl(self.value)) instead of encodeInlineElement(serialDesc, 0).encodeInt(self.value), forcing primitive values to be boxed, and also doesn't follow the expected as said in the docs:

    /**
     * Namely, for the `@Serializable @JvmInline value class MyInt(val my: Int)`,
     * and `@Serializable class MyData(val myInt: MyInt)` the following sequence is used:
     * ```
     * thisEncoder.encodeInlineElement(MyData.serializer.descriptor, 0).encodeInt(my)
     * ```
     *
     * This method provides an opportunity for the optimization to avoid boxing of a carried value [...]
     * ```
     */
    public fun encodeInlineElement(
        descriptor: SerialDescriptor,
        index: Int
    ): Encoder

To Reproduce
Check the bytecode generated from this code:

    @Serializable
    data class Structure(
        val primitiveField: BooleanValue,
    )

    @Serializable
    @JvmInline
    value class BooleanValue(val value: Boolean)

And here is the generated serializer (only the interesting part) retrieved using Kotlin bytecode IDEA tooling:

output.encodeSerializableElement(serialDesc, 0, (SerializationStrategy)ReproduceBug.BooleanValue.$serializer.INSTANCE, ReproduceBug.BooleanValue.box-impl(self.primitiveField));

Expected behavior
Should call encodeInlineElement and generate the following (not sure for the nullable value class' field):

output.encodeInlineElement(serialDesc, 0, (SerializationStrategy)ReproduceBug.BooleanValue.$serializer.INSTANCE).encodeBoolean(self.primitiveField);

Environment

  • Kotlin version: 1.9.22 & 2.0.0 & 2.0.20
  • Library version: 1.7.0+ (not tested with previous versions)
  • Kotlin platforms: jvm (and surely the other platforms, but not tested)
  • Gradle version: 8.7

EDIT: shortened the description

@Chuckame Chuckame changed the title Value classes not using encodeInlineElement and inefficient boxing of primitive values Boxed primitives for value classes in structures Sep 14, 2024
@Chuckame
Copy link
Contributor Author

Hello there, I tried to understand where could be the issue, but the plugin code is quite complex.

I think the issue is that the value class is not handled as an inlined element. I suppose the issue is around here https://github.com/JetBrains/kotlin/blob/f9de685b2feebf5e7021b7e769ae5751cde7dc46/plugins/kotlinx-serialization/kotlinx-serialization.backend/src/org/jetbrains/kotlinx/serialization/compiler/backend/ir/SerializerIrGenerator.kt#L266

Do you have any idea on how to fix it ?

@sandwwraith
Copy link
Member

Yes, I've re-checked the code and it is indeed is not generated. Likely you can change functions being called in the formEncodeDecodePropertyCall or serializeAllProperties.

@JakeWharton
Copy link
Contributor

We also experienced this performance problem. It gets even worse if you use value classes around other value classes (such as an Id wrapping a UInt) as the boxing seems to become exponential.

The idea that the library should unconditionally unwrap value classes is surprising, though. Is that only done if there is no custom serializer? What if the type is in another compilation unit and its serializer could change without recompilation?

I was coming to ask for some kind of opt-in to the unwrapping behavior when I discovered this issue.

For now we've migrated away from unsigned types and manually unwrap/re-wrap inside the carrier types.

e.g.,

@Poko
@Serializable
public class Create private constructor(
  @SerialName("id")
  private val _id: Int,
  @SerialName("tag")
  private val _tag: Int,
) : Change {
  override val id: Id get() = Id(_id)
  public val tag: WidgetTag get() = WidgetTag(_tag)

  public companion object {
    public operator fun invoke(
      id: Id,
      tag: WidgetTag,
    ): Create = Create(id.value, tag.value)
  }
}

This still affects toString() and discoverability of invoke is abysmal, but for us no users interact with this layer so it's an acceptable trade-off.

@Chuckame
Copy link
Contributor Author

Chuckame commented Sep 26, 2024

@JakeWharton custom serializer = on your own, so you are the owner of how to serialize, using directly encodeInt(42), or using object serializers like Int.serializer().serialize(42) which induce auto-boxing.

[...] its serializer could change without recompilation?

You missed the purpose of kotlinx-serialization plug-in when annotating with @Serializable: generate the according serializers code. So if no recompilation, no change in the generated serializers.

But still, it's interesting that you experienced big performance issues and found a "solution".


Anyway, thanks @sandwwraith, that could be an interesting day to dive a bit in the serialization plugin, I'll try to play with formEncodeDecodePropertyCall or serializeAllProperties as you said.

@JakeWharton
Copy link
Contributor

JakeWharton commented Sep 27, 2024

There's so much to unpack in your comment.

Custom serializers do not mean you are "on your own". Switching to and/or from a custom serializer on the type definition from a separate compilation unit should absolutely change the behavior of callers without recompilation. This fact should immediately preclude the strategy of the plugin automatically unwrapping on its own.

Moreover, the backing property is not necessarily public nor would its type necessarily be intrinsically representable in the serialization format. Removing the ability for a value class to dictate its own serialized form despite being required to be annotated as @Serializable is a ridiculous deviation in ones mental model on the behavior of the plugin-generated code.

But sure, let's say we want to completely ignore this fact and split the behavior of the library for value classes. There are so many implications to this:

  1. Why am I required to annotate the value class with @Serializable if it is going to be unwrapped?
  2. Migrating an identity class within my compilation unit to a value class can silently cause changes in the serialized form.
  3. Migrating an identity class in a library to a value class will silently change the serialized form only when downstream compilation units recompile. This means a runtime classpath update of the library will not cause a change, but a recompilation will. Yuck.
  4. The behavior of stdlib value classes such as UInt explicitly violate this behavior. They need changed to serialize the signed representation of the two's compliment bits rather than as unsigned bits. And if you argue that UInt should not behave that way, you're now not only arguing that we should split traditional classes vs. value classes but also that there's a carve-out of a subset of the stdlib's value classes to make them serialize like regular classes and that mechanism is not available for your types.

The behavior also completely breaks with the presumed forthcoming concept of multi-property value classes. You could possibly try to argue that the behavior should only be for @JvmInline-annotated value classes, but then that means the serialization behavior changes depending on which compiler backend you are using for the current compilation which is the worst of all worlds.

The library absolutely needs the ability to serialize value classes, but there are multiple axes to the behavior which are not being adequately handled today.

First, the unwrapping of a value class to directly serialize a single, public backing property should be something the enclosing type can choose to do on a per-property basis and it should only work on value classes with a single, public backing property without it needing to be marked as @Serializable.

Second, value classes should be able to define their own serialized form using the normal @Serializable annotation and its features. The default can even be to unwrap to the backing property (assuming there's only one) and will now work if the backing property is not public.

Both of these behaviors would allow custom value class representation without the enclosing class to do any work. It will also allow multi-property value classes to seamlessly integrate into the system with the only new concept being that its default representation is an object with multiple key-value pairs rather than unwrapped to its single-property scalar value.

Additionally, these two behaviors also retain the current behavior of the library making it a compatible migration, unlike the current proposed solution to this bug to skip the serializer.

Finally, as a bonus, and the superior solution to this bug in my opinion: it would be nice if serialization was specialized to the primitive storage types to avoid boxing in the case of delegating to the value class's serializer (similar to how the stdlib specializes things like iterators for primitives). I think it's unclear if this is viable strategy long-term, but it definitely solves the case of user value classes wrapping Int, UInt wrapping Int, and user value classes wrapping UInt.

At this point, this should probably be its own bug...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants