Add direct serialization to MessageDigest hashing #271

Open
jasperpotts opened this issue Aug 9, 2024 · 3 comments
Comments

@jasperpotts
Member

Problem

Today we have to serialize a PBJ protobuf object to a byte array before we can hash it. This wastes time and, more importantly, creates garbage objects on the heap. For example, we currently do:

EventCore.PROTOBUF.toBytes(event.getEventCore()).writeTo(eventDigest)

Solution

Add an overloaded write(MessageDigest digest) method to the Codec interface and its generated implementations. It can call the normal write(T o, WritableSequentialData out), using a custom implementation of WritableSequentialData that wraps a java.security.MessageDigest and maps the write(byte) and write(byte[]) methods to the update(byte) and update(byte[]) methods on MessageDigest. This avoids creating temporary byte arrays and ByteArrayOutputStreams.
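
A minimal sketch of the wrapping idea, using only the JDK. ByteSink is a hypothetical stand-in for the write side of WritableSequentialData, and writeRecord is a hypothetical stand-in for a generated codec; neither is PBJ's real API. The sketch only shows that forwarding writes to MessageDigest.update yields the same hash as serializing to a byte array first.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class DigestSinkSketch {

    /** Hypothetical stand-in for the write methods of WritableSequentialData. */
    interface ByteSink {
        void writeByte(byte b);
        void writeBytes(byte[] src);
    }

    /** Forwards every write straight into a MessageDigest; nothing is buffered. */
    static final class DigestSink implements ByteSink {
        private final MessageDigest digest;

        DigestSink(MessageDigest digest) {
            this.digest = digest;
        }

        @Override
        public void writeByte(byte b) {
            digest.update(b);
        }

        @Override
        public void writeBytes(byte[] src) {
            digest.update(src);
        }
    }

    /** Hypothetical "codec": writes a one-byte tag followed by a UTF-8 payload. */
    static void writeRecord(byte tag, String payload, ByteSink out) {
        out.writeByte(tag);
        out.writeBytes(payload.getBytes(StandardCharsets.UTF_8));
    }

    /** Hashes the record by streaming the codec output into the digest. */
    static byte[] hashStreaming(byte tag, String payload) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-384");
        writeRecord(tag, payload, new DigestSink(digest));
        return digest.digest();
    }

    /** Baseline: build the full serialized byte array first, then hash it. */
    static byte[] hashViaByteArray(byte tag, String payload) throws NoSuchAlgorithmException {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        byte[] all = new byte[body.length + 1];
        all[0] = tag;
        System.arraycopy(body, 0, all, 1, body.length);
        return MessageDigest.getInstance("SHA-384").digest(all);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] streamed = hashStreaming((byte) 1, "event");
        byte[] buffered = hashViaByteArray((byte) 1, "event");
        System.out.println("same hash: " + Arrays.equals(streamed, buffered));
    }
}
```

In a real implementation the sink would have to cover the remaining WritableSequentialData methods as well; the two shown here are only the ones named in the proposal.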

Alternatives

No response

@jasperpotts
Member Author

After this is fixed, update PbjBytesHasher to use it.

@lpetrovic05
Member

This can already be done with existing code:

private final MessageDigest eventDigest = DigestType.SHA_384.buildDigest();
private final WritableSequentialData eventStream = new WritableStreamingData(new HashingOutputStream(eventDigest));

EventCore.PROTOBUF.write(event.getEventCore(), eventStream);

Do you think there is any benefit in adding this new method, other than some convenience?
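
For readers without the platform classes at hand, the same stream-based pattern can be sketched with the JDK alone. This assumes HashingOutputStream behaves roughly like java.security.DigestOutputStream over a discarding stream, which is an assumption and not confirmed in this thread; hashThroughStream is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class StreamHashingSketch {

    /** Writes the payload in small chunks through a digesting stream and returns the hash. */
    static byte[] hashThroughStream(byte[] payload) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-384");
        // The underlying stream discards its input; only the digest sees the bytes.
        try (DigestOutputStream out =
                new DigestOutputStream(OutputStream.nullOutputStream(), digest)) {
            int chunk = 4;
            for (int off = 0; off < payload.length; off += chunk) {
                out.write(payload, off, Math.min(chunk, payload.length - off));
            }
        }
        return digest.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = "serialized event bytes".getBytes(StandardCharsets.UTF_8);
        byte[] streamed = hashThroughStream(payload);
        byte[] direct = MessageDigest.getInstance("SHA-384").digest(payload);
        System.out.println("same hash: " + Arrays.equals(streamed, direct));
    }
}
```

The extra objects in this pattern are the stream wrappers, which can be created once and reused, as the fields in the snippet above suggest.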

@jasperpotts
Member Author

That is probably fine. At first look I don't think there is a performance impact, other than the extra object created. Adding the PBJ method would make the code cleaner and give us a single point to performance-test and improve over time. We know that PBJ serialization is expensive, so if a more efficient way to hash a PBJ object turns up in the future, it would be nice to have only a couple of APIs where hashing is done, used everywhere.

2 participants