Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-3912: Fix Big decimal deser #2599

Merged
merged 10 commits into from
Dec 4, 2023

Conversation

clesaec
Copy link
Contributor

@clesaec clesaec commented Nov 28, 2023

What is the purpose of the change

AVRO-3912 : fix big decimal deserialisation error (thanks @Ten0 for comment on this PR).

Verifying this change

Unit test from writer to reader in decimal.rs source file

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@clesaec clesaec requested a review from martin-g November 28, 2023 09:43
@github-actions github-actions bot added the Rust label Nov 28, 2023
@Ten0
Copy link
Contributor

Ten0 commented Nov 28, 2023

I thought the error was on serialization, not deserialization

@clesaec
Copy link
Contributor Author

clesaec commented Nov 28, 2023

@Ten0 : I made serialization the same way i made it on Java, to keep capabilities to serialize in Java or Rust and deserialize in Java or Rust. So, i found better to fix deserialization in Rust than to review Java & rust module.

@Ten0
Copy link
Contributor

Ten0 commented Nov 28, 2023

If you skip serializing the bytes length and you still have the scale after the first buffer then that means the BigDecimal logical type cannot be deserialized as bytes (because you'd have an extra int after), which looks like it's incorrect because it's only a logical type and not a full new type. (One would expect from a logical type that if it's unknown by their library they can still deserialize it as the main type and either ignore it or interprete it manually.)
Also IIUC the Java implementation starts by deserializing as bytes and then converts that into a BigDecimal, so we do have the two lengths.

@@ -214,4 +218,63 @@ mod tests {

Ok(())
}

#[test]
fn other() -> TestResult {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please give the method a better name, e.g. test_avro_3912_...

let val = BigDecimal::new(BigInt::from(12), 2);
record.put("field_name", val.clone());

// save record in writer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// save record in writer
// write a record

.build();

let result = writer.append(record.clone());
assert!(result.is_ok(), "Append KO : {}", result.unwrap_err());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert!(result.is_ok(), "Append KO : {}", result.unwrap_err());
assert!(result.is_ok(), "Append NOK : {result}");


let result = writer.append(record.clone());
assert!(result.is_ok(), "Append KO : {}", result.unwrap_err());
assert!(writer.flush().is_ok(), "Flush KO");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually there is no need to assert!() these above. You could just use ?, e.g. writer.append(record.clone())?; and it will propagate the error because the method returns TestResult.

I try to avoid using .unwrap() in general. Especially in non-test code its usage should be forbidden. I am not sure whether there is a Clippy check that we could enforce.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-g
Copy link
Member

Let's add an interop test (or extend the current one) for BigDecimal in Rust and Java!

@clesaec
Copy link
Contributor Author

clesaec commented Nov 28, 2023

Get it, i didn't realize that a logical type has to be deserializable in its main type; i do the fix.

@clesaec clesaec marked this pull request as draft November 28, 2023 14:37
@Ten0
Copy link
Contributor

Ten0 commented Nov 29, 2023

By the way if the scale was written at the beginning of the Bytes instead of the end then you wouldn't need to serialize an extra length, because the scale encoded like other ints is self-delimited so then the rest of the Bytes would be the big endian encoded part. That would be a more efficient storage, more in the spirit of the rest of the format which is to store as little amount of metadata as possible. Maybe it isn't too late to change that since it's not released?

@clesaec clesaec marked this pull request as ready for review November 29, 2023 14:21
@clesaec
Copy link
Contributor Author

clesaec commented Nov 29, 2023

I add test with file generated with Java module and fix code comments.

To with scale to avoid serializing an extra length is a good idea, but, Java decoder interface miss a function like "readRemainder()" or something similar; because we don't know the number of byte took by scale value with readInt() (can be one or several byte, depends on the value itself).

@Ten0
Copy link
Contributor

Ten0 commented Nov 29, 2023

Java decoder interface miss a function like "readRemainder()" or something similar

Isn't .remaining() it?

public BigDecimal fromBytes(ByteBuffer value, Schema schema, LogicalType type) {
int scale = ((LogicalTypes.Decimal) type).getScale();
// always copy the bytes out because BigInteger has no offset/length ctor
byte[] bytes = new byte[value.remaining()];

@clesaec
Copy link
Contributor Author

clesaec commented Nov 29, 2023

.remaining() method apply on ByteBuffer value variable, while decoder class use its own "pos" field to "iterate" on data. So, it would need as "byte[] decoder.readRemaining()" method.

@clesaec clesaec requested a review from martin-g November 30, 2023 08:09
Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
@@ -113,13 +113,17 @@ impl<T: AsRef<[u8]>> From<T> for Decimal {
}

pub(crate) fn serialize_big_decimal(decimal: &BigDecimal) -> Vec<u8> {
// encode big decimal, without global size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

global size sounds too ... global.
How about total size ?

…bigdecimal.rs)

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
@Ten0
Copy link
Contributor

Ten0 commented Dec 1, 2023

.remaining() method apply on ByteBuffer value variable, while decoder class use its own "pos" field to "iterate" on data. So, it would need as "byte[] decoder.readRemaining()" method.

Alternately you could make getPos public:


and then parse value[decoder.getPos()..].
Either way that looks simple enough.

@clesaec
Copy link
Contributor Author

clesaec commented Dec 1, 2023

make getPos public => Well, pos is an implementattion detail of BinaryDecorder, that may should not be public.
The need is here, but i think it needs some works and discussion on Java side.
In mean time, can this be acceptable as this ?

@clesaec
Copy link
Contributor Author

clesaec commented Dec 4, 2023

I first merge this and report to 1.11 branch to align Rust & Java BigDecimal logical type.
In a second time, i'll propose 2 PR in main branch to optimize it in main branch; then report in 1.11

@clesaec clesaec merged commit 9c4103c into apache:main Dec 4, 2023
15 checks passed
clesaec added a commit to clesaec/avro that referenced this pull request Dec 4, 2023
* AVRO-3912: Fix Big decimal deser
Co-authored-by: Martin Tzvetanov Grigorov <[email protected]>
martin-g pushed a commit that referenced this pull request Dec 4, 2023
* AVRO-3912: Fix Big decimal deser (#2599)

* AVRO-3912: Fix Big decimal deser
Co-authored-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3912: error comment
RanbirK pushed a commit to RanbirK/avro that referenced this pull request May 13, 2024
* AVRO-3912: Fix Big decimal deser
Co-authored-by: Martin Tzvetanov Grigorov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants