Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert BigInt/Decimal to expected types for packed numbers #1391

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

jadams-tresys
Copy link
Contributor

Still have an edge case for packed xs:date/Time values require an actual BigInt, but this fixes DAFFODIL-2961

- add checks for signed/unsigned in BinaryIntegerBaseParser and BinaryIntegerBaseUnparser
- add tests for parse and unparse
- update failing tests

DAFFODIL-2297
- currently we don't check the minimum lengths for signed/unsigned binary ints and this causes issues when you try to set a binary int to 0 bits (RSDE), so per the DFDL workgroup we add in these checks.
- add width check to unparse code and associated test
- fix bug where we weren't returning after UnparseError
- add SDEs for checks for non-expression fixed lengths
- add tunable allowSignedIntegerLength1Bit and update tests to use
- add WarnID SignedBinaryIntegerLength1Bit
- SDW by default, SDE or UE/PE if tunable is false

DAFFODIL-2297
- currently we pass in isSigned boolean from ElementBaseGrammarMixin, but we can use the new PrimType.PrimNumeric.isSigned member instead for simplicity. We also added the minWidth member to simplify checking the minimum width of PrimNumeric types

DAFFODIL-2339
- move vals into conditional blocks that use them
- fix diagnostic message to be grammatically correct for unsigned
- make binaryIntegerValue lazy val
- add tests for coverage
- make isSigned a def instead of a val
- set minWidth for decimal to Nope
- fix minWidth bug for Byte type
- add comment for non PrimNumeric usage of PackedBinaryIntegerBaseParser
- make it a PE if nBits == 0 for PackedBinaryIntegerBaseParser

DAFFODIL-2297
- added test for parsing from -1 to 1 bit
- added checks for signed/unsigned minWidth to decimal with tests

DAFFODIL-2297
- added more decimal tests
- updated PE to be dynamic for signed and unsigned packed integers
- added PE for packed decimal with length 0

DAFFODIL-2297
… Integers

- move checkMinWidth and checkMaxWidth into BinaryNumberCheckWidth trait
- no SDW on parsing/unparsing 1 bit length signed numbers (just during compilation for bit length known at compile time)
- update tests
- add test for packed integer IVC

DAFFODIL-2297
</tdml:defineSchema>

<!-- Daffodil 2961 -->
<tdml:parserTestCase name="bcdBigDecimalError" root="bcdBigDec" model="bcdErr"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove "error"/"err" from the schema and test name. I think in general that implies it is a negative test case. Maybe use "expr" instead to make it clear that this is about using packed values in expressions?

case pn: PrimType.PrimNumeric => pn.fromNumber(toBigDecimal(num, scale))
// Non-numeric types such as Time can still use these funcitons and
// expect BigDecimal as the output of the conversion
case _ => toBigDecimal(num, scale)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like when we always use the packed integer parsers for dealing with date/times, so should this be an Assert.impossible() case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that is correct. Will change.

@@ -38,6 +40,24 @@ import passera.unsigned.ULong
trait PackedBinaryConversion {
def toBigInteger(num: Array[Byte]): JBigInteger
def toBigDecimal(num: Array[Byte], scale: Int): JBigDecimal

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's worth doing a little more refactoring. I say this because something that implements PackedBinaryConversion only ever uses one of toBigInteger/toInteger or toBigDecimal/toDecimal. So it feels a bit odd to require both be implemented. And potentially unsafe--an implementation could accidentally use the wrong function.

What if instead we remove PackedBinaryConversion altogether, and PackedBinaryDecimalBaseParser has the abstract definition of toBigDecimal, and PackedBinaryIntegerBaseParser has the abstract definition for toBigInteger. Then these new toInteger and toDecimal functions go and the logic just moved into the the appropriate parser trait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here is that we would still need to duplicate everything in the PackedBinaryDecimal/IntegerDelimitedBaseParsers as well. I think that was the reason for the conversion trait to begin with. The DelimitedBaseParsers do not share anything with the standard BaseParsers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PackedIntegerDelimitedParser only uses toBigInteger and PackedDecimalDelimitedParser only uses toBigDecimal, so these two classes do need an abstract toBigInteger/toBigDecimal respectively, but they shouldn't need both that the PackedBinaryConversion trait requires.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't the delimited parsers need the same conversion code to avoid the error this bug was fixing where a value gets used in an expression?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, maybe right refactoring is more like this?

 // returns either BigDecimal or BigInteger depending on the implementation
def toBigNumber(num: Array[Byte]): Number 

def toPrimType(...) : DataValueNumber = {
  val bigNumber = toBigNumber
  context.optPrimType.get match {
    case pn: PrimType.PrimNumeric => pn.fromNumber(bigNmber)
    case => bigNumber
  }
}

Then all the individual parses just have to implement toBigNumber correctly, and the abstract base classes just call toPrimType. Might even be renaming it and changing it to also handle exceptions and call setDataValue to avoid more duplication.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just making sure that I'm on the same page. Is that not what I did below (although they are called toInter/Decimal instead of toPrimType)? I recognize that I need to add some exception handling for potential InvalidPrimitiveDataExceptions. I think it also makes more sense to me to keep the calls to actually set the value inside the parse functions instead of moving them inside the toPrimType trait function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main difference I'm suggesting is making the trait more generic so implementations don't have to implement both toBigInteger and toBigDecimal, they just implement a single toBigNumber and that will return the BigDecimal or BigInteger depending on if it's an decimal or integer type of parser. To ensure correct types, we could define it something like this:

trait PackedBinaryConversion[A <: Number] {
  def toNumber(num: Array[Byte]): A

  def toPrimType(...) = { ... toNumber(...)  }
}

And then usages would provide the right type parameter, for example:

abstract class PackedBinaryIntegerDelimitedBaseParser(...)
  ...
  with PackedBinaryConversion[JBigInteger]

...

abstract class PackedBinaryIntegerDelimitedDecimalParser(...)
  ...
  with PackedBinaryConversion[JBigDecimal]

Avoids duplicating the decimal/integer logic in the PackedBinaryConversion trait.

I think it also makes more sense to me to keep the calls to actually set the value inside the parse functions instead of moving them inside the toPrimType trait function.

I have no strong preference either way. You just have to duplicate error handling logic if it's not done in the shared trait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a clean way to have a single toNumber function while also dealing with the scale parameter that is needed for Decimal? I thought about defaulting scale to 0 and treating scale=0 as a BigInt, but scale=0 is valid for BigDecimal as well.

You just have to duplicate error handling logic if it's not done in the shared trait.
If I have it in the trait I would then need to pass that current state in as well to create the Parse Error. Since the PackedBinaryConversion trait I wasn't sure if there was a way to pass in either a Parse state or an Unparse state that could create errors appropriately.

I'm pushing up some changes now that I think cover everything else pretty well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you drop the scale parameter and get it from the class member, e.g.

class BCDDecimalKnownLengthParser(..., binaryDecimalVirtualPoint: Int)
  extends ... {

  override def toNumber(num: Array[Byte]): JBigDecimal =
    DecimalUtils.bcdToBigDecimal(num, binaryDecimalVirtualPoint)
}

I think this actually makes is actually an improvement over what we have now, since the abstract class shouldn't need to know anything about how the implementations converts a byte array to a Number, all the information like scale, sign codes, etc. is private to the specific packed binary number implementation. And if we ever need some new parameter on how a byte array is converted to a Number it can be done specifically in the implementation and the base class doesn't have to change at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented it as you described now. It is much cleaner now I think. Unfortunately rebasing onto Lola's branch has really muddied up the diff when it comes to review.

@@ -26,6 +26,8 @@ import org.apache.daffodil.lib.equality.TypeEqual
import org.apache.daffodil.lib.exceptions.Assert
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably hold off until #1337 is merged, since that touches some of this code and is the bigger commit, easier to deal with conflicts with the smaller commit.

@@ -100,4 +100,7 @@ class TestPacked {
@Test def testDelimitedIBM4690DecSeqUnparser(): Unit = {
runner.runOneTest("DelimitedIBM4690DecSeqUnparser")
}

// Daffodil-2961
@Test def testBCDBigDecimalError(): Unit = { runner.runOneTest("bcdBigDecimalError") }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some out of range tests as well? That's a new limitation that this will enforce that I think wasn't previously enforced.

olabusayoT and others added 8 commits December 13, 2024 14:24
… Binary Integers

- add tests for 0 bit length and multiple of 4 check
- add runtime multiple of 4 length check

Deprecation/Compatibility
Although still supported via the tunable allowSignedIntegerLength1Bit, it is recommended that schemas be updated to not depend on 1-bit signed binary integers

DAFFODIL-2297
- add test for verification

DAFFODIL-1842
Bumps fedora from 40 to 41.

---
updated-dependencies:
- dependency-name: fedora
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Also removes the "dnf update nodejs". That was needed because the pinned
version of nodejs in Fedora 40 was broken. Fedora 41 ships with a newer
version of nodejs that includes a fix so this update is no longer
needed.

Signed-off-by: dependabot[bot] <[email protected]>
Tests show that parse of negative representations (from packed decimal reps) does not detect the error when decimalSigned="no".

Test aalso show that unparse of negative values into binary, packed, and bcd all are broken.

This doesn't fix the issue. Just adds tests to illustrate the
failing cases.

DAFFODIL-2957
DAFFODIL-2797
Moved TestUtils and FuzzData in core to src/main
so FuzzData can be used outside of daffodil proper.

Removed unused and trivial methods from TestUtils.
Made private all methods possible. We're left with
minimum footprint of public methods.

Removed dependency on JUnit from TestUtils and FuzzData
so they don't have to be in src/test.

Tested for and removed a couple of TODOs that are no longer
needed. Reducing the number of TODOs and FIXMEs is
DAFFODIL-1569.

DAFFODIL-2959, DAFFODIL-1569
Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
I don't have much to add beyond finding one typo.


<!-- Daffodil 2961 -->
<tdml:parserTestCase name="bcdBigDecimalError" root="bcdBigDec" model="bcdErr"
description="Verify that usign packed values in an expression work correctly">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usign => unsigned

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants