Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FixedDecimal as a standalone crate #167

Closed
zbraniecki opened this issue Jul 6, 2020 · 10 comments · Fixed by #303
Closed

FixedDecimal as a standalone crate #167

zbraniecki opened this issue Jul 6, 2020 · 10 comments · Fixed by #303
Assignees
Labels
C-numbers Component: Numbers, units, currencies T-task Type: Tracking thread for a non-code task
Milestone

Comments

@zbraniecki
Copy link
Member

zbraniecki commented Jul 6, 2020

This topic is a spin-off from my review of num-util in #141.

While reviewing the code I started reading more about other Rust crates that are working with numbers and their conversions. In particular, parsing and stringification.

My understanding is that FixedDecimal is a representation of a number which the user can perform certain operations on, such as elevating/lowering precision, adjusting visible digits, rounding etc.
Such number can be then stringified using the new information.

A canonical example of such use case is ECMA402 NumberFormat where the user may want to do:

let nf = new Intl.NumberFormat(undefined, {
  minimumFractionDigits: 2,
});
nf.format(2.0); // "2.00"

And in Rust, FixedDecimal can be used for that purpose, like this:

// See https://github.com/unicode-org/icu4x/issues/167#issuecomment-654449715 for Shane's
// explanation how to use the API - the example here is *not* how he envisions it.

struct Number {
    integer: FixedDecimal,
    fraction: Option<FixedDecimal>
}

pub fn format<I: TryInto<Number>>(input: I, options: NumberFormatOptions) -> Result<String, _> {
    let mut num: Number = input.try_into()?;
    if let Some(fraction) = num.fraction {
        let shift = options.minfd - fraction.magnitude_range().end() - 1; // 2 - 0 - 1
        if shift > 0 {
            fraction.adjust_range(shift); // 1
        }
    }
    Ok(num.to_string())
}

This is of course oversimpified, uses allocation instead of Writer etc., but I tried to base this API on #141.

If this is, indeed, its main use case, then it seems to me that this structure is actually a fairly generic foundational number parsing/adjusting/serializing API that may be useful way beyond ICU4X.

In particular, as Shane pointed out in #166, some version of serialization of integers/floats is needed, and some version of parsing of integers/floats is needed.

The only thing beyond parsing/serializing is the concept of adjusting which digits are visible in both ways - by hiding (and rounding) digits, and by adding visible digits (such as trailing or leading 0s).

We may want to allow for parsing from String where 2.00 could be parsed to our example Number with information about two fraction digits, while it cannot be parsed to any float type without losing that information.
We may also want to allow for serialization to Writer/String which should preserve the trailing zeros, or even allow the user to adjust the visible fraction digits and serialize as 2.000 or 2.0.

Since those two operations require special structure to contain that information in an efficient way, and the usefulness of this utility goes beyond Internationalization domain, I'd like to put to consideration spawning this off of ICU4X as a standalone crate.

Such crate would provide FixedDecimal and maybe FixedNumber with parsing/serializing and rounding/visibility methods.

The benefits of such move would be:

  • Separation of concerns
  • Lower entry barrier for contributors
  • Rally contributors with experience in number operations, math and Rust
  • Much faster path to release and stability and in result feedback/usage
  • Test-bed for our release process before ICU4X gets to that level

The potential downsides I see:

  • Additional project management overhead from a separate repo/crate
  • We'll need to ensure that such API will be usable for PluralOperands conversion

@sffc @Manishearth others - thoughts?

@zbraniecki zbraniecki added the discuss Discuss at a future ICU4X-SC meeting label Jul 6, 2020
@sffc
Copy link
Member

sffc commented Jul 6, 2020

I already intend for icu-fixed-decimal to be its own component. Is the question whether to remove the icu- prefix and host this in its own repository?

I would rather keep the crate in ICU4X, at least for the time being. We should own FixedDecimal and make sure it is optimized for its primary purpose, which is as a data type for Intl.NumberFormat and Intl.PluralRules. When we approach v1, and the crate is largely stabilized, I'm open to the idea of spinning it off.

And in Rust, FixedDecimal can be used for that purpose, like this:

struct Number {
    integer: FixedDecimal,
    fraction: Option<FixedDecimal>
}

This Number struct is not consistent with my idea of FixedDecimal. A FixedDecimal is an integer iff its lower_magnitude is nonnegative. It does not make sense to split the integer and fraction parts into separate fields. FixedDecimal is already capable of representing both whole numbers and numbers with fractions.

What would be more consistent with my mental model would be functions like .integer_part() and .fraction_part() that return the integer or fraction slice of a FixedDecimal as a new FixedDecimal (or, if you prefer, FixedInteger), which can be done efficiently. However, the two parts should not be in the data model of Number.

@zbraniecki
Copy link
Member Author

zbraniecki commented Jul 6, 2020

I already intend for icu-fixed-decimal to be its own component. Is the question whether to remove the icu- prefix and host this in its own repository?

Mostly. Also, put it on its own release cadence, and announce it to the community and encourage contributions to it while we work on ICU4X's first release.

I would rather keep the crate in ICU4X, at least for the time being. We should own FixedDecimal and make sure it is optimized for its primary purpose, which is as a data type for Intl.NumberFormat and Intl.PluralRules. When we approach v1, and the crate is largely stabilized, I'm open to the idea of spinning it off.

Sounds good to me. Would it then make sense to at least name it independently of icu? It's not really an ICU package as much as it's a helper utility that ICU (and, I argue others) will use.
If we keep it in our repo, then maybe it's worth separating a top level ./utils on par with ./components and have ./utils/fixed-decimal there?

This way we would benefit from the ecosystem we have here, iterate on it to 1.0, but make it clear that this is not an ICU4X component.
I'd then also advocate for faster release - like, we could land what you have and immediately release 0.1 and get the community review and aid us in making it optimal.

FixedDecimal is already capable of representing both whole numbers and numbers with fractions.

I could not find a method that would allow me to implement minimum_fraction_digits: 2. Can you help me conjure such snippet?

What would be more consistent with my mental model

Ah, I see. Sure, I don't think this is of high priority. But I'd like to get to the point where we have a list of "most common" patterns and we have examples/tests/benchmarks that implement them.

This will make it easier to evaluate actual performance and ensure the API is aligned with the use cases.

Btw. I'm happy to take on the release management for such utility if you don't want to bother yourself with it. I didn't suggest it earlier because I thought that with your work on PMing ICU4X, you'll want to test ride the release process of a crate :)

@sffc
Copy link
Member

sffc commented Jul 6, 2020

I think we should not remove FixedDecimal from the ICU4X repository, because it is a fundamental data type for Intl.NumberFormat and Intl.PluralRules, and we want to ensure that any changes to not have adverse affects on the code size, memory, and performance benchmarks of ICU4X.

I already intend for icu-fixed-decimal to be its own component. Is the question whether to remove the icu- prefix and host this in its own repository?

Mostly. Also, put it on its own release cadence, and announce it to the community and encourage contributions to it while we work on ICU4X's first release.

I don't want to start encouraging community contributions until after we have our testing infrastructure fully set up with performance regression tests (#66, #107, #140).

Would it then make sense to at least name it independently of icu? It's not really an ICU package as much as it's a helper utility that ICU (and, I argue others) will use.

I'm not sure about this. Let me think about it a bit more.

If we keep it in our repo, then maybe it's worth separating a top level ./utils on par with ./components and have ./utils/fixed-decimal there?

If we decide to remove the icu- prefix, then a new top-level directory seems fine to me.

FixedDecimal is already capable of representing both whole numbers and numbers with fractions.

I could not find a method that would allow me to implement minimum_fraction_digits: 2. Can you help me conjure such snippet?

The FixedDecimal API surface is not finished yet, but if you wanted to represent a number with minimum_fraction_digits: 2, you can do,

let mut dec: FixedDecimal = 250.into();
dec.adjust_magnitude(-2);
dec.to_string();  // 2.50

@zbraniecki
Copy link
Member Author

I think we should not remove FixedDecimal from the ICU4X repository, because it is a fundamental data type for Intl.NumberFormat and Intl.PluralRules, and we want to ensure that any changes to not have adverse affects on the code size, memory, and performance benchmarks of ICU4X.

My argument is that its utility goes beyond ICU4X. It's fairly common in my experience for such cases to spin off a separate crate with a separate cadence and community.

I understand the reluctance, and the drive to ensure the minimal friction between its release cycle and ICU4X work, I just don't think it would create it.

But I'm ok keeping this within the repo for now.

The FixedDecimal API surface is not finished yet, but if you wanted to represent a number with minimum_fraction_digits: 2, you can do,

I see!

So, if I have 2 and I want to have 2.00 I need to multiply it by 100 and then adjust the magnitude by -2. I'll update my initial example!

@zbraniecki
Copy link
Member Author

let mut dec: FixedDecimal = 250.into();
dec.adjust_magnitude(-2);
dec.to_string();  // 2.50

Hmm, I'm struggling to see how it should be used for the use case I gave before.

If the example I gave at the top is not the main use case, or is one of many and the others are different, can you help me with more use cases of how FixedDecimal is used in ICU?

If the example I gave is the main use case, then I struggle a bit with the current API, as it seems I need to perform a number of operations on the number to get the 2.00 out of 2:

  • The PR doesn't seem to handle floats, so I can't start with 2.0 (which is ok for 2.00, but not for 2.01 with minfd=4 aiming for 2.0100)
  • There is no API to multiply FixedDecimal by 100 to get 200 out of 2 which seems to be required to then adjust_magnitde(-2) to get 2.00

I'm asking here instead of PR because I'm not sure what is and what is not in scope of FixedDecimal and depending on it I'll adjust my position on whether this should be a general use crate, or an internal util of ICU4X.

@Manishearth
Copy link
Member

we want to ensure that any changes to not have adverse affects on the code size, memory, and performance benchmarks of ICU4X.

I think we can do that either way, though, as long as we maintain control over the crate. But if we get a community of contributors it can get better.

I don't see a huge issue with perf benchmarks: if the crate regresses perf, we can fix that when we notice it in the version bump. This happens sometimes.

@sffc
Copy link
Member

sffc commented Jul 10, 2020

The nice part about hosting this in the ICU4X monorepo is that we can use the same robust testing infrastructure. We can identify regressions before they ever make it into the project. Saving the regression testing to routine dependency version bumps has disadvantages:

  • No signal as to what change in the dependency caused the regression
  • Upgrading the dependency is blocked until the regression is resolved
  • Fixing problem needs to be championed by the ICU4X team, instead of the original PR author

We've accepted SmallVec+SmallStr and TinyStr as external code dependencies so far. I would like to keep that list as small as possible. We can ship FixedDecimal as its own crate if we want, but if we keep development in ICU4X, we acknowledge that the primary reason FixedDecimal exists is to power ICU4X.

@Manishearth
Copy link
Member

Fixing problem needs to be championed by the ICU4X team, instead of the original PR author

I mean, the project would still be run by us, we'd be approving those PRs either way.

On the testing front, this actually encourages us to write more scoped FixedDecimal tests. If we do that I don't imagine regressions will sneak up the deptree often, and when they do they're not a big deal. I'm not imagining this crate to have a high volume, I just think it's a good thing to do for the community to have it be a separate repo that can be contributed to.

@zbraniecki
Copy link
Member Author

I'm with Manish here. I think Rust community handles such isolated crates/repos really well and since I see the value of FixedDecimal beyond ICU4X, I see it as a chance to attract contributors.

But since Shane is reluctant to such separation, I'm ok starting using icu4x repo and utils/* directory and still releasing it as a crate. I think the main downsides are an intimidating for new contributors set of Issues and PRs, but I hope it won't be a major blocker for anyone who wants to help us mature it :)

@sffc sffc added question Unresolved questions; type unclear C-numbers Component: Numbers, units, currencies labels Jul 16, 2020
@sffc sffc self-assigned this Jul 16, 2020
@sffc
Copy link
Member

sffc commented Jul 17, 2020

Tentative decisions from meeting on 2020-07-17:

  • Move FixedDecimal to /utils and release as a standalone fixed-decimal crate.

@sffc sffc added T-task Type: Tracking thread for a non-code task and removed question Unresolved questions; type unclear discuss Discuss at a future ICU4X-SC meeting labels Jul 17, 2020
@sffc sffc added this to the ICU4X 0.1 milestone Sep 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-numbers Component: Numbers, units, currencies T-task Type: Tracking thread for a non-code task
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants