Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multi-source data principle #5763

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Add multi-source data principle #5763

wants to merge 2 commits into from

Conversation

sffc
Copy link
Member

@sffc sffc commented Nov 1, 2024

I have talked about this principle before, but I couldn't find it written down.

It came up in #5759.

@sffc sffc requested a review from a team as a code owner November 1, 2024 23:40
@@ -58,6 +58,23 @@ ICU4C/ICU4J exposes certain pieces of data through user-facing APIs such as Date

Runtime customizability of locale data can sometimes come at a performance or memory cost.

## Locale data from multiple sources works seamlessly

*What:* If data is available for a particular constructor and locale, the resulting behavior should not change based on where the data was sourced, with a narrow exception for data that primarily impacts performance characteristics.
Copy link
Member

@robertbastian robertbastian Nov 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this basically says that behaviour can never change even if CLDR data changes. If I'm sourcing from more recent CLDR, behaviour should often change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point; that obviously wasn't the intention. Is this better?

@@ -58,6 +58,23 @@ ICU4C/ICU4J exposes certain pieces of data through user-facing APIs such as Date

Runtime customizability of locale data can sometimes come at a performance or memory cost.

## Locale data from multiple sources works seamlessly

*What:* If data is available for a particular constructor and locale, the correctness of behavior should not change based on where the data was sourced, with a narrow exception for data that primarily impacts performance characteristics.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same reaction as @robertbastian on reading this updated text so I think it still needs work. CLDR updates fix correctness all the time.

Perhaps just have an explicit exception for outdated/buggy data that was fixed in new data source versions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also should this be constructor, locale, and key attributes? My understanding was that we were open to having people filter out e.g. unit data based on key attributes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though this means that the "don't do this" example below could still be done by having a "common" and "extended" key attribute. That's a bit of a misuse of key attributes, maybe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue is that the word "source" is overloaded. It has long meant "the runtime data provider that ultimately reads data", such as a blob provider or bake provider. However, now we also have icu_provider_source, which means "the data provider that reads from CLDR/ICU/LSTM". In this principle, I'm primarily referring to the first version: if you mix a baked, blob, and fs provider, which were built with different datagen settings, there should not be an observable difference in behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that was the ambiguity that was tripping me up, it is more that when you talk about "where the data is sourced", that can absolutely involve data that is outdated, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the main reasons to load data on demand will be in cases where data is more likely to become outdated.

@Manishearth
Copy link
Member

In favor of the principle, think the wording needs more work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants