-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add multi-source data principle #5763
base: main
Are you sure you want to change the base?
Conversation
documents/design/principles.md
Outdated
@@ -58,6 +58,23 @@ ICU4C/ICU4J exposes certain pieces of data through user-facing APIs such as Date | |||
|
|||
Runtime customizability of locale data can sometimes come at a performance or memory cost. | |||
|
|||
## Locale data from multiple sources works seamlessly | |||
|
|||
*What:* If data is available for a particular constructor and locale, the resulting behavior should not change based on where the data was sourced, with a narrow exception for data that primarily impacts performance characteristics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this basically says that behaviour can never change even if CLDR data changes. If I'm sourcing from more recent CLDR, behaviour should often change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good point; that obviously wasn't the intention. Is this better?
@@ -58,6 +58,23 @@ ICU4C/ICU4J exposes certain pieces of data through user-facing APIs such as Date | |||
|
|||
Runtime customizability of locale data can sometimes come at a performance or memory cost. | |||
|
|||
## Locale data from multiple sources works seamlessly | |||
|
|||
*What:* If data is available for a particular constructor and locale, the correctness of behavior should not change based on where the data was sourced, with a narrow exception for data that primarily impacts performance characteristics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same reaction as @robertbastian on reading this updated text so I think it still needs work. CLDR updates fix correctness all the time.
Perhaps just have an explicit exception for outdated/buggy data that was fixed in new data source versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also should this be constructor, locale, and key attributes? My understanding was that we were open to having people filter out e.g. unit data based on key attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though this means that the "don't do this" example below could still be done by having a "common" and "extended" key attribute. That's a bit of a misuse of key attributes, maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issue is that the word "source" is overloaded. It has long meant "the runtime data provider that ultimately reads data", such as a blob provider or bake provider. However, now we also have icu_provider_source
, which means "the data provider that reads from CLDR/ICU/LSTM". In this principle, I'm primarily referring to the first version: if you mix a baked, blob, and fs provider, which were built with different datagen settings, there should not be an observable difference in behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that was the ambiguity that was tripping me up, it is more that when you talk about "where the data is sourced", that can absolutely involve data that is outdated, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the main reasons to load data on demand will be in cases where data is more likely to become outdated.
In favor of the principle, think the wording needs more work. |
I have talked about this principle before, but I couldn't find it written down.
It came up in #5759.