-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Datagen API LocaleExpansionMode #4629
Comments
@robertbastian Another API proposal. pub enum LocaleExpansionMode {
/// Locales always removed if identical to their parent
Deduplicated,
/// Locales always retained if explicitly requested or supported
Exhaustive,
/// Locales removed if identical to their parent, but not if the parent is `und`
Hybrid,
/// Only explicitly requested locales are retained
Preresolved,
}
pub enum RuntimeFallbackLocation {
Internal,
External,
}
impl DatagenDriver {
pub fn with_locale_expansion_mode(...)
pub fn with_runtime_fallback_location(...)
} Note: "hybrid" is repurposed. These are two separate flags with an interdependent matrix of defaults.
LEM = LocaleExpansionMode I highlighted cells that might be interesting. Obviously if there is an explicit RFL, use it (or err if not allowed by SBIF). A nice part about this design is that the flat enums lend themselves well to CLI. |
I think this is also ok, however it looks harder to reach in an incremental change in 1.5. I also think the defaults table adds a lot of complexity that clients will have to understand. |
We can add the two new functions to DatagenDriver and deprecate the old ones. If both the old functions and new functions are present, we return an error. |
@jedel1043 Thoughts on #4629 (comment)? |
Also note @robertbastian that this design permits baked data to be built with internal fallback in Preresolved and Exhaustive (formerly Hybrid) modes, which is not currently possible. |
I'd phrase this all more constructively: pub enum LocaleInclusionMode {
/// All requested locales, their ancestors, and their descendants are included, unless
/// they fall back to a locale with identical data.
///
/// Use this for minimal data size.
Deduplicated,
/// All requested locales, their ancestors, and their descendants are included, unless
/// they fall back to a *non-root* locale with identical data.
///
/// Use this if you need to determine at runtime if a base language has been
/// included.
DeduplicatedVariants,
/// All requested locales, their ancestors, and their descendants are included.
/// Unsupported locales will use datagen-time fallback to supported locales (or `und`).
///
/// Use this mode if you want your data to work both without and with runtime fallback.
Preresolved,
/// Only explicitly requested locales, as well as `und`, are included.
/// Unsupported locales will use datagen-time fallback to supported locales (or `und`).
///
/// Use this mode if you are not doing runtime fallback and only need these exact locales
/// to work.
ExplicitPreresolved,
} |
That API is also fine. Documentation wise, we could just tell users to not use |
Naming bikeshed, with some help from Gemini:
How about:
|
As far as functionality goes, I can still see the desire to add modes that include ancestors but not descendants, although @robertbastian pointed out that they can be modeled by locale variants instead. This is not relevant to the mode currently called Preresolved but it applies to the other three. The situation: currently |
How about
Or maybe we just go back to toggle switches
In tabular form,
|
This sounds nice, since the first alternative would make it increasingly harder to add more fallback options thanks to the exponential growth of fallback types. Though, it could also be a bit unnecessary if we don't expect to have any more fallback options. |
New draft: #[non_exhaustive]
pub enum RuntimeFallbackLocation {
Internal,
External,
}
#[non_exhaustive]
pub enum DeduplicationStrategy {
Maximal,
#[default]
RetainBaseLanguages,
NoDeduplication,
}
#[non_exhaustive]
enum Locales {
WithFallback(LocalesWithFallback),
WithoutFallback(LocalesWithoutFallback),
}
#[non_exhaustive]
pub struct LocalesWithFallback {
pub runtime_fallback_location: Option<RuntimeFallbackLocation>,
pub deduplication_strategy: DeduplicationStrategy,
/// private: set via constructor
locales: Vec<LanguageWithExpansion>,
}
impl LocalesWithFallback {
pub fn for_locales(impl IntoIterator<Item = LanguageWithExpansion>) -> Self;
}
#[non_exhaustive]
pub struct LocalesWithoutFallback {
pub locales: Vec<LanguageIdentifier>,
}
impl FromIterator<Item = LanguageIdentifier> for LocalesWithoutFallback {}
/// defaults to LocaleWithExpansion::with_variants
impl FromIterator<Item = LanguageIdentifier> for LocalesWithFallback {}
impl FromIterator<Item = LanguageWithExpansion> for LocalesWithFallback {}
#[derive(Debug, PartialEq, Clone, Hash)]
struct LanguageWithExpansion {
langid: LanguageIdentifier,
include_ancestors: bool,
include_descendants: bool,
}
/// agrees with strings
impl From<LanguageIdentifier> for LanguageWithExpansion {}
impl FromStr for LanguageExpansion {}
impl Display for LanguageExpansion
impl LanguageWithExpansion {
// en-US
pub fn with_variants(L) -> Self {
Self { langid, true, true }
}
// ^en-US
pub fn without_variants(L) -> Self {
Self { langid, true, false }
}
// @en-US
pub fn preresolved(L) -> Self {
Self {
langid, false, false
}
}
}
// resolve this against the exporter like
foo.runtime_fallback_location.unwrap_or_else(||
if exporter.supports_runtime_fallback() {
RuntimeFallbackLocation::Internal
} else {
RuntimeFallbackLocation::External
}
) Example call site: driver2.set_locales(Locales::WithFallback({
let mut x = LocalesWithFallback::for_locales([
LanguageWithExpansion::without_variants(langid!("de")),
langid!("en").into(),
]);
x.deduplication_strategy = DeduplicationStrategy::Maximal;
x
})).export(...); CLI: icu4x-datagen --keys all --locales en-US,de-AT --without-fallback
icu4x-datagen --keys all --locales ^en,de-AT --fallback-runtime-location --fallback-deduplication-strategy
# Language expansion (@, ^) requires fallback LGTM: @sffc @robertbastian |
I'd still like pub enum LocaleFamily {
Maximal(LanguageIdentifier),
Minimal(LanguageIdentifier),
Preresolved(LanguageIdentifier),
} |
Suggestion from @robertbastian: in the output, print clearly that how the DeduplicationStrategy and RuntimeFallbackLocation are being determined. Make it a API: log::warn!("Set runtime_fallback_location = {:?} due to the supplied exporter", ...runtime_fallback_location);
log::warn!("Set deduplication_strategy = {:?} due to runtime_fallback_location", ...deduplication_strategy); CLI: log::warn!("Set --runtime-fallback-location={} due to --format=mod", args.runtime_fallback_location);
log::warn!("Set --deduplication-strategy={} due to --runtime-fallback-location", args.deduplication_strategy); Example CLI invocation:
Error CLI invocations:
LGTM: @robertbastian @sffc |
Current: pub struct LocaleWithExpansion {
langid: LanguageIdentifier,
include_ancestors: bool,
include_descendants: bool,
}
impl LocaleWithExpansion {
/// en-US
pub fn with_variants(langid: LanguageIdentifier) -> Self
/// ^en-US
pub fn without_variants(langid: LanguageIdentifier) -> Self
/// @en-US
pub fn preresolved(langid: LanguageIdentifier) -> Self
}
impl LocalesWithFallback {
pub fn for_all_locales() -> Self
} Proposed: #[derive(Default, Copy, Clone)]
#[non_exhaustive]
pub struct FallbackOptions {
/// If not set, determined by `exporter`
pub runtime_fallback_location: Option<...>,
/// If not set, determined by `runtime_fallback_location`
pub deduplication_strategy: Option<...>,
}
#[derive(Default, Copy, Clone)]
#[non_exhaustive]
pub struct NoFallbackOptions {}
pub struct LocaleFamily {
langid: LanguageIdentifier,
include_ancestors: bool,
include_descendants: bool,
}
impl LocaleFamily {
/// en-US
///
/// For example, the family ::with_descendants("en-001") contains "und", "en" (ancestors), "en-001", "en-GB", "en-ZA" (descendants)
pub fn with_descendants(langid: LanguageIdentifier) -> Self
/// ^en-US
///
/// For example, the family ::without_descendants("en-001") contains "und", "en" (ancestors), "en-001"
pub fn without_descendants(langid: LanguageIdentifier) -> Self
/// @en-US
pub fn single(langid: LanguageIdentifier) -> Self
/// The locale family containing all locales, equivalent to `::with_descendants("und")`.
pub fn all() -> Self
}
impl DatagenDriver {
pub fn with_locales[_and_fallback](locales: impl FromIterator<Item = LocaleFamily>, options: FallbackOptions) -> Self;
pub fn with_locales_no_fallback(locales: impl FromIterator<Item = LanguageIdentifier>, options: NoFallbackOptions) -> Self;
} LGTM: @sffc @robertbastian |
@robertbastian Currently, passing A few options:
I lean toward 2 because I think |
Option 2 sounds good. We already have |
The new API is more verbose. Old: DatagenDriver::new()
.with_keys([HelloWorldV1Marker::KEY])
.with_locales(SELECTED_LOCALES)
.with_fallback_mode(FallbackMode::Hybrid) New: DatagenDriver::new()
.with_keys([HelloWorldV1Marker::KEY])
.with_locales_and_fallback(
SELECTED_LOCALES
.into_iter()
.map(LocaleFamily::with_descendants),
{
let mut options = FallbackOptions::default();
options.deduplication_strategy = Some(DeduplicationStrategy::NoDeduplication);
options
},
) Should we instead flatten this to DatagenDriver::new()
.with_keys([HelloWorldV1Marker::KEY])
.with_locales(SELECTED_LOCALES) // implies with_descendants
.with_fallback()
.with_deduplication(DeduplicationStrategy::NoDeduplication) Full API impl DatagenDriver {
/// Appends these locales to the selected locale list. If fallback is enabled, implies with_descendants
pub fn with_locales(self, impl IntoIterator<LangaugeIdentifier>) { ... }
/// Appends these locales to the selected locale list.
pub fn with_locale_families(self, impl IntoIterator<LocaleFamily>) { ... }
/// Either without_fallback or with_fallback must be called.
/// If without_fallback is used, an error will occur if any of these functions were also used:
/// - with_locale_families
/// - with_runtime_fallback_location
/// - with_deduplication
pub fn without_fallback(self) { ... }
/// Either without_fallback or with_fallback must be called.
pub fn with_fallback(self) { ... }
/// Only valid in combination with with_fallback
/// Default dependent on the exporter
pub fn with_runtime_fallback_location(self, RuntimeFallbackLocation) { ... }
/// Only valid in combination with with_fallback
/// Default dependent on the runtime fallback location
pub fn with_deduplication(self, DeduplicationStrategy) { ... }
} |
It depends what you're doing. If we have the langid -> locale family conversion implementations, and you use default options, then the invocation will be DatagenDriver::new()
.with_keys([HelloWorldV1Marker::KEY])
.with_locales_and_fallback(SELECTED_LOCALES, Default::default()) which is actually more concise than currently. The whole idea of this design was to model the relationship of all the different flags better, which flattening does not. |
How about impl DatagenDriver {
pub fn with_langids_and_fallback(self,
impl IntoIterator<LangaugeIdentifier>, options: FallbackOptions) { ... }
pub fn with_locale_families_and_fallback(self,
impl IntoIterator<LocaleFamily>, options: FallbackOptions) { ... }
pub fn with_langids_no_fallback(self,
impl IntoIterator<LangaugeIdentifier>, options: NoFallbackOptions) { ... }
} |
Can you clarify how you envision accomplishing this? Without overloading, I can't define a function that takes "either a LanguageIdentifier iterator or a LocaleFamily iterator". |
On LocaleFamily: currently we have
I suggested
I might be okay modeling this as
and just saying that you should always use base languages with this function. |
I thought we had this before, but maybe it's not possible with the current design. I don't think conciseness is the ultimate goal. Having many simple methods makes the API way more complicated, because it cannot be followed in a decision-tree fashion. With the two methods Regarding |
More testing to make sure #4629 doesn't break people
@robertbastian I want to propose the following narrow changes to the API in order to improve brevity... these would be impacting 2.0, not 1.5:
|
Not opposed in general, I'd like to make this decision once we have a final API. I don't want too many methods on the driver, and these don't add any functionality. |
This issue is mostly done in #4710. Known follow-ups in the 1.5 timeframe:
All of these PRs can be standalone and should await @robertbastian's approval. |
Part of #4629 I rewrote `select_locales_for_key` and I think it is a lot cleaner and comprehensible now. --------- Co-authored-by: Robert Bastian <[email protected]>
As discussed in #4606 (comment).
If this is done before 1.5, we can remove
pub fn with_base_language_handling
added in #4606.CC @robertbastian
The text was updated successfully, but these errors were encountered: