-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement without_descendants and without_ancestors in datagen #4846
Conversation
@@ -23,7 +23,7 @@ | |||
//! DatagenDriver::new() | |||
//! .with_keys([icu_provider::hello_world::HelloWorldV1Marker::KEY]) | |||
//! .with_all_locales() | |||
//! .export(&DatagenProvider::new_latest_tested(), exporter) | |||
//! .export(&icu_provider::hello_world::HelloWorldProvider, exporter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change doesn't impact behavior; it just makes it so that this crate doesn't need to depend on the provider
feature of icu_datagen
static VALUES: [&<icu::collator::provider::CollationMetadataV1Marker as icu_provider::DataMarker>::Yokeable; 87usize] = [&AF, &AM, &AR, &AR, &AR, &AR, &AM, &AR, &AF, &AR, &AF, &AM, &AF, &AF, &DA, &AF, &AF, &AM, &AF, &AF, &AF, &AF, &AR, &AF, &AF, &AF, &AF, &FR_CA, &AF, &AF, &AR, &AF, &AM, &AR, &AR, &AF, &AF, &AR, &AF, &AF, &AR, &AM, &AR, &AR, &AR, &AR, &AR, &AR, &AR, &AM, <, &AF, &AR, &AR, &AM, &AR, &DA, &AR, &AM, &AF, &AR, &AR, &AF, &AR, &AF, &AM, &AR, &AF, &AF, &AF, &AR, &AF, &AR, &AR, &TH, &AF, &AF, &AF, &AR, &AR, &UND, &AR, &AF, &VI, &AF, &AF, &AR]; | ||
static KEYS: [&str; 87usize] = ["af", "am", "ar", "as", "az", "be", "bg", "bn", "br", "bs", "ceb", "chr", "cs", "cy", "da", "de-u-co-phonebk", "dsb", "el", "en-US-posix", "eo", "es", "et", "fa", "ff-Adlm", "fi", "fil", "fo", "fr-CA", "fy", "gl", "gu", "ha", "he", "hi", "hr", "hsb", "hu", "hy", "ig", "is", "ja", "ka", "kk", "km", "kn", "ko", "kok", "ku", "ky", "lo", "lt", "lv", "mk", "ml", "mn", "mr", "mt", "my", "ne", "no", "or", "pa", "pl", "ps", "ro", "ru", "si", "sk", "sl", "sq", "sr", "sv", "ta", "te", "th", "tk", "to", "tr", "ug", "uk", "und", "ur", "uz", "vi", "wo", "yo", "zh"]; | ||
static VALUES: [&<icu::collator::provider::CollationMetadataV1Marker as icu_provider::DataMarker>::Yokeable; 88usize] = [&AF, &AM, &AR, &AR, &AR, &AR, &AM, &AR, &AF, &AR, &AF, &AM, &AF, &AF, &DA, &AF, &AF, &AF, &AM, &AF, &AF, &AF, &AF, &AR, &AF, &AF, &AF, &AF, &FR_CA, &AF, &AF, &AR, &AF, &AM, &AR, &AR, &AF, &AF, &AR, &AF, &AF, &AR, &AM, &AR, &AR, &AR, &AR, &AR, &AR, &AR, &AM, <, &AF, &AR, &AR, &AM, &AR, &DA, &AR, &AM, &AF, &AR, &AR, &AF, &AR, &AF, &AM, &AR, &AF, &AF, &AF, &AR, &AF, &AR, &AR, &TH, &AF, &AF, &AF, &AR, &AR, &UND, &AR, &AF, &VI, &AF, &AF, &AR]; | ||
static KEYS: [&str; 88usize] = ["af", "am", "ar", "as", "az", "be", "bg", "bn", "br", "bs", "ceb", "chr", "cs", "cy", "da", "de-AT-u-co-phonebk", "de-u-co-phonebk", "dsb", "el", "en-US-posix", "eo", "es", "et", "fa", "ff-Adlm", "fi", "fil", "fo", "fr-CA", "fy", "gl", "gu", "ha", "he", "hi", "hr", "hsb", "hu", "hy", "ig", "is", "ja", "ka", "kk", "km", "kn", "ko", "kok", "ku", "ky", "lo", "lt", "lv", "mk", "ml", "mn", "mr", "mt", "my", "ne", "no", "or", "pa", "pl", "ps", "ro", "ru", "si", "sk", "sl", "sq", "sr", "sv", "ta", "te", "th", "tk", "to", "tr", "ug", "uk", "und", "ur", "uz", "vi", "wo", "yo", "zh"]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only behavioral change from this PR. de-AT-u-co-phonebk
becomes a selected locale because we pull extensions up into all supported locales, not just explicit locales.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does reveal an interesting attribute of the deduplicator behavior: de-AT-u-co-phonebk
is retained because it inherits from de-AT
, which has different data, before it inherits from de-u-co-phonebk
. However, de-AT
is deduplicated because it inherits from de
. This means that we could deduplicate de-AT-u-co-phonebk
in a second pass. 🤔
provider/datagen/src/driver.rs
Outdated
|
||
// Fill in missing extensions and aux keys from parent locales, | ||
// and calculate which langids are ancestors and descendants. | ||
for current_langid in locales_map.keys().cloned().collect::<Vec<_>>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of cloning going on here because otherwise I would need multiple mutable borrows of the HashMap at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be clearer to build a new one instead of mutating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Algorithm looks good, just would like a bit less mutation going on.
provider/datagen/src/driver.rs
Outdated
Self { | ||
langid: Some(langid), | ||
include_ancestors: true, | ||
include_descendants: false, | ||
} | ||
} | ||
|
||
/// The family containing all descendants of the selected locale. | ||
/// | ||
/// This family is primarily useful if the root locale is not desired. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't say it's the primary use case, your example is another (more general) one.
provider/datagen/src/driver.rs
Outdated
is_selected: bool, | ||
data_locales: HashSet<DataLocale>, | ||
} | ||
let mut locales_map: HashMap<LanguageIdentifier, LocalesMapValue> = Default::default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: writing this as
let mut locales_map: HashMap<LanguageIdentifier, LocalesMapValue> = Default::default(); | |
let mut locales_map = HashMap::<LanguageIdentifier, LocalesMapValue>::default(); |
makes the code more maintainable, as that expression is context-independent.
provider/datagen/src/driver.rs
Outdated
} | ||
if explicit.contains(&locale) { | ||
return true; | ||
for family in families { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: put this all in an else
, as it doesn't happen when families is empty
provider/datagen/src/driver.rs
Outdated
if families.is_empty() { | ||
// If no locales are selected but fallback is enabled, select the root locale | ||
let value = locales_map.entry(LanguageIdentifier::UND).or_default(); | ||
value.is_selected = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not big fan of keeping state in the locales_map
, would prefer building a new collection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, but the code was already mutating the map in locales_map.retain
; I'm just mutating it a bit more. I don't really see the point in re-building the structure. When there aren't aux keys or extensions at play, expensive mutations are not necessary, so we'd basically just end up cloning the map a few times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this code I would rather optimize for readability than for performance. Even with the mutation you're doing a lot of cloning. It's unclear to me as a reader how all the mutations interact, intermediate immutable steps would help here.
provider/datagen/src/driver.rs
Outdated
|
||
// Fill in missing extensions and aux keys from parent locales, | ||
// and calculate which langids are ancestors and descendants. | ||
for current_langid in locales_map.keys().cloned().collect::<Vec<_>>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be clearer to build a new one instead of mutating
provider/datagen/src/driver.rs
Outdated
.map(|family| family.include_ancestors) | ||
.unwrap_or(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is is really worth, performance wise, to have family be an optional borrow, instead of just creating the families for all LanguageIdentifiers? This false
is the same as single
's, but here it's just a false
literal.
Same below for include_descendants
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The presence of the family is how I indicate that the langid was explicit. If it is None
, the langid is supported but not explicit.
I did a bunch more refactoring to reduce the amount of mutation. I agree it makes it easier to follow. Hope you like it :) |
Co-authored-by: Robert Bastian <[email protected]>
LocalesWithOrWithoutFallback::WithoutFallback { langids: locales } => { | ||
let mut sorted_locales = locales | ||
LocalesWithOrWithoutFallback::WithoutFallback { langids } => { | ||
let mut sorted_locales = langids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorted_langids
?
Part of #4629
I rewrote
select_locales_for_key
and I think it is a lot cleaner and comprehensible now.