-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: support holidays on non-UTC time zones #176
Conversation
WalkthroughThe pull request introduces significant modifications to the Changes
Possibly related PRs
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
4d84231
to
fbd0d11
Compare
Deploying augurs with Cloudflare Pages
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (6)
crates/augurs-prophet/src/features.rs (1)
95-101
: Consider enhancing floor_day documentationWhile the implementation is correct, the documentation could be more detailed to explain:
- How the remainder calculation works with positive/negative offsets
- Why we subtract the remainder to floor the timestamp
Consider adding this documentation:
/// Return the Unix timestamp of the given date, rounded down to the nearest day, /// adjusted by the holiday's UTC offset. +/// +/// # Implementation Details +/// 1. Adds UTC offset to align with local timezone +/// 2. Calculates remainder of seconds since midnight in local time +/// 3. Subtracts remainder to get the start of the day +/// +/// # Examples +/// ``` +/// # use augurs_prophet::features::Holiday; +/// let holiday = Holiday::new(vec![]) +/// .with_utc_offset(3600); // UTC+1 +/// assert_eq!(holiday.floor_day(1732189701), 1732147200); +/// ```crates/augurs-prophet/src/prophet/prep.rs (5)
19-19
: Use consistent naming for time constantsThe new constant
ONE_DAY_IN_SECONDS_INT
introduces inconsistency in naming conventions. To enhance code readability, consider unifying the naming of time constants. For example, you could define all time constants with the same suffix or indicate their type in the name consistently.Apply this diff:
-pub(crate) const ONE_DAY_IN_SECONDS_INT: i64 = 24 * 60 * 60; +pub(crate) const ONE_DAY_IN_SECONDS: i64 = 24 * 60 * 60;And update usages accordingly.
694-694
: Utilize the integer constant to avoid unnecessary castingIn the calculation of
offset_seconds
, you are castingONE_DAY_IN_SECONDS
fromf64
toi64
. Since you have definedONE_DAY_IN_SECONDS_INT
as ani64
, use it directly to prevent unnecessary casting and potential precision errors.Apply this diff:
-let offset_seconds = offset as i64 * ONE_DAY_IN_SECONDS as i64; +let offset_seconds = offset as i64 * ONE_DAY_IN_SECONDS_INT;
675-677
: Simplify iterator creation for lower and upper windowsThe current implementation for converting
Option<Vec<u32>>
toIterator<Item = i32>
is unnecessarily complex. You can simplify it by avoiding theBox<dyn Iterator>
and directly usingmap_or_else
.Apply this diff:
-let lower = holiday - .lower_window - .as_ref() - .map(|x| { - Box::new(x.iter().copied().map(|x| x as i32)) as Box<dyn Iterator<Item = i32>> - }) - .unwrap_or_else(|| Box::new(std::iter::repeat(0))); +let lower = holiday.lower_window.as_ref().map_or_else( + || std::iter::repeat(0), + |x| x.iter().copied().map(|x| x as i32), +);Repeat similarly for
upper
.Also applies to: 682-684
710-710
: Optimize the search for matching timestampsUsing
positions
in combination withds.iter()
may not be efficient for large datasets. Consider creating aHashSet
of precomputed floored timestamps to improve performance when checking for matching dates.Example modification:
let floored_ds: HashMap<TimestampSeconds, Vec<usize>> = ds .iter() .enumerate() .map(|(idx, &x)| (holiday.floor_day(x), idx)) .into_group_map(); for (&occurrence, indices) in floored_ds.iter().filter(|(&date, _)| date == occurrence) { for &idx in indices { col[idx] = 1.0; } }
1233-1234
: Update deprecatedchrono
methods to the newer versionsThe
with_ymd_and_hms
method is deprecated in thechrono
crate. Replace it with the recommendedwith_ymd_opt
andand_hms_opt
methods for better compatibility with future versions.Apply this diff:
-let start = Utc.with_ymd_and_hms(2024, 1, 1, 0, 0, 0).unwrap(); +let start = Utc.with_ymd_opt(2024, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap();Repeat similarly for other occurrences.
Also applies to: 1266-1267, 1284-1285, 1288-1289, 1292-1293
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
crates/augurs-prophet/src/features.rs
(6 hunks)crates/augurs-prophet/src/prophet/prep.rs
(5 hunks)js/augurs-prophet-js/src/lib.rs
(3 hunks)
🔇 Additional comments (11)
crates/augurs-prophet/src/features.rs (6)
4-6
: LGTM: Import changes are appropriate
The addition of ONE_DAY_IN_SECONDS_INT
from the prep module is correctly imported to support the new timezone-aware day calculations.
35-36
: Implementation changes are well-structured
The changes maintain backward compatibility with a default UTC offset of 0, and the window methods are correctly updated to use u32
while preserving robust error handling.
Also applies to: 46-46, 65-65
84-93
: LGTM: UTC offset setter is well-implemented
The with_utc_offset
method follows the builder pattern consistently and is well-documented.
259-285
: Test coverage is comprehensive
The tests effectively cover:
- No UTC offset (default behavior)
- Positive UTC offset (ahead of UTC)
- Negative UTC offset (behind UTC)
Each case tests both exact midnight timestamps and non-midnight timestamps.
95-101
: Verify alignment with Python Prophet's timezone handling
The timezone handling implementation should be verified against Python Prophet's approach to ensure consistent behavior.
✅ Verification successful
The previous search revealed some test cases and implementation in Rust, but let's gather more context about the timezone handling implementation and tests to ensure proper verification.
Timezone handling implementation is correctly aligned with requirements
The implementation is well-tested and properly handles timezone offsets:
- Correctly implements floor_day with UTC offset adjustments
- Includes comprehensive test cases covering:
- No offset scenarios (floor_day_no_offset)
- Positive UTC offsets (floor_day_positive_offset)
- Negative UTC offsets (floor_day_negative_offset)
- The implementation matches the expected behavior where timestamps are adjusted by the UTC offset before flooring to the nearest day
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for timezone/UTC offset handling in Python Prophet
rg -p "utc.*offset|timezone|floor.*day"
# Search for related test cases
rg -p "test.*timezone|test.*utc"
Length of output: 1817
Script:
#!/bin/bash
# Let's check the Python Prophet repository for timezone handling implementation
# First, let's look for the test file content
ast-grep --pattern 'fn holiday_floor_day_$_($$$) {
$$$
}'
# Also search for any documentation or comments about timezone handling
rg -p "timezone|utc.*offset" -A 3 -B 3
Length of output: 3628
22-25
: Type changes and new field look good
The changes align well with the PR objectives:
- Changing window types from
i32
tou32
correctly enforces non-negative windows - Adding
utc_offset
enables timezone support
Let's verify there are no remaining uses of i32
for holiday windows:
✅ Verification successful
Type changes for holiday windows are correctly implemented
The verification confirms that:
- The Holiday struct and its implementations consistently use
u32
for both lower and upper windows - No remaining
i32
usage exists for holiday windows in the codebase - The implementation properly validates window lengths against holiday dates
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any remaining i32 usage with holiday windows
rg -p "i32.*window"
Length of output: 21
Script:
#!/bin/bash
# Let's do a more thorough search for any remaining i32 vectors in the codebase
# First, search for i32 with Vec
rg "Vec<i32>"
# Also search for window-related type definitions
rg -p "type.*[Ww]indow"
# And check for any remaining i32 usage in holiday-related files
fd -e rs holiday --exec rg -p "i32" {}
Length of output: 1205
Script:
#!/bin/bash
# Let's specifically check the holiday-related code for any window type definitions or usages
ast-grep --pattern 'struct Holiday {
$$$
}'
# Also check for any window-related method implementations for Holiday
ast-grep --pattern 'impl Holiday {
$$$
}'
# And check the tests for holiday windows usage
fd test -e rs --exec rg -p "window.*i32|i32.*window" {}
Length of output: 5865
js/augurs-prophet-js/src/lib.rs (4)
1233-1236
: LGTM! Clear and helpful documentation.
The documentation effectively explains the lower_window
field with a practical example that illustrates how the window affects holiday observation.
1245-1245
: LGTM! Appropriate type change for upper_window.
The change from i32
to u32
clarifies that window values should never be negative, improving type safety.
1250-1257
: LGTM! Well-documented UTC offset field.
The new utc_offset_seconds
field:
- Is properly documented with clear explanation of its purpose
- Uses appropriate types (
Option<TimestampSeconds>
) - Aligns with the PR objective of supporting holidays in different time zones
1274-1276
: LGTM! Clean implementation of UTC offset handling.
The implementation:
- Properly handles the optional UTC offset
- Maintains consistency with the builder pattern used for other fields
- Correctly integrates with the existing conversion logic
crates/augurs-prophet/src/prophet/prep.rs (1)
701-706
: Ensure thread safety when accessing shared resources
The code adds entries to this_holiday_features
while iterating, which could lead to concurrency issues if accessed from multiple threads. If this code is intended to be thread-safe, consider using synchronization mechanisms like Mutex
or refactoring to avoid shared mutable state.
If thread-safety is not a concern, you can disregard this comment.
Prior to this commit, when determining whether a given holiday's features should be 0 or 1 for a given timestamp, we checked whether each _day_ included in the holiday's lower-upper windows included the timestamp, and set the value to 1 if so. However, when rounding the holiday's timestamps down to 'day' we assumed that the holiday started and ended at midnight UTC, which won't be the case for certain holidays (i.e. anything outside of UTC). This commit does three things: 1. adds the 'Holiday::with_utc_offset()' method which allows a holiday to use non-UTC-aligned days when its timestamps are being floored 2. rather than adding a separate feature for each time a holiday's lower/upper window are found to contain a timestamp, reuse the same feature for each offset, which is what the Python Prophet implementation does. Really this part should be moved to a separate bugfix PR... 3. switches the lower and upper windows to be u32 instead of i32, to reflect the fact that they should never really be negative (it is quite confusing that the original Prophet expects lower windows to always be negative and upper windows always positive).
fbd0d11
to
1baf0d1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
crates/augurs-prophet/src/features.rs (3)
95-101
: Consider enhancing documentation for floor_dayWhile the implementation is correct, the documentation could be more detailed to explain:
- Why we add the offset before the modulo operation
- How the remainder calculation works
- Example calculations for clarity
Consider expanding the documentation:
/// Return the Unix timestamp of the given date, rounded down to the nearest day, /// adjusted by the holiday's UTC offset. +/// +/// # Implementation Details +/// 1. Adds UTC offset to align with local timezone +/// 2. Uses modulo to find partial day seconds +/// 3. Subtracts remainder to get day boundary +/// +/// # Example +/// ``` +/// let holiday = Holiday::new(vec![]).with_utc_offset(3600); // UTC+1 +/// let ts = 1732189701; // 2024-11-21 11:15:01 UTC +/// assert_eq!(holiday.floor_day(ts), 1732132800); // 2024-11-21 00:00:00 UTC+1 +/// ``` pub(crate) fn floor_day(&self, ds: TimestampSeconds) -> TimestampSeconds {
259-285
: Consider adding more edge case testsThe current test coverage is good for basic functionality, but consider adding tests for:
- DST transition edge cases
- Timestamps exactly at day boundaries
- Large UTC offsets (e.g., UTC+14)
- Leap second handling
Example additional test:
#[test] fn holiday_floor_day_edge_cases() { // Test exact day boundary let holiday = Holiday::new(vec![]).with_utc_offset(0); assert_eq!(holiday.floor_day(1732147200), 1732147200); // Already at day boundary // Test maximum valid offset (UTC+14) let max_offset = 14 * 60 * 60; let holiday_max = Holiday::new(vec![]).with_utc_offset(max_offset); assert_eq!(holiday_max.floor_day(1732147200), 1732097600); // Test near day boundary let holiday_near = Holiday::new(vec![]).with_utc_offset(60); // UTC+1min assert_eq!(holiday_near.floor_day(1732147199), 1732146600); // 23:59:59 UTC }
Line range hint
1-285
: Consider adding timezone handling documentationWhile the implementation correctly handles non-UTC timezones, consider adding:
- Module-level documentation explaining the timezone handling strategy
- Examples showing how to define holidays in different timezones
- Migration guide for users updating from previous versions
Example module documentation:
//! Features used by Prophet, such as seasonality, regressors and holidays. //! //! # Timezone Handling //! Holidays are defined using Unix timestamps but can be aligned to specific timezone //! days using the `utc_offset` field. This ensures that holidays start and end at //! midnight in their local timezone rather than UTC. //! //! ## Example //! ```rust //! use augurs_prophet::Holiday; //! //! // Define US New Year's Day (UTC-5) //! let new_years = Holiday::new(vec![1704067200]) // 2025-01-01 00:00:00 UTC //! .with_utc_offset(-5 * 3600); // UTC-5 //! ```crates/augurs-prophet/src/prophet/prep.rs (1)
1222-1373
: LGTM! Comprehensive test coverage for timezone-aware holiday handling.The test thoroughly validates the holiday feature generation with:
- UTC and non-UTC timezone scenarios
- Window offsets in different timezones
- Multiple holiday occurrences
Consider extracting the test data setup into helper functions to improve readability:
- let holidays: HashMap<String, Holiday> = [ - ( - "UTC holiday".to_string(), - Holiday::new(vec![ - Utc.with_ymd_and_hms(2024, 1, 2, 0, 0, 0) - .unwrap() - .timestamp(), - // ... - ]), - ), - // ... - ].into(); + let holidays: HashMap<String, Holiday> = create_test_holidays(non_utc_tz); + + fn create_test_holidays(non_utc_tz: FixedOffset) -> HashMap<String, Holiday> { + [ + ( + "UTC holiday".to_string(), + create_utc_holiday(), + ), + ( + "Non-UTC holiday".to_string(), + create_non_utc_holiday(non_utc_tz), + ), + // ... + ].into() + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
crates/augurs-prophet/src/features.rs
(5 hunks)crates/augurs-prophet/src/prophet/prep.rs
(5 hunks)js/augurs-prophet-js/src/lib.rs
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- js/augurs-prophet-js/src/lib.rs
🔇 Additional comments (6)
crates/augurs-prophet/src/features.rs (3)
4-6
: LGTM: Import changes are appropriate
The addition of ONE_DAY_IN_SECONDS_INT
from prophet::prep
is correctly imported for use in the new time zone handling functionality.
84-93
: LGTM: Well-documented UTC offset setter
The with_utc_offset
method is properly implemented with clear documentation explaining its purpose.
24-25
: Type changes and new field look good
The changes improve type safety:
- Using
u32
for windows correctly enforces non-negative values - The new
utc_offset
field with a default of 0 properly handles UTC time zone
Let's verify no negative window values exist in the codebase:
Also applies to: 35-36
✅ Verification successful
Type changes are safe and correctly implemented
The codebase analysis confirms that:
- All usages of
lower_window
andupper_window
are non-negative, with examples showing values like[0, 0]
,[1, 1]
, and[1; 3]
- The type change from
i32
tou32
is properly enforced through the struct definition and builder methods - Error handling is in place to validate window lengths against holiday dates
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any negative window values in numeric literals or test data
rg -e '-\d+.*(?:lower|upper)_window' -e '(?:lower|upper)_window.*-\d+'
Length of output: 72
Script:
#!/bin/bash
# Let's try a simpler search pattern to find any usage of lower_window or upper_window
rg "(?:lower|upper)_window.*=.*\d+" -U
Length of output: 40
Script:
#!/bin/bash
# Also search for test data initialization patterns
rg "Holiday.*\{.*(?:lower|upper)_window" -U
Length of output: 45
Script:
#!/bin/bash
# Search for any struct initialization or assignments using ast-grep
ast-grep --pattern 'lower_window = $_'
ast-grep --pattern 'upper_window = $_'
Length of output: 81
Script:
#!/bin/bash
# Let's try a more basic search to find any files containing these fields
rg -l "lower_window|upper_window"
Length of output: 184
Script:
#!/bin/bash
# Search for test files that might contain test data
fd "test.*\.rs$" --exec cat {}
Length of output: 26242
Script:
#!/bin/bash
# Let's examine the actual usage of these fields in the files we found
rg -A 5 -B 5 "(?:lower|upper)_window" crates/augurs-prophet/src/features.rs crates/augurs-prophet/src/prophet.rs crates/augurs-prophet/src/prophet/prep.rs js/augurs-prophet-js/src/lib.rs
Length of output: 13157
crates/augurs-prophet/src/prophet/prep.rs (3)
19-19
: LGTM! Appropriate visibility change for the constant.
The change to make ONE_DAY_IN_SECONDS_INT
visible within the crate is well-aligned with its usage in timezone calculations for holiday features.
Line range hint 685-707
: LGTM! Improved timezone handling in holiday feature generation.
The changes correctly handle timezone-aware day boundaries by:
- Using
floor_day
to properly align timestamps with the holiday's timezone - Consistently applying timezone offsets when checking holiday occurrences
- Correctly handling window offsets in the context of different timezones
1096-1104
: LGTM! Well-designed test utility macro.
The concat_all
macro provides a clean and safe way to construct test vectors by concatenating multiple slices, improving test readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
crates/augurs-prophet/src/features.rs (3)
84-97
: Consider adding timezone offset validationWhile the implementation is correct, consider adding validation for the UTC offset to ensure it falls within reasonable bounds (typically -12 to +14 hours).
pub fn with_utc_offset(mut self, utc_offset: TimestampSeconds) -> Self { + // Validate that offset is within reasonable bounds (-12 to +14 hours) + debug_assert!( + utc_offset >= -12 * 3600 && utc_offset <= 14 * 3600, + "UTC offset {} is outside reasonable bounds", + utc_offset + ); self.utc_offset = utc_offset; self }
99-105
: Consider handling timestamp overflowThe current implementation might overflow with extreme timestamp values. Consider adding checks or using a safer arithmetic approach.
pub(crate) fn floor_day(&self, ds: TimestampSeconds) -> TimestampSeconds { + // Check for potential overflow + let adjusted = ds.checked_add(self.utc_offset) + .expect("Timestamp adjustment would overflow"); - let remainder = (ds + self.utc_offset) % ONE_DAY_IN_SECONDS_INT; + let remainder = adjusted % ONE_DAY_IN_SECONDS_INT; ds - remainder }
263-370
: LGTM! Consider additional edge case testsThe test coverage is comprehensive, testing various timezone scenarios. Consider adding tests for:
- Leap second handling
- DST transition points (particularly around spring forward and fall back)
#[test] fn holiday_floor_day_dst_transition() { // Test around DST transition points let offset = FixedOffset::east_opt(60 * 60).unwrap(); // Spring forward (e.g., 2:00 AM becomes 3:00 AM) let spring_forward = offset .with_ymd_and_hms(2024, 3, 10, 2, 30, 0) .unwrap() .timestamp(); // Fall back (e.g., 2:00 AM occurs twice) let fall_back = offset .with_ymd_and_hms(2024, 11, 3, 1, 30, 0) .unwrap() .timestamp(); let holiday = Holiday::new(vec![]) .with_utc_offset(offset.local_minus_utc() as i64); // Both timestamps should floor to their respective midnight assert_eq!( holiday.floor_day(spring_forward), offset .with_ymd_and_hms(2024, 3, 10, 0, 0, 0) .unwrap() .timestamp() ); assert_eq!( holiday.floor_day(fall_back), offset .with_ymd_and_hms(2024, 11, 3, 0, 0, 0) .unwrap() .timestamp() ); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
crates/augurs-prophet/src/features.rs
(5 hunks)
🔇 Additional comments (3)
crates/augurs-prophet/src/features.rs (3)
4-6
: LGTM!
The addition of ONE_DAY_IN_SECONDS_INT
import is appropriate for the new timezone handling functionality.
Line range hint 19-25
: LGTM! Type changes improve type safety
The change from i32
to u32
for window values is a good improvement as these values should never be negative. The addition of utc_offset
field properly supports timezone handling.
Line range hint 31-37
: LGTM! Proper initialization of new field
The utc_offset
is correctly initialized to 0, maintaining backward compatibility with existing code.
Superseded by #181 |
Prior to this commit, when determining whether a given holiday's features should be 0 or 1 for a given timestamp, we checked whether each day included in the holiday's lower-upper windows included the timestamp, and set the value to 1 if so. However, when rounding the holiday's timestamps down to 'day' we assumed that the holiday started and ended at midnight UTC, which won't be the case for certain holidays (i.e. anything outside of UTC).
This commit adds the
Holiday::with_utc_offset()
method which allows a holiday to use non-UTC-aligned days when its timestamps are being floored. It also adds tests for themake_holiday_features
function to ensure the UTC offset is being respected.Summary by CodeRabbit
New Features
Holiday
struct with newutc_offset
andutc_offset_seconds
fields for better timestamp handling.Bug Fixes
Tests