fix: add a separate feature for each holiday's lower/upper windows #179

Merged
sd2k merged 2 commits into main from holiday-reuse-features
Nov 21, 2024

sd2k (Collaborator) commented Nov 21, 2024

Reading through the original Prophet Python code, it looks like we're
supposed to add a single feature for each value in the window of each
holiday, rather than a separate feature every time the holiday occurs.
This commit does so by creating a map of features and updating the
relevant column in the map while iterating over the holiday's
occurrences.
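
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from prep.rs: the `HolidayFeature` struct, the `holiday_columns` function, and the exact-timestamp matching are all assumptions made for the example (the real code uses `FeatureName::Holiday` and its own date handling). The point is that the map key is the (holiday, offset) pair, so repeated occurrences update the same column rather than creating new ones.

```rust
use std::collections::HashMap;

const ONE_DAY_IN_SECONDS: i64 = 86_400;

/// Hypothetical stand-in for the crate's `FeatureName::Holiday` variant.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct HolidayFeature {
    name: String,
    offset: i64,
}

/// Build one indicator column per (holiday, offset) pair.
///
/// `ds` holds the timestamps (in seconds, assumed to be day-aligned) of the
/// rows being featurised; `occurrences` holds the timestamps of the holiday.
fn holiday_columns(
    name: &str,
    ds: &[i64],
    occurrences: &[i64],
    lower_window: u32,
    upper_window: u32,
) -> HashMap<HolidayFeature, Vec<f64>> {
    let lower = i64::from(lower_window);
    let upper = i64::from(upper_window);
    let mut columns: HashMap<HolidayFeature, Vec<f64>> = HashMap::new();
    for &occurrence in occurrences {
        for offset in -lower..=upper {
            let key = HolidayFeature { name: name.to_string(), offset };
            // Reuse the column if an earlier occurrence already created it.
            let column = columns.entry(key).or_insert_with(|| vec![0.0; ds.len()]);
            let observed = occurrence + offset * ONE_DAY_IN_SECONDS;
            for (value, &t) in column.iter_mut().zip(ds) {
                if t == observed {
                    *value = 1.0;
                }
            }
        }
    }
    columns
}
```

With two occurrences and a window of `0..=1`, this yields two columns (offsets 0 and 1), each with two nonzero entries, instead of four separate columns.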

Summary by CodeRabbit

  • New Features

    • Enhanced holiday feature handling for improved organization and clarity in data processing.
    • Introduced a new structure for managing holiday feature vectors using a HashMap.
  • Bug Fixes

    • Maintained existing error handling for missing regressors and seasonality conditions.

coderabbitai bot (Contributor) commented Nov 21, 2024

Walkthrough

The changes in this pull request primarily focus on the prep.rs file, enhancing the make_holiday_features function's handling of holiday features. A new structure using a HashMap is introduced to manage holiday feature vectors, improving the organization and clarity of the feature generation process. The FeatureName enum is updated to implement the Hash trait, facilitating better usability in hash-based collections. Overall, these modifications aim to streamline the management of holiday features while maintaining existing functionality.

Changes

File Path | Change Summary
crates/augurs-prophet/src/prophet/prep.rs | Enhanced make_holiday_features with a HashMap for holiday feature vectors; updated FeatureName enum to include Hash trait.

Poem

🐰 In the garden where features bloom,
HashMaps dance, dispelling gloom.
Holidays now in order stay,
With names and vectors on display.
A hop, a skip, in code we cheer,
For clearer paths, the goal is near! 🌼

cloudflare-workers-and-pages bot commented Nov 21, 2024

Deploying augurs with Cloudflare Pages

Latest commit: 7b6281c
Status: ✅  Deploy successful!
Preview URL: https://b9ac006b.augurs.pages.dev
Branch Preview URL: https://holiday-reuse-features.augurs.pages.dev

@sd2k sd2k force-pushed the holiday-reuse-features branch from 301388c to f8aa7bb Compare November 21, 2024 17:10
@sd2k sd2k changed the title holiday reuse features fix: add a separate feature for each holiday's lower/upper windows Nov 21, 2024
coderabbitai bot (Contributor) left a comment

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (2)
crates/augurs-prophet/src/features.rs (1)

61-61: Consider adding similar date examples in upper_window documentation

For consistency, consider adding specific date examples in the upper_window documentation similar to those added for lower_window.

     /// Set the upper window for the holiday.
     ///
     /// The upper window is the number of days after the holiday
     /// that it is observed. For example, if the holiday is on
-    /// 2023-01-01 and the upper window is 1, then the holiday will
-    /// _also_ be observed on 2023-01-02.
+    /// 2023-01-01 and the upper window is 1, then the holiday will
+    /// _also_ be observed on 2023-01-02.
crates/augurs-prophet/src/prophet/prep.rs (1)

719-731: Consider simplifying the loop when adding holiday features and prior scales.

The current loop iterates over this_holiday_feature_names to push features and prior scales individually. If the order of features and scales is guaranteed, consider pushing them directly from this_holiday_features to reduce potential mismatches.

Apply this diff to refactor the loop:

 for col_name in this_holiday_feature_names {
     features.push(
         col_name.clone(),
         this_holiday_features.remove(&col_name).unwrap(),
     );
     prior_scales.push(
         holiday
             .prior_scale
             .unwrap_or(self.opts.holidays_prior_scale),
     );
 }
+// Alternatively, iterate directly over this_holiday_features
+for (col_name, col_values) in this_holiday_features.drain() {
+    features.push(col_name.clone(), col_values);
+    prior_scales.push(
+        holiday
+            .prior_scale
+            .unwrap_or(self.opts.holidays_prior_scale),
+    );
+}

This change simplifies the code and reduces reliance on the separate this_holiday_feature_names vector.
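
One caveat worth illustrating: `HashMap::drain` yields entries in arbitrary order, so draining the map directly would not preserve the order in which features were first created. The sketch below shows why a separate vector of names can still be useful; `drain_in_order` and the `String` keys are hypothetical simplifications, not names from prep.rs.

```rust
use std::collections::HashMap;

/// Move columns out of `columns` in the order given by `names`.
/// Names with no matching column are skipped.
fn drain_in_order(
    names: Vec<String>,
    mut columns: HashMap<String, Vec<f64>>,
) -> Vec<(String, Vec<f64>)> {
    names
        .into_iter()
        .filter_map(|name| columns.remove(&name).map(|values| (name, values)))
        .collect()
}
```

Because the output order comes from `names`, the feature columns and their prior scales line up the same way on every run, which iterating over the `HashMap` alone does not guarantee.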

🛑 Comments failed to post (3)
js/augurs-prophet-js/src/lib.rs (1)

1236-1236: 💡 Codebase verification

Type change requires attention in conversion code

The verification revealed that while the type has been changed to u32 in the struct definitions, there's a conversion to i32 in the implementation that needs attention:

  • In crates/augurs-prophet/src/prophet/prep.rs, the window values are being converted from u32 to i32:
Box::new(x.iter().copied().map(|x| x as i32)) as Box<dyn Iterator<Item = i32>>

This conversion could potentially lose the benefits of using u32 and should be reviewed to determine if the internal logic actually requires signed integers.

🔗 Analysis chain

Type change improves safety and correctness.

Changing from Option<Vec<i32>> to Option<Vec<u32>> for both window fields is a good improvement because:

  1. Window sizes are naturally non-negative values
  2. It prevents potential bugs from negative window sizes
  3. It better represents the domain concept of future/past days

Let's verify that all related code has been updated to handle the new type:

Also applies to: 1245-1245

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining i32 references related to window fields
# and verify that all conversions handle the new u32 type properly

# Search for any remaining i32 references related to windows
rg -i "window.*i32|i32.*window"

# Search for potential type conversion code that might need updates
ast-grep --pattern 'as i32'

Length of output: 63


Script:

#!/bin/bash
# Let's try a more comprehensive search approach

# Search for window-related code to verify type usage
rg -A 5 -B 5 "lower_window|upper_window"

# Search for any i32 usage in the same file
rg "i32" "js/augurs-prophet-js/src/lib.rs"

# Find all files that might reference these window fields
rg -l "window"

Length of output: 12202

crates/augurs-prophet/src/prophet/prep.rs (2)

675-677: ⚠️ Potential issue

Potential integer overflow when casting u32 to i32.

Casting u32 values to i32 with `as` wraps silently when a value exceeds i32::MAX: the cast itself never panics, but the result is negative, which would produce incorrect offsets for very large window values. Consider using i64 to safely accommodate the full range of u32 values.

Apply this diff to address the issue:

 let lower = holiday
     .lower_window
     .as_ref()
     .map(|x| {
-        Box::new(x.iter().copied().map(|x| x as i32)) as Box<dyn Iterator<Item = i32>>
+        Box::new(x.iter().copied().map(|x| x as i64)) as Box<dyn Iterator<Item = i64>>
     })
     .unwrap_or_else(|| Box::new(std::iter::repeat(0)));

 let upper = holiday
     .upper_window
     .as_ref()
     .map(|x| {
-        Box::new(x.iter().copied().map(|x| x as i32)) as Box<dyn Iterator<Item = i32>>
+        Box::new(x.iter().copied().map(|x| x as i64)) as Box<dyn Iterator<Item = i64>>
     })
     .unwrap_or_else(|| Box::new(std::iter::repeat(0)));

Additionally, update the types in related variables and calculations to use i64 instead of i32 to ensure consistency and prevent overflow.

Also applies to: 683-684
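
For reference, a small sketch of how the three conversion styles behave. The function names are hypothetical and exist only for this illustration; the conversions themselves (`as`, `i64::from`, `i32::try_from`) are standard library behaviour.

```rust
use std::convert::TryFrom;

/// Lossy: `as` keeps the low 32 bits, so values above `i32::MAX` become negative.
fn lossy(window: u32) -> i32 {
    window as i32
}

/// Lossless widening: every `u32` value fits in an `i64`.
fn widened(window: u32) -> i64 {
    i64::from(window)
}

/// Checked narrowing: returns `None` instead of wrapping.
fn checked(window: u32) -> Option<i32> {
    i32::try_from(window).ok()
}
```

`i64::from` makes the widening explicit and cannot fail, whereas `i32::try_from` suits the case where the rest of the code must stay on `i32` and out-of-range windows should be rejected.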


693-695: ⚠️ Potential issue

Integer overflow risk when negating lower window values.

Negating an i32 value derived from a u32 is unsafe for large inputs. A u32 above i32::MAX wraps to a negative i32 when cast, so negating it yields the wrong offset, and the value 2^31 casts to i32::MIN, whose negation overflows (a panic in debug builds). To prevent this, consider using i64 for offset calculations.

Apply this diff to address the issue:

- for offset in -lower..=upper {
+ let lower = lower as i64;
+ let upper = upper as i64;
+ for offset in -lower..=upper {
          let offset_seconds = offset as i64 * ONE_DAY_IN_SECONDS as i64;
          let occurrence = dt_date + offset_seconds;
          let col_name = FeatureName::Holiday {
              name: name.clone(),
              _offset: offset as i32,
          };

Ensure that all related variables and computations use i64 to handle larger integer values safely.
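
A compact way to express the same idea, assuming the windows are available as plain `u32` values (the helper name `window_offsets` is hypothetical): widen first, then negate, so the range is well defined for every possible input.

```rust
use std::ops::RangeInclusive;

/// Day offsets covered by a holiday: `-lower_window..=upper_window`.
/// Widening to `i64` before negating avoids any overflow.
fn window_offsets(lower_window: u32, upper_window: u32) -> RangeInclusive<i64> {
    -i64::from(lower_window)..=i64::from(upper_window)
}
```

Note that the offsets on the lower side are negative, so whichever type stores an individual offset needs to be signed even though the window sizes themselves are not.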

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
crates/augurs-prophet/src/prophet/prep.rs (2)

Line range hint 124-150: Consider using u32 for holiday offset as per PR objectives.

The _offset field in the Holiday variant is still using i32, but according to the PR objectives, we should be using u32 for the lower and upper windows.

Consider updating the type:

 Holiday {
     name: String,
-    _offset: i32,
+    _offset: u32,
 },

Line range hint 663-731: LGTM! Consider a small optimization for feature allocation.

The new implementation effectively manages holiday features using a HashMap while preserving order. The separation of features for each holiday's lower and upper windows is well implemented.

A minor optimization could be to pre-allocate the HashMap with the expected capacity based on the window sizes.

Consider pre-allocating the HashMap:

-            let mut this_holiday_features: HashMap<FeatureName, Vec<f64>> = HashMap::new();
+            let capacity = (lower.size_hint().0 + upper.size_hint().0 + 1) as usize;
+            let mut this_holiday_features: HashMap<FeatureName, Vec<f64>> = HashMap::with_capacity(capacity);
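
A caution on the `size_hint` arithmetic above: if either iterator is the `std::iter::repeat(0)` fallback shown earlier in this review, its lower bound is `usize::MAX`, so a plain `+` would overflow. The sketch below is one hedged way to compute a bounded capacity; `capacity_hint` and the `max_capacity` parameter are hypothetical names for this example.

```rust
/// Estimate a capacity from the iterators' lower size bounds,
/// saturating instead of overflowing and clamping to `max_capacity`.
fn capacity_hint(
    lower: &dyn Iterator<Item = i32>,
    upper: &dyn Iterator<Item = i32>,
    max_capacity: usize,
) -> usize {
    lower
        .size_hint()
        .0
        .saturating_add(upper.size_hint().0)
        .saturating_add(1)
        .min(max_capacity)
}
```

The clamp matters because passing an enormous value to `HashMap::with_capacity` would itself fail; whether pre-allocation is worth it at all for a handful of window offsets is a separate judgment call.
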
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 301388c and f8aa7bb.

📒 Files selected for processing (1)
  • crates/augurs-prophet/src/prophet/prep.rs (4 hunks)
🔇 Additional comments (1)
crates/augurs-prophet/src/prophet/prep.rs (1)

Line range hint 663-731: Verify the impact of type changes and test coverage.

The changes to use a HashMap for holiday features and the proposed change to use u32 for offsets might affect other parts of the codebase.

Let's verify the impact:

✅ Verification successful

Based on the gathered information from all verification attempts, I can now provide the final response:

The type changes for holiday offsets appear to be localized and safe.

The verification shows that:

  • The _offset field is only used within the FeatureName::Holiday variant
  • All usages of holiday offsets are contained within the same file (prep.rs)
  • The changes to use HashMap for holiday features are internal to the implementation
  • No test cases directly assert holiday offset values
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for other usages of holiday offsets and related tests

# Check for other occurrences of holiday offsets
echo "Checking for holiday offset usages:"
rg -A 2 "_offset.*i32" 

# Check for related test files
echo "Checking for related tests:"
fd -e rs "test" -x rg -l "Holiday.*offset"

Length of output: 379


Script:

#!/bin/bash
# Let's check for holiday-related tests and feature name usages with a broader search

# Check for Holiday enum variant and its usages
echo "Checking Holiday enum variant and usages:"
rg -A 5 "Holiday\s*{" 

# Check for tests containing 'holiday' keyword
echo "Checking holiday-related tests:"
rg -l "holiday.*test|test.*holiday" 

# Check for any holiday-related feature name handling
echo "Checking holiday feature name handling:"
rg -A 3 "FeatureName::Holiday"

Length of output: 1760


Script:

#!/bin/bash
# Let's check the Holiday enum definition and test cases with fixed regex

# Check Holiday enum definition
echo "Checking Holiday enum definition:"
rg -A 2 "enum FeatureName"

# Check holiday-related test cases in prophet.rs
echo "Checking holiday test cases:"
rg -A 10 "holiday.*#\[test\]|#\[test\].*holiday" crates/augurs-prophet/src/prophet.rs

# Check for any holiday struct or type definitions
echo "Checking holiday struct definitions:"
ast-grep --pattern 'struct Holiday {
  $$$
}'

Length of output: 596

@sd2k sd2k enabled auto-merge (squash) November 21, 2024 17:20
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
crates/augurs-prophet/src/prophet/prep.rs (1)

Line range hint 663-723: Excellent refactor of holiday feature handling.

The changes effectively implement separate features for each holiday window value:

  • Uses a HashMap for efficient feature management
  • Properly handles holiday windows through offset calculations
  • Follows idiomatic Rust patterns for memory management
  • Maintains good performance characteristics

The implementation aligns well with the original Prophet Python code's functionality regarding holiday features.

Consider adding unit tests specifically for edge cases in holiday window handling to ensure continued reliability of this critical feature.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f8aa7bb and 7b6281c.

📒 Files selected for processing (1)
  • crates/augurs-prophet/src/prophet/prep.rs (4 hunks)
🔇 Additional comments (1)
crates/augurs-prophet/src/prophet/prep.rs (1)

Line range hint 124-146: LGTM: Well-structured enum with appropriate trait implementations.

The FeatureName enum is well-designed with clear documentation for each variant. The addition of the Hash trait is appropriate for its use as a key in hash-based collections.

@sd2k sd2k merged commit 856be42 into main Nov 21, 2024
24 checks passed
@sd2k sd2k deleted the holiday-reuse-features branch November 21, 2024 17:29
@sd2k sd2k mentioned this pull request Nov 21, 2024