Add Prophet algorithm in augurs-prophet crate #118

Merged · 68 commits · Oct 10, 2024
Conversation

@sd2k (Collaborator) commented Oct 3, 2024

This PR adds an implementation of the Prophet forecasting model. It handles:

  • definitions of types for input and output data
  • transformation of data into something suitable for an optimizer
  • generation of predictions using optimized model parameters obtained from an optimizer
  • calculation of uncertainty intervals by sampling using those parameters

The Optimizer trait (for which there are no implementations yet) allows the model to work without a dependency on any specific optimizer. The Python and R libraries have (in the past) offered bindings using Stan libraries and the cmdstan interface; cmdstan is the only one supported now. A similar interface could be added here with relative ease.
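
As a rough illustration of that decoupling, the optimizer abstraction could look something like the sketch below. The type names follow those used elsewhere in this PR (InitialParams, Data, and OptimizeOpts from src/optimizer.rs), but the struct bodies and the exact method signature are assumptions for illustration, not the crate's real API.

// Hypothetical sketch of an optimizer abstraction; the real trait lives in
// crates/augurs-prophet/src/optimizer.rs and may differ in detail.
pub struct InitialParams {} // initial values of the model parameters
pub struct Data {}          // preprocessed training data for the optimizer
pub struct OptimizeOpts {}  // algorithm settings, e.g. iteration limits
pub struct OptimizedParams {} // fitted parameter values

pub trait Optimizer {
    /// Find parameters maximizing the Prophet log-probability for `data`.
    fn optimize(
        &self,
        init: &InitialParams,
        data: &Data,
        opts: &OptimizeOpts,
    ) -> Result<OptimizedParams, Box<dyn std::error::Error>>;
}

Any Stan-backed (or other) implementation of such a trait can then be plugged into the model without the core crate depending on it.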

Summary by CodeRabbit

  • New Features

    • Introduced a new bacon.toml configuration file for enhanced task management in Rust projects.
    • Added multiple new dependencies to the project, including augurs-prophet, chrono, num-traits, rand, and pretty_assertions.
    • Implemented the Prophet forecasting model with various configuration options and prediction capabilities.
    • Introduced functionality for loading and processing time series data from CSV files.
    • Added a new section in the README.md detailing the augurs-prophet package and its development requirements.
    • Included Apache and MIT license files to the project.
    • Added new structures for enhanced model configuration, including FeatureMode, Holiday, Regressor, Seasonality, and Standardize.
  • Bug Fixes

    • Updated dependency management for rand to use workspace specifications.
  • Documentation

    • Added a detailed README.md for the augurs-prophet package explaining its functionality and usage.
    • Included documentation for new features and options in the Prophet model.
  • Tests

    • Included unit tests for various functionalities within the Prophet model and its components.
  • Chores

    • Updated .gitignore to exclude .bacon-locations.
    • Added a watch target in the justfile for monitoring project changes.

coderabbitai bot (Contributor) commented Oct 3, 2024

Caution: Review failed. The pull request is closed.

Walkthrough

The recent changes include updates to the README.md file to introduce the new crate augurs-prophet, which implements the Prophet time series forecasting algorithm. The .gitignore file has been modified to ignore .bacon-locations. The Cargo.toml files have been updated to add and adjust dependencies, including several new dependencies for the augurs-prophet package. Additionally, new configuration and implementation files have been introduced for data handling, error management, and enhancements to the Prophet model, along with utilities for data preprocessing and testing. The justfile has also been updated to include a new watch target.

Changes

| File Path | Change Summary |
| --- | --- |
| .gitignore | Added entry for .bacon-locations. |
| Cargo.toml | Added dependencies: augurs-prophet, chrono, num-traits, rand, pretty_assertions. |
| bacon.toml | New configuration file for the bacon tool with job definitions. |
| crates/augurs-ets/Cargo.toml | Changed rand dependency to workspace declaration. |
| crates/augurs-outlier/Cargo.toml | Changed rand dependency to workspace declaration. |
| crates/augurs-prophet/Cargo.toml | New package with dependencies: itertools, num-traits, rand, thiserror, etc. |
| crates/augurs-prophet/README.md | Introduced README detailing the Prophet time series forecasting library. |
| crates/augurs-prophet/src/data.rs | Added TrainingData and PredictionData structs with methods for validation and data manipulation. |
| crates/augurs-prophet/src/error.rs | Added Error enum for handling various error scenarios in the Prophet model. |
| crates/augurs-prophet/src/features.rs | Introduced features related to seasonality, regressors, and holidays. |
| crates/augurs-prophet/src/lib.rs | Established module structure for the augurs-prophet library and re-exported types. |
| crates/augurs-prophet/src/optimizer.rs | Introduced optimization structures and traits for the Prophet model. |
| crates/augurs-prophet/src/positive_float.rs | Implemented PositiveFloat struct for positive-only floating-point numbers. |
| crates/augurs-prophet/src/prophet.rs | Introduced Prophet struct with methods for fitting and predicting time series data. |
| crates/augurs-prophet/src/prophet/options.rs | Defined configuration options for the Prophet model. |
| crates/augurs-prophet/src/prophet/predict.rs | Implemented prediction structures and methods for the Prophet model. |
| crates/augurs-prophet/src/testdata.rs | Added functionality for loading and processing time series data from CSV files. |
| crates/augurs-prophet/src/util.rs | Introduced FloatIterExt trait for extended functionality on floating-point iterators. |
| crates/augurs-seasons/Cargo.toml | Updated num-traits dependency to workspace declaration. |
| crates/augurs/Cargo.toml | Added optional dependency augurs-prophet and feature prophet. |
| crates/augurs/src/lib.rs | Added conditional export for augurs_prophet module under feature flag "prophet". |
| justfile | Introduced new target watch to execute the command bacon. |
| crates/augurs-prophet/LICENSE-APACHE | Added Apache License, Version 2.0. |
| crates/augurs-prophet/LICENSE-MIT | Added MIT License text. |
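
Taken together, these modules suggest a workflow along the following lines. This is a hypothetical sketch assembled from the names in the walkthrough (TrainingData, Prophet, ProphetOptions, MockOptimizer); the exact signatures are assumptions, shown as commented pseudocode rather than verified calls.

// Hypothetical usage sketch; names from the walkthrough, signatures assumed.
// let data = TrainingData::new(ds, y)?;               // src/data.rs
// let optimizer = MockOptimizer::new();               // src/optimizer.rs (test-only)
// let mut prophet = Prophet::new(ProphetOptions::default(), optimizer);
// prophet.fit(data, Default::default())?;             // src/prophet.rs
// let predictions = prophet.predict(None)?;           // point estimates + intervals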

Possibly related PRs

  • chore: bump to v0.3.1 #103: The changes in this PR involve updating the Cargo.toml file to include new dependencies, including augurs-prophet, which is also a focus of the main PR. This indicates a direct relationship as both PRs modify the same dependency management file.

Suggested labels

documentation

Poem

In the garden of code, where changes bloom,
New features sprout, dispelling the gloom.
With Prophet's wisdom, we forecast the day,
As rabbits hop forth, in a joyful ballet.
Dependencies added, and errors refined,
A wondrous new world for all to unwind! 🐇✨



@sd2k changed the title from WIP - add Prophet algorithm in augurs-prophet crate to Add Prophet algorithm in augurs-prophet crate on Oct 3, 2024
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (3)
crates/augurs-prophet/Cargo.toml (1)

11-21: LGTM: Dependencies are well-chosen and configured.

The dependencies are appropriate for a forecasting library and are correctly configured:

  • Regular dependencies (itertools, num-traits, rand, statrs, thiserror) are suitable for the project's needs.
  • Dev-dependencies (augurs-testing, chrono, pretty_assertions) are well-chosen for testing purposes.
  • Using workspace-level version specifications for all dependencies ensures consistency across the project.

Consider adding a comment explaining the purpose of each dependency, especially for less common ones like statrs or thiserror. This can help other developers understand why each dependency is necessary.

Cargo.toml (2)

38-45: LGTM: Addition of new dependencies for Prophet implementation

The new dependencies (chrono, num-traits, rand, statrs) and the update to itertools are appropriate for implementing the Prophet algorithm and performing time series analysis. The specified versions are recent, which is good for security and feature availability.

Consider using version ranges (e.g., ^0.4.38 instead of 0.4.38) for non-workspace dependencies to allow for compatible updates without changing the Cargo.toml file. This can help keep dependencies up-to-date with bug fixes and minor improvements.


54-54: LGTM: Addition of pretty_assertions for improved test output

The addition of the pretty_assertions crate is a good choice for enhancing the readability of test failure messages. This will improve the developer experience when working with tests.

Consider moving pretty_assertions to a [dev-dependencies] section in the Cargo.toml file. This would more clearly separate it from runtime dependencies and potentially improve build times for release builds.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 1e6b1fb and 90d29cf.

📒 Files selected for processing (4)
  • Cargo.toml (2 hunks)
  • crates/augurs-prophet/Cargo.toml (1 hunks)
  • crates/augurs-prophet/src/lib.rs (1 hunks)
  • crates/augurs-prophet/src/prophet/predict.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/augurs-prophet/src/lib.rs
🧰 Additional context used
🔇 Additional comments (12)
crates/augurs-prophet/Cargo.toml (1)

1-10: LGTM: Package metadata is well-configured.

The package metadata is correctly set up:

  • The package name "augurs-prophet" is appropriate and follows naming conventions.
  • Using workspace-level configurations for common fields (license, authors, documentation, repository, version, edition, and keywords) is a good practice for maintaining consistency across multiple crates in a workspace.
Cargo.toml (1)

34-34: LGTM: Addition of augurs-prophet dependency

The addition of the augurs-prophet dependency is consistent with the PR objective of implementing the Prophet algorithm. The version (0.3.1) matches the workspace package version, and the local path is correctly specified.

crates/augurs-prophet/src/prophet/predict.rs (10)

1-103: Well-structured and documented code

The initial segment of the file, including imports and struct definitions, is well-organized and properly documented. The use of pub(super) for internal structs shows good encapsulation practices.


122-155: LGTM: Robust trend prediction implementation

The predict_trend method effectively handles different growth types (Linear, Logistic, and Flat) with proper error handling. The implementation is well-structured and easy to understand.


157-198: Efficient piecewise linear implementation

The piecewise_linear method is well-implemented using iterators and vector operations, which is efficient and idiomatic Rust. The code is also well-commented, explaining each step of the algorithm.


200-235: Well-implemented piecewise logistic function

The piecewise_logistic method correctly implements the piecewise logistic function for trend prediction. The code is well-structured, efficient, and follows the expected algorithm for logistic growth.


238-240: Concise and correct flat trend implementation

The flat_trend method efficiently implements a flat trend using std::iter::repeat. The implementation is straightforward and correct.
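
For reference, a flat trend of this shape can be produced in a couple of lines; this is a generic sketch rather than the crate's exact code, with m standing for the fitted offset.

/// Sketch: repeat the fitted offset `m` once per requested time point.
fn flat_trend(m: f64, n_points: usize) -> Vec<f64> {
    std::iter::repeat(m).take(n_points).collect()
}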


333-361: Efficient feature prediction implementation

The predict_feature method efficiently predicts a single feature using matrix multiplication. The use of izip! for simultaneous iteration over multiple vectors is a good practice. The method correctly handles both additive and multiplicative features.


364-402: Robust uncertainty prediction implementation

The predict_uncertainty method correctly calculates uncertainty intervals for the forecast. It uses appropriate methods for sampling and percentile calculation. The handling of both yhat and trend uncertainty is well-implemented.


466-504: Well-implemented model simulation

The sample_model method correctly simulates observations from the extrapolated model. It appropriately handles additive and multiplicative components separately and adds noise using a Normal distribution, which is standard practice in time series forecasting.


506-577: Robust predictive trend sampling

The sample_predictive_trend method effectively samples new changepoints and deltas to predict the trend. It uses appropriate probability distributions (Poisson and Laplace) and handles edge cases correctly. The implementation is robust and follows best practices for predictive trend sampling in time series forecasting.
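
For intuition, both draws mentioned here can be produced from a plain uniform source. The sketch below uses textbook methods (Knuth's algorithm for Poisson, inverse-CDF sampling for Laplace) and illustrates the distributions involved; it is not the crate's implementation.

use rand::Rng;

/// Knuth's method: count uniform draws until their product falls below e^-λ.
fn sample_poisson<R: Rng>(rng: &mut R, lambda: f64) -> usize {
    let threshold = (-lambda).exp();
    let (mut k, mut p) = (0, 1.0);
    loop {
        p *= rng.gen::<f64>();
        if p <= threshold {
            return k;
        }
        k += 1;
    }
}

/// Inverse-CDF sampling for Laplace(mu, b) from a uniform on (-0.5, 0.5).
fn sample_laplace<R: Rng>(rng: &mut R, mu: f64, b: f64) -> f64 {
    let u: f64 = rng.gen_range(f64::EPSILON..1.0) - 0.5;
    mu - b * u.signum() * (1.0 - 2.0 * u.abs()).ln()
}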


582-602: Reliable percentile calculation

The percentile_of_sorted function, adapted from the Rust compiler's test suite, correctly calculates percentiles of sorted samples. It handles all edge cases properly and is a reliable implementation for this statistical operation.
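
For reference, that rustc implementation is essentially the following linear interpolation over the sorted samples (it expects a non-empty sorted slice and a percentile in [0, 100]):

/// Percentile of pre-sorted samples by linear interpolation between ranks.
fn percentile_of_sorted(sorted: &[f64], pct: f64) -> f64 {
    assert!(!sorted.is_empty() && (0.0..=100.0).contains(&pct));
    if sorted.len() == 1 {
        return sorted[0];
    }
    if pct == 100.0 {
        return sorted[sorted.len() - 1];
    }
    let rank = (pct / 100.0) * (sorted.len() - 1) as f64;
    let lower = rank.floor();
    let d = rank - lower;
    let n = lower as usize;
    sorted[n] + (sorted[n + 1] - sorted[n]) * d
}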

@sd2k (Collaborator, Author) commented Oct 10, 2024

@coderabbitai resolve

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 8

🧹 Outside diff range and nitpick comments (2)
crates/augurs-prophet/src/error.rs (1)

58-60: Consider adding more context to the Scaling variant.

The Scaling variant's error message is quite brief. Consider adding more context or the ability to include a reason for the scaling failure.

You could modify the variant to include a reason:

#[error("Scaling failed: {0}")]
Scaling(String),
crates/augurs-prophet/src/prophet/options.rs (1)

32-82: LGTM: Well-defined enums with appropriate documentation.

The SeasonalityOption, Scaling, and EstimationMode enums are clearly defined and well-documented. The non_exhaustive attribute on EstimationMode is a good choice for future extensibility.

Consider implementing Default for Scaling and EstimationMode for consistency with SeasonalityOption:

impl Default for Scaling {
    fn default() -> Self {
        Self::AbsMax
    }
}

impl Default for EstimationMode {
    fn default() -> Self {
        Self::Mle
    }
}

This aligns with the defaults specified in the ProphetOptions struct and improves consistency across the module.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 90d29cf and 9fd92a9.

📒 Files selected for processing (3)
  • crates/augurs-prophet/src/error.rs (1 hunks)
  • crates/augurs-prophet/src/prophet/options.rs (1 hunks)
  • crates/augurs-prophet/src/prophet/predict.rs (1 hunks)
🧰 Additional context used
🔇 Additional comments (25)
crates/augurs-prophet/src/error.rs (17)

1-6: LGTM: Well-structured error enum declaration.

The Error enum is well-defined with appropriate visibility, derives, and attributes. The use of thiserror and #[non_exhaustive] demonstrates good practices for error handling in a library crate.


7-18: LGTM: Well-structured MismatchedLengths variant.

The MismatchedLengths variant provides comprehensive information about mismatched column lengths, including the lengths and names of both columns involved.


19-21: LGTM: Clear NotEnoughData variant.

The NotEnoughData variant has a concise and clear error message.


22-24: LGTM: Informative OptimizationFailed variant.

The OptimizationFailed variant allows for a custom error message, which is useful for providing specific details about the optimization failure.


25-29: LGTM: Well-documented InvalidIntervalWidth variant.

The InvalidIntervalWidth variant includes a helpful comment explaining the valid range and provides the invalid value in the error message.


33-35: LGTM: Clear TimestampsAreConstant variant.

The TimestampsAreConstant variant provides the constant timestamp value, which is helpful for debugging.


36-38: LGTM: Informative DuplicateSeasonality variant.

The DuplicateSeasonality variant includes the name of the duplicate seasonality, which is useful for identifying the issue.


39-44: LGTM: Well-structured InfiniteValue variant.

The InfiniteValue variant provides the name of the column containing the infinite value, which is helpful for debugging.


45-50: LGTM: Well-structured NaNValue variant.

The NaNValue variant provides the name of the column containing the NaN value, which is helpful for debugging.


51-53: LGTM: Clear MissingRegressor variant.

The MissingRegressor variant includes the name of the missing regressor, which is useful for identifying the issue.


54-57: LGTM: Informative MissingSeasonalityCondition variant.

The MissingSeasonalityCondition variant includes the name of the missing condition, which is helpful for debugging.


61-63: LGTM: Clear MissingCap variant.

The MissingCap variant has a concise and clear error message.


64-66: LGTM: Clear MissingFloor variant.

The MissingFloor variant has a concise and clear error message.


67-69: LGTM: Informative CapNotGreaterThanFloor variant.

The CapNotGreaterThanFloor variant provides a clear explanation of the issue, including the default floor value.


70-73: LGTM: Clear ChangepointsOutOfRange variant.

The ChangepointsOutOfRange variant has a concise and clear error message.


74-78: LGTM: Well-documented ModelNotFit variant.

The ModelNotFit variant includes a helpful comment directing users to the appropriate method for fitting the model.


1-85: Overall, the Error enum is well-designed and comprehensive.

The Error enum provides a robust error handling mechanism for the Prophet forecasting algorithm. It covers a wide range of potential error scenarios with clear and informative error messages. Most variants are well-structured and include helpful associated data where appropriate.

There are a few minor issues to address:

  1. Fix the syntax in the TooManyDataPoints error message.
  2. Consider adding more context to the Scaling variant.
  3. Improve the UnableToInferFrequency variant to avoid potentially large error messages.

Addressing these points will further enhance the quality and usability of this error handling mechanism.

crates/augurs-prophet/src/prophet/options.rs (6)

1-9: LGTM: Clear module documentation and appropriate imports.

The file header provides a concise explanation of the module's purpose, and the imports are relevant to the implementation of Prophet model options.


11-30: LGTM: Well-defined GrowthType enum with appropriate conversion.

The GrowthType enum is clearly defined with three variants, and the conversion to TrendIndicator is implemented correctly.


84-131: LGTM: Well-implemented IntervalWidth struct with appropriate constraints.

The IntervalWidth struct is correctly implemented as a newtype wrapper around f64, with proper validation in the try_new method. The various trait implementations provide a good API for working with IntervalWidth.

This implementation addresses the previous TODO comment about creating a newtype wrapper for interval_width, ensuring that only valid values are used and enhancing the robustness of the code.
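
The validating-newtype pattern referred to here looks, in outline, like the sketch below; this is a generic illustration assuming the valid range is (0, 1) exclusive, not the crate's exact code or error type.

/// Sketch of a validating newtype for an interval width in (0, 1).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct IntervalWidth(f64);

impl IntervalWidth {
    pub fn try_new(value: f64) -> Result<Self, String> {
        if value > 0.0 && value < 1.0 {
            Ok(Self(value))
        } else {
            Err(format!("interval width must be in (0, 1), got {value}"))
        }
    }
}

impl std::ops::Deref for IntervalWidth {
    type Target = f64;
    fn deref(&self) -> &f64 {
        &self.0
    }
}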


265-406: LGTM: Well-defined ProphetOptions struct with appropriate defaults.

The ProphetOptions struct is clearly defined with all necessary options for the Prophet model. The Default implementation provides sensible default values that align with the Python implementation of Prophet. The documentation for each field is comprehensive and helpful.


1-406: Overall: Well-implemented Prophet options with room for minor improvements.

This file provides a comprehensive and well-structured implementation of options for the Prophet model in Rust. The code is generally clear, well-documented, and follows Rust best practices. The suggested improvements, such as implementing Default for some enums and simplifying the From implementation, will further enhance the code's quality and consistency.

Great job on creating a robust foundation for configuring Prophet models in Rust!


133-135: Consider the implications of removing OptProphetOptions.

The TODO comment suggests removing this struct. While OptProphetOptions provides flexibility in specifying Prophet options, its removal could simplify the API. However, this decision depends on how the struct is used throughout the codebase and by external users.

Before making a decision, it would be helpful to analyze the usage of OptProphetOptions:

Consider the impact on API users and whether the flexibility provided by OptProphetOptions is necessary for your use cases.

crates/augurs-prophet/src/prophet/predict.rs (2)

1-118: Well-structured imports and data definitions

The imports are appropriate for the functionality, and the data structures (FeaturePrediction, FeaturePredictions, Predictions, IncludeHistory, and PosteriorPredictiveSamples) are well-defined and documented. This provides a solid foundation for the Prophet forecasting model implementation.


1-821: Summary of Prophet predict.rs review

Overall, the implementation of the Prophet forecasting model in this file is well-structured and comprehensive. Key strengths include:

  1. Clear and well-documented data structures
  2. Comprehensive implementation of various prediction methods
  3. Robust test coverage

Areas for improvement include:

  1. Addressing TODO comments, particularly in piecewise_linear and sample_posterior_predictive
  2. Optimizing matrix multiplication in predict_feature
  3. Enhancing test coverage with property-based tests and edge cases

Addressing these points will further improve the reliability and performance of the Prophet model implementation.

coderabbitai bot (Contributor) commented Oct 10, 2024

✅ Actions performed

Comments resolved.

@sd2k (Collaborator, Author) commented Oct 10, 2024

@coderabbitai full review

coderabbitai bot (Contributor) commented Oct 10, 2024

✅ Actions performed

Full review triggered.

1 similar comment

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 23

🧹 Outside diff range and nitpick comments (36)
.gitignore (1)

3-3: LGTM! Consider sorting entries for consistency.

The addition of .bacon-locations to the .gitignore file is appropriate and helps keep the repository clean of unnecessary files. This is consistent with good version control practices.

For improved readability and consistency, consider sorting the entries alphabetically. Here's a suggested order:

.bacon-locations
.vscode
/Cargo.lock
/target

This minor change would make it easier to scan the file for specific entries in the future.

crates/augurs-outlier/Cargo.toml (1)

Line range hint 15-16: Consider using workspace versions for rayon and rustc-hash.

While most dependencies have been updated to use workspace versions, rayon and rustc-hash still use specific versions:

rayon = { version = "1.10.0", optional = true }
rustc-hash = "2.0.0"

For consistency, consider moving these to workspace versions as well, if appropriate for your project structure.

If there's a specific reason these need to remain as they are, it might be helpful to add a comment explaining why.

Also applies to: 20-21

crates/augurs-ets/Cargo.toml (1)

Consistency in Dependency Configurations

The verification confirms that some dependencies (e.g., lstsq, nalgebra, rand_distr) still specify exact versions instead of using workspace configurations. To maintain consistency and simplify dependency management across the workspace, consider migrating these dependencies to use workspace-level configurations where feasible.

🔗 Analysis chain

Line range hint 1-45: Consider using workspace configurations for all dependencies

While most dependencies now use workspace-level configurations, a few (lstsq, nalgebra, rand_distr) still specify exact versions. To maintain consistency and simplify future updates, consider moving these remaining dependencies to workspace configurations as well, if possible.

To verify the current state of dependency management across the workspace, you can run the following script:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for inconsistencies in dependency versions across the workspace

# Test: List all Cargo.toml files and their dependency declarations
fd Cargo.toml | xargs -I {} sh -c 'echo "File: {}"; sed -n "/^\[dependencies\]/,/^\[/p" {} | grep -v "^\[dependencies\]" | grep -v "^\[" | sort'

# Note: Review the output to identify any inconsistencies in version specifications or usage of workspace configurations

Length of output: 4705

crates/augurs/src/lib.rs (1)

31-32: LGTM! Consider adding a brief doc comment.

The new import for augurs_prophet is well-placed and follows the existing pattern in the file. It's correctly feature-gated and re-exported, which is consistent with other modules.

Consider adding a brief doc comment above this import, similar to the other modules, to provide a quick description of what the prophet module does. For example:

/// Time series forecasting using the Prophet algorithm
#[cfg(feature = "prophet")]
pub use augurs_prophet as prophet;
crates/augurs/Cargo.toml (1)

37-37: LGTM: New feature correctly added, consider including in full feature

The new prophet feature is correctly defined and follows the project's naming conventions. It properly enables the augurs-prophet dependency when activated.

Consider adding the prophet feature to the full feature list on line 33 if it should be included when all features are enabled. If this exclusion is intentional, you may want to add a comment explaining why.

Cargo.toml (4)

38-38: LGTM: Addition of chrono dependency

The addition of the chrono dependency is appropriate for handling time-related operations, which are crucial for time series forecasting in the Prophet model.

Consider using a caret (^) version requirement (e.g., ^0.4.38) to allow for compatible updates within the 0.4.x range. This can help keep the dependency up-to-date with bug fixes and minor improvements.


41-42: LGTM: Addition of num-traits, rand, and statrs dependencies

The addition of num-traits, rand, and statrs dependencies is appropriate for implementing the Prophet forecasting model:

  • num-traits provides useful traits for generic numeric operations.
  • rand is necessary for sampling operations in the model.
  • statrs offers statistical computing capabilities essential for the implementation.

Consider using caret (^) version requirements for these dependencies as well (e.g., ^0.2.19, ^0.8.5, ^0.17.1) to allow for compatible updates within their respective major versions. This approach can help keep the dependencies up-to-date with bug fixes and minor improvements while maintaining compatibility.

Also applies to: 45-45


54-54: LGTM: Addition of pretty_assertions dependency

The addition of the pretty_assertions dependency is a good choice for improving the readability of test failure messages. Its placement in the dev-dependencies section is correct, as it's only needed for testing.

Consider using a caret (^) version requirement (e.g., ^1.4.1) to allow for compatible updates within the 1.x.x range. This can help keep the dependency up-to-date with bug fixes and minor improvements while maintaining compatibility.


Line range hint 34-54: Summary: Dependencies added for Prophet algorithm implementation

The changes to Cargo.toml align well with the PR objective of implementing the Prophet forecasting model in the augurs-prophet crate. The new dependencies (augurs-prophet, chrono, num-traits, rand, statrs, and pretty_assertions) provide the necessary tools for time series handling, statistical computations, and improved testing.

These additions enhance the project's capabilities for time series forecasting and are consistent with the implementation of a Prophet-like model in Rust.

To ensure long-term maintainability, consider implementing a dependency management strategy that includes:

  1. Regular updates of dependencies to benefit from bug fixes and performance improvements.
  2. A process for reviewing and testing dependency updates before merging them into the main branch.
  3. Using lockfiles (Cargo.lock) to ensure reproducible builds across different environments.
crates/augurs-prophet/README.md (4)

10-20: LGTM: Good overview of optimizers with a minor suggestion

This section effectively introduces the concept of optimizers and their role in the Prophet library. The explanation of the Optimizer and Sampler traits provides a clear picture of the crate's architecture.

Consider adding a brief explanation of why optimization is crucial for time series forecasting. This could help readers who are not familiar with the concept to understand its importance in the context of Prophet.


22-61: LGTM: Comprehensive overview of optimization approaches

This section provides an excellent breakdown of potential optimization strategies, demonstrating a thorough consideration of various environments and use cases. The explanations are clear and highlight the trade-offs of each approach.

To improve clarity, consider adding a brief summary table at the end of this section that compares the four approaches based on key factors such as ease of implementation, compatibility with different environments, and potential performance. This could help readers quickly grasp the pros and cons of each method.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~42-~42: Possible missing comma found.
Context: ...pass data directly to it from Rust. In theory this should work OK for any target whic...

(AI_HYDRA_LEO_MISSING_COMMA)


69-72: LGTM: Useful reference links provided

The links section provides valuable resources for readers to learn more about the technologies mentioned in the README.

Consider adding a brief description next to each link to give readers a quick overview of what they can expect to find at each resource. For example:

[Prophet]: https://facebook.github.io/prophet/ "Facebook's time series forecasting tool"
[Stan]: https://mc-stan.org/ "Statistical modeling and high-performance statistical computation"
[cxx]: https://cxx.rs/ "Safe interop between Rust and C++"
[WASM component]: https://component-model.bytecodealliance.org/ "WebAssembly Component Model"

42-42: Optional: Consider adding a comma for improved readability

While grammatically correct, you might consider adding a comma after "In theory" to slightly improve readability:

-In theory this should work OK for any target which Stan can compile to.
+In theory, this should work OK for any target which Stan can compile to.

This is a minor stylistic suggestion and is entirely optional.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~42-~42: Possible missing comma found.
Context: ...pass data directly to it from Rust. In theory this should work OK for any target whic...

(AI_HYDRA_LEO_MISSING_COMMA)

bacon.toml (4)

1-10: LGTM! Consider adding a brief description of the project's specific usage.

The header comments and default configuration are well-structured and informative. They provide useful links for further information about the bacon tool.

Consider adding a brief comment describing how this configuration is specifically used in your project, which could help new contributors understand its purpose more quickly.


48-68: LGTM! Consider adding a comment explaining package exclusions in doc-test.

The test and doc-test configurations are well-structured and use appropriate tools and options.

For the doc-test job, consider adding a brief comment explaining why augurs-js and pyaugurs packages are excluded. This would help other developers understand the reasoning behind these exclusions.

Example:

[jobs.doc-test]
command = [
  "cargo", "test", "--doc", "--color", "always",
  "--all-features", "--workspace",
  # Exclude non-Rust packages from doc tests
  "--exclude", "augurs-js",
  "--exclude", "pyaugurs",
]

80-105: LGTM! Consider periodic checks with warnings disabled.

The run and ex job configurations are flexible and well-suited for development purposes.

While allowing warnings is useful during active development, consider adding a separate job or a periodic check where warnings are treated as errors. This can help catch and address warnings before they accumulate.

Example additional job:

[jobs.run-strict]
command = [
    "cargo", "run",
    "--color", "always",
]
need_stdout = true
allow_warnings = false
background = true

This allows for a stricter check when needed, while maintaining the flexibility of the current configuration.


106-113: LGTM! Consider adding a keybinding for the 'test' job.

The keybindings section is well-commented and provides useful shortcuts for common tasks.

For consistency and improved workflow, consider adding a keybinding for the 'test' job. This would complement the existing bindings for clippy and doc-test. For example:

[keybindings]
c = "job:clippy-all"
d = "job:doc-test"
t = "job:test"  # Add this line

This addition would provide quick access to all main verification tasks: linting, doc-tests, and regular tests.

README.md (1)

38-43: LGTM! New "Developing" section added.

The new section provides valuable information for developers about the project's development tools. The use of reference-style links is consistent with the document's style.

However, there's a minor improvement that can be made:

Consider updating the mention of bacon to use the reference link:

-Some of the tasks require `bacon`, which will also need to be installed separately.
+Some of the tasks require [`bacon`], which will also need to be installed separately.

This change will make the bacon text a clickable link, consistent with how just is referenced.

crates/augurs-prophet/src/util.rs (1)

3-72: Enhance Documentation with Usage Examples

The methods nanmin, nanmax, and nanmean have clear documentation comments. Including code examples demonstrating their usage would improve the usability of the API and assist users in understanding how to implement these methods effectively.
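
The kind of doc examples this asks for could look like the self-contained sketch below. The trait signatures here are assumptions made for illustration; the crate's actual FloatIterExt may differ (for example, in how it treats empty or all-NaN input).

/// Illustrative NaN-ignoring reductions over an iterator of f64 values.
trait FloatIterExt: Iterator<Item = f64> + Sized {
    fn nanmin(self) -> f64 {
        self.filter(|x| !x.is_nan()).fold(f64::INFINITY, f64::min)
    }
    fn nanmax(self) -> f64 {
        self.filter(|x| !x.is_nan()).fold(f64::NEG_INFINITY, f64::max)
    }
    fn nanmean(self) -> f64 {
        let (n, sum) = self
            .filter(|x| !x.is_nan())
            .fold((0_usize, 0.0), |(n, s), x| (n + 1, s + x));
        sum / n as f64
    }
}
impl<T: Iterator<Item = f64>> FloatIterExt for T {}

fn main() {
    let xs = [1.0, f64::NAN, 3.0];
    assert_eq!(xs.iter().copied().nanmin(), 1.0);
    assert_eq!(xs.iter().copied().nanmax(), 3.0);
    assert_eq!(xs.iter().copied().nanmean(), 2.0);
}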

crates/augurs-prophet/src/features.rs (3)

17-23: Consider deriving PartialEq and Eq for Holiday

To enable equality comparisons of Holiday instances, consider deriving PartialEq and Eq. This can be useful for testing and when using Holiday in collections that require these traits.

Apply this diff:

-#[derive(Debug, Clone)]
+#[derive(Debug, Clone, PartialEq, Eq)]

138-143: Consider deriving PartialEq and Eq for Regressor

To enable equality comparisons of Regressor instances, consider deriving PartialEq and Eq.

Apply this diff:

-#[derive(Debug, Clone, Default)]
+#[derive(Debug, Clone, Default, PartialEq, Eq)]

179-186: Consider deriving PartialEq and Eq for Seasonality

To allow equality comparisons of Seasonality instances, consider deriving PartialEq and Eq. This can be helpful for testing and managing collections.

Apply this diff:

-#[derive(Debug, Clone)]
+#[derive(Debug, Clone, PartialEq, Eq)]
crates/augurs-prophet/src/optimizer.rs (3)

208-208: Remove leading underscore from _opts field in OptimizeCall

The field _opts in OptimizeCall is prefixed with an underscore, which suggests it's unused. Since this field stores optimization options and may be used in future tests or assertions, consider renaming it to opts for clarity.

Apply this diff:

 pub(crate) struct OptimizeCall {
     pub init: InitialParams,
     pub data: Data,
-    pub _opts: OptimizeOpts,
+    pub opts: OptimizeOpts,
 }

205-215: Restrict visibility of internal fields for better encapsulation

The call field in MockOptimizer and the fields in OptimizeCall are publicly accessible. If these fields are intended for internal use within tests, consider making them private to prevent unintended external access and to encapsulate internal state.

Apply this diff:

 pub(crate) struct OptimizeCall {
-    pub init: InitialParams,
-    pub data: Data,
-    pub opts: OptimizeOpts,
+    init: InitialParams,
+    data: Data,
+    opts: OptimizeOpts,
 }

 #[derive(Clone, Debug)]
 pub(crate) struct MockOptimizer {
-    pub call: RefCell<Option<OptimizeCall>>,
+    call: RefCell<Option<OptimizeCall>>,
 }

Provide public getter methods if external access is necessary for testing purposes.


97-127: Enhance documentation for fields in OptimizeOpts

Some fields in OptimizeOpts lack detailed documentation, which may lead to confusion about their purpose or acceptable values. Providing comprehensive descriptions will improve the usability of the API.

For example, consider expanding the documentation for the following fields:

  • jacobian: Explain when to set this option to true and its impact on optimization.
  • history_size: Specify the default value and how it affects the L-BFGS algorithm.
  • iter: Clarify the default iteration limit and how it interacts with convergence tolerances.
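
To illustrate the requested level of documentation, the field docs might read like the sketch below. The field names come from the review above, but the types, optionality, and example default values are assumptions.

/// Illustrative field documentation for the optimizer options.
pub struct OptimizeOpts {
    /// Whether to also compute the Jacobian. Enabling it slows each
    /// iteration but is useful when gradients are needed downstream.
    pub jacobian: Option<bool>,
    /// Number of past gradient updates retained by L-BFGS (e.g. 5).
    /// Larger values can improve convergence at the cost of memory.
    pub history_size: Option<u32>,
    /// Maximum number of iterations before the optimizer gives up,
    /// independent of the convergence tolerances.
    pub iter: Option<u32>,
}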
crates/augurs-prophet/src/data.rs (2)

55-61: Simplify error message construction in length checks.

In length validation, you manually create String objects for a_name and b_name. Since these are static strings, you can use string literals to avoid unnecessary allocations.

Update the error construction:

return Err(Error::MismatchedLengths {
    a_name: "ds",
    a: self.ds.len(),
    b_name: "cap",
    b: cap.len(),
});

Also applies to: 73-79, 221-227, 239-245


49-65: Add examples to documentation for with_cap methods.

While the with_cap methods have documentation, adding usage examples can help users understand how to use these methods effectively.

For example:

/// Add the cap for logistic growth.
///
/// # Errors
///
/// Returns an error if the lengths of `ds` and `cap` differ.
///
/// # Examples
///
/// ```rust
/// let training_data = TrainingData::new(ds, y)
///     .expect("Failed to create TrainingData")
///     .with_cap(cap)
///     .expect("Failed to add cap");
/// ```

The same can be applied to the with_floor methods and others.

Also applies to: 215-231

crates/augurs-prophet/src/prophet/options.rs (3)

136-138: Address the TODO comment regarding OptProphetOptions

The TODO suggests reconsidering the existence of OptProphetOptions. If it's deemed unnecessary or if its usage can be simplified, consider refactoring or removing it to streamline the codebase.

Would you like assistance in evaluating the necessity of OptProphetOptions or proposing a refactor?


393-393: Handle potential errors without using unwrap() in defaults

Using unwrap() on conversions like 0.8.try_into().unwrap() can lead to panics if the conversion fails. Although the values are constants and seem safe, it's considered best practice to handle potential errors explicitly to ensure robustness.

Consider using expect with a clear error message or handling the Result:

- changepoint_range: 0.8.try_into().unwrap(),
+ changepoint_range: PositiveFloat::try_from(0.8).expect("Valid positive float for changepoint_range"),

259-264: Initialize holidays_mode with a default value

Currently, holidays_mode is set to None, which may lead to ambiguity. Since it's intended to default to the value of seasonality_mode, consider initializing it explicitly to enhance clarity and prevent potential issues.

Apply this diff to set the default value:

- holidays_mode: None,
+ holidays_mode: Some(self.seasonality_mode),
crates/augurs-prophet/src/prophet.rs (1)

271-274: Implement MCMC estimation in fit method (as per TODO)

There is a TODO comment to implement MCMC estimation by calling sample when self.opts.estimation == EstimationMode::Mcmc. Implementing this feature would expand the model's capabilities to include MCMC estimation methods.

Would you like assistance in implementing this feature or opening a GitHub issue to track this task?

crates/augurs-prophet/src/prophet/predict.rs (1)

810-812: Clarify Assertion Messages in Tests

In the predict_absmax test, the assertion messages can be improved for clarity. The current messages don't accurately reflect the conditions being checked.

At lines 810-812:

assert!(
    lower_bound <= point_estimate,
    "Lower bound should be less than the point estimate"
);

Since the assertion is lower_bound <= point_estimate, the message should state "Lower bound should be less than or equal to the point estimate."

Similarly, at lines 814-816:

assert!(
    upper_bound >= point_estimate,
    "Upper bound should be greater than the point estimate"
);

It should state "Upper bound should be greater than or equal to the point estimate."

Apply the following diff to correct the assertion messages:

 assert!(
     lower_bound <= point_estimate,
-    "Lower bound should be less than the point estimate"
+    "Lower bound should be less than or equal to the point estimate"
 );

 // ...

 assert!(
     upper_bound >= point_estimate,
-    "Upper bound should be greater than the point estimate"
+    "Upper bound should be greater than or equal to the point estimate"
 );

Also applies to: 814-816

crates/augurs-prophet/src/prophet/prep.rs (4)

552-555: Clarify transposition in Fourier series computation

The comment mentions that the function computes the transpose of the function in the Python code for simplicity. This might be confusing for readers who are not familiar with the Python implementation.

Consider rephrasing the comment for clarity:

-/// Note: this computes the transpose of the function in the Python
-/// code for simplicity, since we need it in a columnar format anyway.
+/// Note: this implementation computes the Fourier series components in a columnar format,
+/// as opposed to the row-wise format used in the original Python implementation,
+/// to align with how the data is stored in Rust.

767-768: Add an explanatory comment for adding a dummy feature

When no features are present, a dummy feature is added to prevent an empty features matrix. Adding a brief comment explaining the rationale can enhance code readability.

Add the following comment:

// Add a dummy feature to avoid an empty features matrix when no other features are present.

472-477: Simplify handle_seasonality_opt match arms

The match expression in handle_seasonality_opt can be simplified by combining the first two arms, as they return the same value.

Apply this diff to simplify the match expression:

 match opt {
-    SeasonalityOption::Auto if self.seasonalities.contains_key(name) => None,
-    SeasonalityOption::Auto if auto_disable => None,
-    SeasonalityOption::Auto | SeasonalityOption::Manual(true) => Some(default_order),
+    SeasonalityOption::Auto if self.seasonalities.contains_key(name) || auto_disable => None,
+    SeasonalityOption::Auto | SeasonalityOption::Manual(true) => Some(default_order),

Line range hint 1692-1694: Remove unreachable code

In the test module, the code at lines 1692-1694 seems unreachable or redundant. Removing such code can clean up the tests and prevent confusion.

Review and remove any unreachable or redundant code in the test module.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 253f6d9 and 37bb836.

⛔ Files ignored due to path filters (1)
  • crates/augurs-prophet/data/daily_univariate_ts.csv is excluded by !**/*.csv
📒 Files selected for processing (26)
  • .gitignore (1 hunks)
  • Cargo.toml (2 hunks)
  • README.md (2 hunks)
  • bacon.toml (1 hunks)
  • crates/augurs-ets/Cargo.toml (1 hunks)
  • crates/augurs-outlier/Cargo.toml (1 hunks)
  • crates/augurs-prophet/Cargo.toml (1 hunks)
  • crates/augurs-prophet/LICENSE-APACHE (1 hunks)
  • crates/augurs-prophet/LICENSE-MIT (1 hunks)
  • crates/augurs-prophet/README.md (1 hunks)
  • crates/augurs-prophet/src/data.rs (1 hunks)
  • crates/augurs-prophet/src/error.rs (1 hunks)
  • crates/augurs-prophet/src/features.rs (1 hunks)
  • crates/augurs-prophet/src/lib.rs (1 hunks)
  • crates/augurs-prophet/src/optimizer.rs (1 hunks)
  • crates/augurs-prophet/src/positive_float.rs (1 hunks)
  • crates/augurs-prophet/src/prophet.rs (1 hunks)
  • crates/augurs-prophet/src/prophet/options.rs (1 hunks)
  • crates/augurs-prophet/src/prophet/predict.rs (1 hunks)
  • crates/augurs-prophet/src/prophet/prep.rs (1 hunks)
  • crates/augurs-prophet/src/testdata.rs (1 hunks)
  • crates/augurs-prophet/src/util.rs (1 hunks)
  • crates/augurs-seasons/Cargo.toml (1 hunks)
  • crates/augurs/Cargo.toml (2 hunks)
  • crates/augurs/src/lib.rs (1 hunks)
  • justfile (1 hunks)
🧰 Additional context used
📓 Learnings (3)
crates/augurs-prophet/src/data.rs (1)
Learnt from: sd2k
PR: grafana/augurs#118
File: crates/augurs-prophet/src/data.rs:114-131
Timestamp: 2024-10-08T16:34:48.833Z
Learning: In `crates/augurs-prophet/src/data.rs`, test methods such as `head` and `tail` do not require validation checks on their parameters, as they are used for testing purposes.
crates/augurs-prophet/src/lib.rs (1)
Learnt from: sd2k
PR: grafana/augurs#118
File: crates/augurs-prophet/src/lib.rs:14-14
Timestamp: 2024-10-08T16:07:31.990Z
Learning: The `optimizer` module in `crates/augurs-prophet/src/optimizer.rs` already has inline module-level documentation.
crates/augurs-prophet/src/testdata.rs (2)
Learnt from: sd2k
PR: grafana/augurs#118
File: crates/augurs-prophet/src/testdata.rs:39-44
Timestamp: 2024-10-08T16:09:10.904Z
Learning: In Rust test code with hardcoded data, such as in the `load_csv` function in `crates/augurs-prophet/src/testdata.rs`, it's acceptable to use `unwrap()` methods, and handling errors gracefully is less critical.
Learnt from: sd2k
PR: grafana/augurs#118
File: crates/augurs-prophet/src/testdata.rs:14-19
Timestamp: 2024-10-08T16:08:23.652Z
Learning: When using `NaiveDate::and_hms_opt(0, 0, 0)` with hardcoded values for hour, minute, and second, it's safe to use `unwrap()` because the method will always return `Some`.
🪛 LanguageTool
crates/augurs-prophet/README.md

[uncategorized] ~42-~42: Possible missing comma found.
Context: ...pass data directly to it from Rust. In theory this should work OK for any target whic...

(AI_HYDRA_LEO_MISSING_COMMA)

🔇 Additional comments (42)
crates/augurs-prophet/Cargo.toml (3)

1-10: LGTM: Package metadata is well-configured.

The package metadata is correctly set up, using workspace-level configurations for most fields. This approach ensures consistency across the workspace and simplifies maintenance.


18-21: Dev-dependencies are well-configured.

The dev-dependencies are appropriately set up using workspace-level version specifications, maintaining consistency with the main dependencies. The chosen libraries are suitable for testing purposes:

  1. augurs-testing: likely a custom testing framework for the augurs project
  2. chrono: useful for handling dates and times in tests
  3. pretty_assertions: improves readability of test failure messages

11-16: Dependencies look appropriate, but verify necessity.

The dependencies are well-configured using workspace-level version specifications, which is good for consistency. The chosen libraries seem appropriate for a statistical forecasting implementation.

However, it's always a good practice to verify that each dependency is necessary:

  1. itertools: for advanced iterator operations
  2. num-traits: for numeric type traits
  3. rand: for random number generation
  4. statrs: for statistical distributions and functions
  5. thiserror: for simplified error handling

Please confirm that all these dependencies are actively used in the implementation. You can use the following script to check for unused dependencies:

✅ Verification successful

All listed dependencies are actively used and appropriately configured.

The verification confirms that itertools, num-traits, rand, statrs, and thiserror are all utilized within the codebase, ensuring their necessity.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potentially unused dependencies

# Test: Search for usage of each dependency
echo "Checking for usage of dependencies:"
for dep in itertools num-traits rand statrs thiserror; do
    echo "Searching for $dep usage:"
    rg --type rust "\b$dep::" crates/augurs-prophet/src
    echo
done

Length of output: 2047

crates/augurs-seasons/Cargo.toml (2)

Line range hint 2-9: Approve: Good use of workspace declarations for package metadata and dependencies

The changes to use workspace declarations for package metadata (version, authors, documentation, repository, license, edition, keywords) and dependencies (num-traits) are beneficial for the project. This approach:

  1. Centralizes version control for dependencies and metadata across the workspace.
  2. Ensures consistency across different crates in the project.
  3. Simplifies maintenance by allowing updates to be made in one place.

These changes align well with Rust best practices for workspace management.

Also applies to: 17-17


17-17: Verify the impact of the num-traits dependency change

The num-traits dependency has been changed from a specific version ("0.2.18") to a workspace declaration (workspace = true). While this is generally a good practice for consistency, it's important to ensure that this change doesn't introduce any compatibility issues with other parts of the project that might depend on a specific version of num-traits.

To verify this change, please run the following script:

This script will help ensure that the change to use the workspace declaration for num-traits is consistent across the project and doesn't introduce any conflicts.

justfile (1)

21-22: Add a comment explaining the watch target and address documentation concerns.

Thank you for adding the watch target. To improve clarity and maintainability:

  1. Please add a comment above the watch target explaining its purpose and what the bacon command does.
  2. Ensure that the concerns raised in the previous review comment about documenting the bacon tool are addressed, either by updating the project's README or by adding setup instructions for the tool.

Example comment:

# Watch for changes and automatically rebuild (requires 'bacon' to be installed)
watch:
  bacon
crates/augurs-outlier/Cargo.toml (2)

22-22: Clarify the configuration for the rv dependency.

The rv dependency is configured with a specific version and has default-features = false:

rv = { version = "0.17.0", default-features = false }

Could you provide some context on why this dependency isn't using a workspace version and why the default features are disabled? This information would be helpful for maintaining consistency and understanding the project's requirements.

To check if this configuration is consistent across the workspace, we can run:

#!/bin/bash
# Description: Check for rv dependency configuration across the workspace

# Test: Search for rv dependency specifications
rg --type toml '^\s*rv\s*=' crates/*/Cargo.toml

This will help us understand if this configuration is unique to this crate or consistent across the workspace.


Line range hint 1-28: LGTM! Excellent use of workspace dependencies.

The changes to use workspace dependencies for most of the package's dependencies and metadata fields are a great improvement. This approach ensures consistency across the workspace and simplifies dependency management. It's particularly beneficial for larger projects with multiple crates.

To ensure all relevant dependencies are using workspace versions, let's run the following check:

This will help identify any remaining hardcoded versions that might benefit from being moved to workspace dependencies.

crates/augurs-ets/Cargo.toml (2)

Line range hint 1-45: LGTM: Well-structured Cargo.toml file with consistent use of workspace configurations

The overall structure of the Cargo.toml file is well-organized and follows Rust package management best practices. The consistent use of workspace-level configurations for most fields and dependencies is a good practice that can improve project-wide consistency and ease maintenance.


19-19: LGTM: Consistent update of rand dependency to use workspace configuration

The change to use rand.workspace = true aligns with the project's approach to centralized dependency management. This update ensures version consistency for the rand crate across the project and simplifies future updates.

crates/augurs-prophet/LICENSE-MIT (2)

Line range hint 2-20: LGTM: MIT License added

The addition of the MIT License file is appropriate and follows best practices for open-source projects. This license grants users the necessary permissions to use, modify, and distribute the software while providing protection for the authors.


1-1: Clarify the purpose of the first line

The first line ../../LICENSE-MIT seems to reference a license file two directories up from the current location. This is unusual in a license file and may cause confusion.

To verify the purpose of this line and ensure it's not causing any issues, please run the following script:

Based on the results, consider removing or commenting out this line if it's not serving a specific purpose.

✅ Verification successful

Verification Successful

The crates/augurs-prophet/LICENSE-MIT file does not reference ../../LICENSE-MIT, and no such file exists. The license file is correctly formatted without any confusing references.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if there's another LICENSE-MIT file two directories up and compare its content with this file.

# Test 1: Check if the referenced file exists
if [ -f ../../LICENSE-MIT ]; then
  echo "The file ../../LICENSE-MIT exists."
  
  # Test 2: Compare the content of both files
  if diff ../../LICENSE-MIT crates/augurs-prophet/LICENSE-MIT > /dev/null; then
    echo "The content of both files is identical."
  else
    echo "The content of the files differs."
    echo "Differences:"
    diff ../../LICENSE-MIT crates/augurs-prophet/LICENSE-MIT
  fi
else
  echo "The file ../../LICENSE-MIT does not exist."
fi

# Test 3: Check if this line is causing any issues in the license file
if grep -q "^../../LICENSE-MIT$" crates/augurs-prophet/LICENSE-MIT; then
  echo "The line '../../LICENSE-MIT' is present at the beginning of the file."
  echo "This might cause issues with license recognition tools."
else
  echo "The line '../../LICENSE-MIT' is not present at the beginning of the file."
fi

Length of output: 354

crates/augurs/Cargo.toml (1)

20-20: LGTM: New dependency correctly added

The new optional dependency augurs-prophet is correctly added and follows the project's conventions. It's marked as optional and uses the workspace version, which is consistent with other dependencies in this file.

Cargo.toml (1)

34-34: LGTM: Addition of augurs-prophet dependency

The addition of the augurs-prophet dependency is consistent with the PR's objective of implementing the Prophet forecasting model. The version (0.3.1) matches the workspace package version, and the path correctly points to a local crate.

crates/augurs-prophet/README.md (2)

1-8: LGTM: Clear introduction and goal statement

The introduction effectively communicates the purpose of the crate and its goal of being low-dependency. This aligns well with the PR objectives and provides valuable information for potential users.


63-67: LGTM: Appropriate credits and implementation notes

The Credits section properly acknowledges the original Prophet Python package while highlighting the Rust-specific adaptations. This transparency is valuable for users familiar with the Python version and sets appropriate expectations for the Rust implementation.

bacon.toml (4)

69-79: LGTM! Documentation job configurations are well-structured.

The doc and doc-open job configurations are appropriate and well-thought-out:

  • Generating documentation without dependencies is a good practice for focused project docs.
  • The doc-open job's "back" behavior on success prevents unnecessary browser openings, which is user-friendly.

1-113: Overall, the bacon.toml configuration is well-structured and comprehensive.

This configuration file provides a robust setup for the bacon tool, covering various aspects of the development workflow including checking, linting, testing, documentation, and running the application. The file is well-organized and includes helpful comments throughout.

Key strengths:

  1. Clear separation of different job types (check, clippy, test, doc, run).
  2. Use of appropriate tools like nextest for improved test performance.
  3. Thoughtful configuration of documentation jobs.
  4. Flexible setup for running the application and examples.
  5. Custom keybindings for quick access to common tasks.

The suggestions provided in the review comments are minor improvements that could further enhance the configuration's clarity and effectiveness. These include considering stdout handling, documenting lint customization processes, explaining package exclusions, and adding a keybinding for the test job.

Overall, this bacon.toml file serves as a solid foundation for managing the project's development workflow.


11-18: LGTM! Consider the implications of suppressing stdout.

The "check" and "check-all" job configurations are well-defined and use appropriate cargo commands.

While suppressing stdout can reduce noise, it might hide useful information. Consider if there are scenarios where stdout output would be beneficial. You can verify the impact with this script:
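A minimal sketch of such a check (commands and paths assumed):

#!/bin/bash
# Description: Run the same command bacon uses, separating stdout from stderr;
# anything captured in check-stdout.log is what need_stdout = false would hide.

cargo check --all-targets 1>check-stdout.log 2>check-stderr.log
wc -l check-stdout.log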


19-46: LGTM! Consider enabling stdout for Clippy and documenting lint customization process.

The Clippy job configurations are well-structured and provide options for running on default and all targets.

  1. Consider enabling stdout for Clippy jobs to capture all warnings and suggestions:
-need_stdout = false
+need_stdout = true
  2. The commented-out examples for disabling lints are helpful. Consider adding a brief comment on when and how to customize lints for your project, or link to a lint policy document if one exists.

To ensure Clippy isn't suppressing important information, run this script:
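A minimal sketch of such a script (flags assumed):

#!/bin/bash
# Description: Capture clippy's stdout separately from stderr to confirm
# whether any warnings or suggestions would be lost with need_stdout = false.

cargo clippy --all-targets 1>clippy-stdout.log 2>clippy-stderr.log
echo "stdout lines: $(wc -l < clippy-stdout.log)"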

README.md (3)

32-32: LGTM! New crate description added.

The addition of the augurs-prophet crate description is well-formatted and consistent with other entries in the table. The "alpha" status is appropriate for a new crate.


84-84: LGTM! Reference link for augurs-prophet added.

The new reference link for augurs-prophet is correctly formatted and consistent with other crate reference links in the document. It points to the appropriate crates.io page for the new crate.


88-89: LGTM! Reference links for just and bacon added.

The new reference links for just and bacon are correctly formatted and point to the appropriate documentation pages for each tool.

Note: As mentioned in a previous review comment, the bacon reference is currently unused in the document. Please refer to the earlier suggestion to update the mention of bacon in the "Developing" section to use this reference link.

crates/augurs-prophet/LICENSE-APACHE (1)

Line range hint 1-220: LGTM: Apache License 2.0 added to the project.

The inclusion of the Apache License 2.0 is appropriate for open-source projects. It clearly defines the terms and conditions for using, modifying, and distributing the software, which is crucial for legal compliance and user understanding.

crates/augurs-prophet/src/lib.rs (4)

1-7: Crate-level documentation and compiler warnings are well-configured

Including the README as crate-level documentation and setting up compiler warnings helps maintain code quality and provides users with essential information.


8-18: Module organization is clean and extensible

The modules are well-organized, and publicly exporting the optimizer module allows users to implement custom optimizers, enhancing the flexibility of the library.


20-21: Type alias TimestampSeconds is properly documented

Defining TimestampSeconds with clear documentation improves code readability and consistency when handling timestamps.


23-37: Re-exporting key types simplifies the library interface

Re-exporting essential types and structures at the root level ensures that users have convenient access without navigating through the module hierarchy.

crates/augurs-prophet/src/testdata.rs (2)

16-16: Confirm and_hms_opt(0, 0, 0).unwrap() is safe in this context

In the timestamp method, using unwrap() on and_hms_opt(0, 0, 0) assumes that the operation will always succeed. Given that the hours, minutes, and seconds are hardcoded to valid values, this should be safe. However, if there's any possibility of NaiveDate being invalid or manipulated, consider handling the None case to prevent potential panics.

You mentioned in previous learnings that it's safe to use unwrap() here due to hardcoded values. If you're confident this function will not receive invalid dates, no action is required.
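If you prefer to keep the panic but document the invariant, a minimal sketch (assuming a recent chrono with NaiveDateTime::and_utc):

// expect() records why the unwrap can never fire for hardcoded midnight.
let timestamp = date
    .and_hms_opt(0, 0, 0)
    .expect("hardcoded midnight is always a valid time")
    .and_utc()
    .timestamp();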


33-46: Optional: Handle potential errors when loading CSV data

The load_csv function uses multiple unwrap() calls that could cause panics if the file is missing or the data is malformed. Since this is test code with hardcoded data, and based on previous learnings, it's acceptable in this context. However, if you plan to use this function with external or dynamically provided data in the future, consider adding proper error handling.

crates/augurs-prophet/src/error.rs (3)

31-32: Fix the syntax in the error message for TooManyDataPoints.

The #[error] macro cannot evaluate expressions like i32::MAX inside format strings. This issue was identified in a previous review and still needs to be addressed.
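A possible fix, assuming the variant holds the offending count as a usize: thiserror accepts extra named format arguments, so the constant can be passed outside the braces.

use thiserror::Error;

#[derive(Debug, Error)]
pub enum Error {
    // i32::MAX is supplied as a named argument rather than evaluated inline.
    #[error("too many data points: {0} (at most {max} are supported)", max = i32::MAX)]
    TooManyDataPoints(usize),
}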


83-84: Limit the amount of data included in the UnableToInferFrequency error message.

Including the entire vector of timestamps {0:?} in the error message could lead to excessively large messages for large datasets. Consider summarizing the timestamps or displaying a range instead.
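One way to bound the message, sketched under the assumption that the variant holds a Vec<TimestampSeconds>; thiserror allows expressions over fields as trailing format arguments:

use thiserror::Error;

type TimestampSeconds = i64;

#[derive(Debug, Error)]
pub enum Error {
    // Format only the first few timestamps instead of the whole vector.
    #[error("unable to infer frequency; sample of timestamps tried: {:?}", .0.iter().take(5).collect::<Vec<_>>())]
    UnableToInferFrequency(Vec<TimestampSeconds>),
}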


6-85: Well-defined error variants and messages.

The Error enum is well-structured, and the error messages are clear and informative, providing helpful context for users of the Prophet forecasting algorithm.

crates/augurs-prophet/src/util.rs (1)

1-2: 🛠️ Refactor suggestion

Verify Necessity of FromPrimitive in Trait Bounds

It's worth verifying whether the FromPrimitive trait is strictly necessary in the trait bounds, as this may already be covered by the Float trait.

To check if FromPrimitive is required, you can search for usages of FromPrimitive methods:

If FromPrimitive is only used implicitly through the Float trait, you might be able to simplify the trait bounds by removing it.

✅ Verification successful

Confirmed Use of FromPrimitive in Trait Bounds

The FromPrimitive trait is explicitly used in the trait bounds within crates/augurs-prophet/src/util.rs, ensuring necessary functionality that is not covered by the Float trait alone.

  • File: crates/augurs-prophet/src/util.rs
    • Trait FloatIterExt and its implementation include FromPrimitive alongside Float.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for direct usages of `FromPrimitive` methods in the codebase.

rg --type rust 'FromPrimitive' crates/augurs-prophet/src/

Length of output: 356

crates/augurs-prophet/src/features.rs (2)

42-53: Consider adding an overload method to accept a single i32 for with_lower_window

A previous review comment suggested adding an overload method that accepts a single i32 value for with_lower_window. This allows users to set the same lower_window value for all dates without providing a Vec<i32> matching the length of ds. This comment is still valid and applicable.


61-72: Consider adding an overload method to accept a single i32 for with_upper_window

A previous review comment suggested adding an overload method that accepts a single i32 value for with_upper_window. This allows users to set the same upper_window value for all dates without providing a Vec<i32> matching the length of ds. This comment is still valid and applicable.
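A standalone sketch of the idea (Holiday's actual fields and signatures are assumptions here); the same shape would work for with_upper_window:

struct Holiday {
    ds: Vec<i64>,
    lower_window: Option<Vec<i32>>,
}

impl Holiday {
    // Existing-style setter: one window value per date in `ds`.
    fn with_lower_window(mut self, window: Vec<i32>) -> Self {
        assert_eq!(window.len(), self.ds.len(), "window must match ds length");
        self.lower_window = Some(window);
        self
    }

    // Suggested convenience overload: broadcast a single value to all dates.
    fn with_constant_lower_window(self, window: i32) -> Self {
        let n = self.ds.len();
        self.with_lower_window(vec![window; n])
    }
}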

crates/augurs-prophet/src/data.rs (1)

115-133: 🛠️ Refactor suggestion

Check for infinite values in regressor data.

Currently, the with_regressors methods check for NaN values but do not check for infinite values (f64::INFINITY or f64::NEG_INFINITY). To ensure data integrity, consider adding checks for infinite values as well.

Update the condition to check for both NaN and infinite values:

if reg.iter().any(|x| x.is_nan() || x.is_infinite()) {
    return Err(Error::InvalidValue {
        column: name.clone(),
    });
}

You'll also need to update the Error enum to reflect a more general invalid value:

pub enum Error {
    // existing variants...
-   NaNValue {
+   InvalidValue {
        column: String,
    },
    // ...
}

Also applies to: 281-299

crates/augurs-prophet/src/prophet/options.rs (1)

72-85: Good use of #[non_exhaustive] for future-proofing EstimationMode

Marking the EstimationMode enum as non_exhaustive anticipates future additions such as MCMC sampling. This is a proactive approach that enhances the extensibility of the code.
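As a small illustration of what the attribute buys (variant names assumed), downstream crates must include a wildcard arm when matching, so a future MCMC variant can be added without breaking callers:

#[non_exhaustive]
pub enum EstimationMode {
    Mle,
    Map,
}

// In a downstream crate, exhaustive matching is rejected; the wildcard arm
// is what absorbs a future `Mcmc` variant:
//
// match mode {
//     EstimationMode::Mle => fit_mle(),
//     EstimationMode::Map => fit_map(),
//     _ => unimplemented!(),
// }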

crates/augurs-prophet/src/prophet/predict.rs (1)

555-576: ⚠️ Potential issue

Potential Licensing Issue with Copied Code in percentile_of_sorted Function

At lines 555-556, the function percentile_of_sorted is copied from the Rust compiler's test suite:

// Taken from the Rust compiler's test suite:
// https://github.com/rust-lang/rust/blob/917b0b6c70f078cb08bbb0080c9379e4487353c3/library/test/src/stats.rs#L258-L280.

Copying code from external projects may introduce licensing issues, depending on the licenses of the Rust project and your project's license. The Rust compiler is dual-licensed under Apache 2.0 and MIT licenses.

Please ensure that including this code complies with your project's licensing requirements.

To verify the licensing terms of the Rust compiler's test suite, you can run the following script:

This script clones the Rust repository and displays the LICENSE-APACHE and LICENSE-MIT files. Review these licenses to ensure compliance. If needed, I can help rewrite the percentile_of_sorted function to avoid potential licensing issues.

✅ Verification successful

Licensing Compliance Verified

Including the percentile_of_sorted function from the Rust compiler's test suite complies with the project's licensing requirements under both the Apache License 2.0 and the MIT License.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check the licensing terms of the Rust compiler's test suite.

# Clone the Rust repository (depth limited to save time).
git clone https://github.com/rust-lang/rust.git --depth 1 rust_repo

# Display the LICENSE files.
echo "Apache License 2.0:"
cat rust_repo/LICENSE-APACHE

echo "MIT License:"
cat rust_repo/LICENSE-MIT

Length of output: 13960

crates/augurs-prophet/src/prophet/prep.rs (3)

248-265: Consider handling the case when history.y_scaled is empty

In the data preparation for modeling, history.y_scaled is used directly. If y_scaled is empty, which might occur in certain edge cases, this could lead to issues during model fitting.

Ensure that history.y_scaled is checked for non-emptiness before proceeding. If necessary, add error handling to manage cases with insufficient data.
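A minimal guard, assuming the existing Error::NotEnoughData variant is an acceptable fit:

if history.y_scaled.is_empty() {
    return Err(Error::NotEnoughData);
}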


338-343: Check for division by zero when calculating cs

In the loop where cs is calculated, there is potential for division by zero if scales.y_scale is zero. Although scales.y_scale is adjusted earlier, it's safer to ensure that division by zero cannot occur here.

Run the following script to confirm that scales.y_scale is never zero before this calculation:

✅ Verification successful

Division by zero for scales.y_scale is properly handled

The code includes a check that sets scales.y_scale to 1.0 if it is zero before performing the division, effectively preventing any division by zero errors.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that scales.y_scale is not zero when calculating cs.

# Test: Search for assignments to scales.y_scale and ensure it's not zero.
rg --type rust --pcre2 "scales\.y_scale\s*=\s*[^;]+" -A 3 crates/augurs-prophet/src/prophet/prep.rs

Length of output: 971


308-322: Ensure regressors and conditions are sorted correctly

While sorting the data, ds and y are sorted based on ds, but the sorting of regressors and seasonality conditions relies on the order of sort_indices. It's crucial to ensure that all related data structures are correctly aligned after sorting.

Run the following script to verify that all regressors and conditions are sorted consistently with ds:

✅ Verification successful

Regressors and conditions are sorted correctly

Verified that regressors and seasonality conditions are sorted consistently with ds.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all regressors and conditions are sorted according to ds.

# Test: Check for usage of sort_indices in sorting regressors and conditions.
rg --type rust "for .* in .*\.values_mut\(\)" -A 5 crates/augurs-prophet/src/prophet/prep.rs

Length of output: 427

Comment on lines +231 to +263
fn infer_freq(history_dates: &[TimestampSeconds]) -> Result<TimestampSeconds, Error> {
const INFER_N: usize = 5;
let get_tried = || {
history_dates
.iter()
.rev()
.take(INFER_N)
.copied()
.collect_vec()
};
// Calculate diffs between the last 5 dates in the history, and
// create a map from diffs to counts.
let diff_counts = history_dates
.iter()
.rev()
.take(INFER_N)
.tuple_windows()
.map(|(a, b)| a - b)
.counts();
// Find the max count, and return the corresponding diff, provided there
// is exactly one diff with that count.
let max = diff_counts
.values()
.copied()
.max()
.ok_or_else(|| Error::UnableToInferFrequency(get_tried()))?;
diff_counts
.into_iter()
.filter(|(_, v)| *v == max)
.map(|(k, _)| k)
.exactly_one()
.map_err(|_| Error::UnableToInferFrequency(get_tried()))
}

⚠️ Potential issue

Add input validation for sufficient data points in infer_freq

The infer_freq function may fail or return incorrect results when history_dates has fewer than two data points. Consider adding input validation to ensure that the dataset has enough points to infer the frequency reliably.

Apply this diff to add an explicit check:

 fn infer_freq(history_dates: &[TimestampSeconds]) -> Result<TimestampSeconds, Error> {
+    if history_dates.len() < 2 {
+        return Err(Error::NotEnoughData);
+    }
     const INFER_N: usize = 5;
     let get_tried = || {
         history_dates
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn infer_freq(history_dates: &[TimestampSeconds]) -> Result<TimestampSeconds, Error> {
    if history_dates.len() < 2 {
        return Err(Error::NotEnoughData);
    }
    const INFER_N: usize = 5;
    let get_tried = || {
        history_dates
            .iter()
            .rev()
            .take(INFER_N)
            .copied()
            .collect_vec()
    };
    // Calculate diffs between the last 5 dates in the history, and
    // create a map from diffs to counts.
    let diff_counts = history_dates
        .iter()
        .rev()
        .take(INFER_N)
        .tuple_windows()
        .map(|(a, b)| a - b)
        .counts();
    // Find the max count, and return the corresponding diff, provided there
    // is exactly one diff with that count.
    let max = diff_counts
        .values()
        .copied()
        .max()
        .ok_or_else(|| Error::UnableToInferFrequency(get_tried()))?;
    diff_counts
        .into_iter()
        .filter(|(_, v)| *v == max)
        .map(|(k, _)| k)
        .exactly_one()
        .map_err(|_| Error::UnableToInferFrequency(get_tried()))
}

Comment on lines +625 to +637
// prediction data.
if let Some(train_holidays) = &self.train_holiday_names {
all_holidays.retain(|name, _| train_holidays.contains(name));
let holidays_to_add = train_holidays
.iter()
.filter(|name| !all_holidays.contains_key(name.as_str()))
.collect_vec();
all_holidays.extend(
holidays_to_add
.into_iter()
.map(|name| (name.clone(), Holiday::new(vec![]))),
);
}

⚠️ Potential issue

Ensure consistent handling of holidays during prediction

In construct_holidays, when predicting, holidays not seen during training are removed, and holidays seen in training but not in prediction data are added with empty dates. This could lead to unexpected behavior if the holiday definitions differ between training and prediction.

Consider adding checks to ensure that holiday definitions are consistent between training and prediction, and provide warnings or errors if discrepancies are found.
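One possible shape for such a check (Error::UnknownHoliday is hypothetical, not an existing variant):

// Surface holidays present in the prediction data but never seen during
// training, instead of silently dropping them.
if let Some(train_holidays) = &self.train_holiday_names {
    for name in all_holidays.keys() {
        if !train_holidays.contains(name) {
            return Err(Error::UnknownHoliday(name.clone()));
        }
    }
}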

Comment on lines +168 to +172
new_features: impl Iterator<Item = Vec<f64>>,
) {
self.names.extend(names);
self.data.extend(new_features);
}

⚠️ Potential issue

Validate alignment of feature names and data

In the extend method of FeaturesFrame, it's important to ensure that the iterators names and new_features produce the same number of items to maintain alignment between feature names and their corresponding data.

Consider adding a check to verify that both iterators have the same length, or handle cases where they might differ.

fn extend(
    &mut self,
    names: impl Iterator<Item = FeatureName>,
    new_features: impl Iterator<Item = Vec<f64>>,
) {
    let names_vec: Vec<_> = names.collect();
    let features_vec: Vec<_> = new_features.collect();

    assert_eq!(names_vec.len(), features_vec.len(), "Mismatch in number of feature names and data columns");

    self.names.extend(names_vec);
    self.data.extend(features_vec);
}

Comment on lines +429 to +431
if scales.y_scale == 0.0 {
scales.y_scale = 1.0;
}

⚠️ Potential issue

Handle zero y_scale explicitly

In the initialize_scales method, when scales.y_scale is zero, it's being set to 1.0. This could mask underlying data issues and lead to incorrect scaling of y values. Instead, consider returning an error to handle this case explicitly.

Apply this diff to return an error when scales.y_scale is zero:

 if scales.y_scale == 0.0 {
-    scales.y_scale = 1.0;
+    return Err(Error::ZeroYScale);
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if scales.y_scale == 0.0 {
    return Err(Error::ZeroYScale);
}

Comment on lines +590 to +606
// Handle conditions by setting feature values to zero where the condition is false.
if let Some(condition_name) = &seasonality.condition_name {
let col = history
.seasonality_conditions
.get(condition_name)
.ok_or_else(|| Error::MissingSeasonalityCondition(condition_name.clone()))?;
let false_indices = col
.iter()
.enumerate()
.filter_map(|(i, b)| if *b { None } else { Some(i) })
.collect_vec();
for feature in new_features.iter_mut() {
for idx in false_indices.iter().copied() {
feature[idx] = 0.0;
}
}
}

🛠️ Refactor suggestion

Optimize condition handling in seasonality features

In make_seasonality_features, the index-based loops used to zero out feature values where a condition is false can be streamlined, especially for large datasets. Instead of collecting the false indices up front and then indexing into each feature, consider zipping each feature directly with the condition mask.

Apply this diff to refactor the condition handling:

 if let Some(condition_name) = &seasonality.condition_name {
     let col = history
         .seasonality_conditions
         .get(condition_name)
         .ok_or_else(|| Error::MissingSeasonalityCondition(condition_name.clone()))?;
-    let false_indices = col
-        .iter()
-        .enumerate()
-        .filter_map(|(i, b)| if *b { None } else { Some(i) })
-        .collect_vec();
-    for feature in new_features.iter_mut() {
-        for idx in false_indices.iter().copied() {
-            feature[idx] = 0.0;
-        }
-    }
+    for feature in new_features.iter_mut() {
+        for (value, &include) in feature.iter_mut().zip(col) {
+            if !include {
+                *value = 0.0;
+            }
+        }
+    }
 }

Committable suggestion was skipped due to low confidence.

sd2k and others added 4 commits October 10, 2024 12:11
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
This makes it easier to pass by reference, since otherwise we're
using Vec<Vec<T>> which derefs to &[Vec<T>] not &[&[T]], and they
have different representations so can't be transmuted.
To make it easier to cast slices of them.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

🧹 Outside diff range and nitpick comments (8)
crates/augurs-prophet/Cargo.toml (3)

11-18: LGTM: Dependencies are well-defined and use workspace versions.

The dependencies are correctly specified and use workspace-level versions, which is a good practice for maintaining consistency across the workspace. The optional bytemuck dependency with the "derive" feature is appropriately configured.

Consider adding brief comments explaining the purpose of each dependency, especially for less common ones like statrs or bytemuck. This can help other developers understand the role of each dependency in the project.


25-26: LGTM: Feature for optional dependency is correctly defined.

The "bytemuck" feature is properly defined, allowing users to opt-in to this functionality. This is consistent with the optional bytemuck dependency and follows Rust best practices for managing optional dependencies.

Consider adding a brief comment explaining the purpose of the "bytemuck" feature and when users might want to enable it. This can help users of your crate make informed decisions about which features to enable.


1-26: Overall, excellent Cargo.toml configuration for the augurs-prophet crate.

The Cargo.toml file is well-structured, follows Rust best practices, and makes good use of workspace-level definitions. The dependencies, dev-dependencies, and features are appropriately defined. The optional bytemuck dependency and corresponding feature demonstrate good project organization.

As the project grows, consider:

  1. Documenting the purpose of each dependency and feature in comments.
  2. Regularly reviewing and updating dependencies to ensure security and performance.
  3. If not already in place, consider setting up a workspace-wide version management strategy (e.g., using a tool like cargo-edit or cargo-update) to simplify dependency updates across all crates in the workspace.
crates/augurs-prophet/src/optimizer.rs (3)

1-23: Enhance module-level documentation for clarity and completeness

The current documentation provides a good overview, but consider the following improvements:

  1. Add a brief explanation of what the Prophet model is and its purpose.
  2. Clarify the current state of implementation (e.g., which features are currently available).
  3. Consider moving TODOs to a separate section or issue tracker for better visibility and tracking.

These changes will help users and contributors better understand the module's current state and future direction.


52-84: Adhere to Rust naming conventions for struct field names

The Data struct contains fields like T, S, and K that are capitalized. According to Rust naming conventions, field names should be in snake_case. Consider renaming these fields to improve code consistency and readability.

Example:

pub struct Data {
    pub num_time_periods: u32, // instead of T
    pub num_changepoints: u32, // instead of S
    pub num_regressors: u32,   // instead of K
    // ... other fields ...
}

Also, remove the #[allow(non_snake_case)] attribute once the field names are updated.


185-196: Enhance documentation for the Optimizer trait

While the Optimizer trait is well-defined, consider adding more detailed documentation to improve its usability:

  1. Provide a brief explanation of what the optimization process does in the context of the Prophet model.
  2. Document the expected behavior when optimization fails or encounters issues.
  3. Add examples of how to implement this trait for different optimization algorithms.

Example:

/// A type that can run maximum likelihood estimation optimization
/// for the Prophet model.
///
/// Implementations of this trait should provide methods to find
/// the optimal parameters for the Prophet model given initial
/// parameters and data.
///
/// # Examples
///
/// ```
/// // Add an example of how to implement and use this trait
/// ```
pub trait Optimizer: std::fmt::Debug {
    // ... existing method ...
}

This additional documentation will help users understand how to implement and use the Optimizer trait effectively.

crates/augurs-prophet/src/prophet.rs (1)

271-274: Consider creating an issue to track MCMC sampling implementation.

The TODO comment suggests implementing MCMC sampling for the EstimationMode::Mcmc case. This is an important enhancement that would expand the model's capabilities. Consider creating a GitHub issue to track this task and ensure it's not overlooked in future development.

Would you like me to create a GitHub issue for tracking the MCMC sampling implementation?

crates/augurs-prophet/src/prophet/prep.rs (1)

21-120: Consider adding more documentation to structs and enums

While the structs and enums are well-defined, adding more comprehensive documentation would improve the code's readability and maintainability. Consider adding doc comments (starting with ///) to explain the purpose and usage of each struct and enum, especially for complex types like ComponentColumns and FeatureName.
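For example, a doc comment along these lines (the wording is a guess at the actual semantics and should be adjusted to match):

/// Indices of the feature-matrix columns that belong to each model component.
///
/// Used after fitting to map estimated coefficients back to the seasonality,
/// holiday, and regressor features they correspond to.
pub struct ComponentColumns {
    // ...
}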

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 37bb836 and 6f2ca8a.

📒 Files selected for processing (6)
  • Cargo.toml (2 hunks)
  • crates/augurs-prophet/Cargo.toml (1 hunks)
  • crates/augurs-prophet/src/optimizer.rs (1 hunks)
  • crates/augurs-prophet/src/positive_float.rs (1 hunks)
  • crates/augurs-prophet/src/prophet.rs (1 hunks)
  • crates/augurs-prophet/src/prophet/prep.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/augurs-prophet/src/positive_float.rs
🧰 Additional context used
🔇 Additional comments (10)
crates/augurs-prophet/Cargo.toml (2)

1-10: LGTM: Package metadata is well-defined and follows best practices.

The package metadata is correctly set up, using workspace-level definitions for most fields. This approach ensures consistency across the workspace and simplifies maintenance.


20-23: LGTM: Dev-dependencies are appropriate and use workspace versions.

The dev-dependencies are well-chosen for a forecasting library, including testing utilities and a date/time library. Using workspace-level versions maintains consistency across the project.

Cargo.toml (2)

34-34: LGTM: Addition of augurs-prophet dependency

The addition of the augurs-prophet dependency is consistent with the PR objective of implementing the Prophet forecasting model. The version (0.3.1) matches the workspace version, and the path correctly points to a local crate.


56-56: Great addition: pretty_assertions for improved testing

The addition of pretty_assertions = "1.4.1" is a positive change. This library provides more readable output for assert comparisons in tests, which can significantly improve the debugging experience when tests fail.

This addition demonstrates a commitment to maintaining and improving the project's test suite, which is commendable.

crates/augurs-prophet/src/prophet.rs (4)

1-18: LGTM: Well-structured module and import declarations.

The module structure and imports are well-organized, providing a clear separation of concerns for the Prophet time series forecasting implementation.


20-68: Acknowledge the existing TODO for struct refactoring.

The TODO comment on lines 32-35 suggests moving several fields into a separate struct. This refactoring could indeed help minimize the number of optional fields and reduce the risk of invalid states, improving the overall structure and maintainability of the code.


290-813: LGTM: Comprehensive test coverage.

The test modules provide extensive coverage of various aspects of the Prophet implementation, including trend initialization, seasonality, holidays, and fitting. The use of mock optimizers for testing is a good practice, allowing for isolated testing of the Prophet implementation.


231-263: 🛠️ Refactor suggestion

Add input validation for sufficient data points in infer_freq.

The infer_freq function may fail or return incorrect results when history_dates has fewer than two data points. Consider adding input validation to ensure that the dataset has enough points to infer the frequency reliably.

Apply this diff to add an explicit check:

 fn infer_freq(history_dates: &[TimestampSeconds]) -> Result<TimestampSeconds, Error> {
+    if history_dates.len() < 2 {
+        return Err(Error::NotEnoughData);
+    }
     const INFER_N: usize = 5;
     let get_tried = || {
         history_dates

Likely invalid or redundant comment.

crates/augurs-prophet/src/prophet/prep.rs (2)

1-20: Imports and constants look good!

The imports are well-organized, and the constants for time periods are clearly defined with appropriate names.


1-1486: Overall assessment: Good implementation with room for improvement

The prep.rs file provides a comprehensive implementation of preprocessing logic for the Prophet forecasting model. The code is generally well-structured and follows Rust best practices. However, there are several areas where improvements can be made:

  1. Code organization: Some methods, particularly preprocess, setup_dataframe, and initialize_scales, could benefit from being broken down into smaller, more focused helper methods.

  2. Error handling: Consider enhancing error checking, especially for infinite and NaN values in input data.

  3. Documentation: Add more comprehensive documentation to structs, enums, and complex methods to improve code readability and maintainability.

  4. Test coverage: Enhance the test suite with more organized test modules, additional error cases, edge cases, and parameterized tests.

  5. Performance: Some calculations, like the standard deviation for regressors, could be optimized for better accuracy.

Implementing these suggestions would significantly improve the overall quality, maintainability, and robustness of the code.

Comment on lines +198 to +259
#[cfg(test)]
pub mod mock_optimizer {
use std::cell::RefCell;

use super::*;

#[derive(Debug, Clone)]
pub(crate) struct OptimizeCall {
pub init: InitialParams,
pub data: Data,
pub _opts: OptimizeOpts,
}

/// A mock optimizer that records the optimization call.
#[derive(Clone, Debug)]
pub(crate) struct MockOptimizer {
/// The optimization call.
///
/// This will be updated by the mock optimizer when
/// [`Optimizer::optimize`] is called.
// [`Optimizer::optimize`] takes self by reference,
// so we need to store the call in a RefCell.
pub call: RefCell<Option<OptimizeCall>>,
}

impl MockOptimizer {
/// Create a new mock optimizer.
pub(crate) fn new() -> Self {
Self {
call: RefCell::new(None),
}
}

/// Take the optimization call out of the mock.
pub(crate) fn take_call(&self) -> Option<OptimizeCall> {
self.call.borrow_mut().take()
}
}

impl Optimizer for MockOptimizer {
fn optimize(
&self,
init: &InitialParams,
data: &Data,
opts: &OptimizeOpts,
) -> Result<OptimizedParams, Error> {
*self.call.borrow_mut() = Some(OptimizeCall {
init: init.clone(),
data: data.clone(),
_opts: opts.clone(),
});
Ok(OptimizedParams {
k: init.k,
m: init.m,
sigma_obs: init.sigma_obs,
delta: init.delta.clone(),
beta: init.beta.clone(),
trend: Vec::new(),
})
}
}
}

🛠️ Refactor suggestion

Enhance MockOptimizer for more realistic testing scenarios

The current MockOptimizer implementation is a good start, but consider the following improvements to make it more versatile for testing:

  1. Add an option to simulate optimization failure by returning an Error in certain conditions.
  2. Implement a way to set predefined OptimizedParams to be returned, allowing for more controlled test scenarios.
  3. Consider adding a delay option to simulate long-running optimizations.

Example enhancement:

impl MockOptimizer {
    pub fn with_result(result: Result<OptimizedParams, Error>) -> Self {
        Self {
            call: RefCell::new(None),
            result: RefCell::new(result),
        }
    }
    
    // ... other methods ...
}

impl Optimizer for MockOptimizer {
    fn optimize(&self, /* ... */) -> Result<OptimizedParams, Error> {
        // ... existing code ...
        self.result.borrow().clone()
    }
}

These enhancements will allow for more comprehensive testing of code that uses the Optimizer trait.

Comment on lines +117 to +194
pub fn predict(&self, data: impl Into<Option<PredictionData>>) -> Result<Predictions, Error> {
let Self {
processed: Some(processed),
optimized: Some(params),
changepoints_t: Some(changepoints_t),
scales: Some(scales),
..
} = self
else {
return Err(Error::ModelNotFit);
};
let data = data.into();
let df = data
.map(|data| {
let training_data = TrainingData {
n: data.n,
ds: data.ds.clone(),
y: vec![],
cap: data.cap.clone(),
floor: data.floor.clone(),
seasonality_conditions: data.seasonality_conditions.clone(),
x: data.x.clone(),
};
self.setup_dataframe(training_data, Some(scales.clone()))
.map(|(df, _)| df)
})
.transpose()?
.unwrap_or_else(|| processed.history.clone());

let mut trend = self.predict_trend(
&df.t,
&df.cap,
&df.floor,
changepoints_t,
params,
scales.y_scale,
)?;
let features = self.make_all_features(&df)?;
let seasonal_components = self.predict_features(&features, params, scales.y_scale)?;

let yhat_point = izip!(
&trend.point,
&seasonal_components.additive.point,
&seasonal_components.multiplicative.point
)
.map(|(t, a, m)| t * (1.0 + m) + a)
.collect();
let mut yhat = FeaturePrediction {
point: yhat_point,
lower: None,
upper: None,
};

if self.opts.uncertainty_samples > 0 {
self.predict_uncertainty(
&df,
&features,
params,
changepoints_t,
&mut yhat,
&mut trend,
scales.y_scale,
)?;
}

Ok(Predictions {
ds: df.ds,
yhat,
trend,
cap: df.cap,
floor: scales.logistic_floor.then_some(df.floor),
additive: seasonal_components.additive,
multiplicative: seasonal_components.multiplicative,
holidays: seasonal_components.holidays,
seasonalities: seasonal_components.seasonalities,
regressors: seasonal_components.regressors,
})
}

🛠️ Refactor suggestion

Consider refactoring the predict method for improved maintainability and performance.

The predict method is well-structured but quite complex. Consider the following improvements:

  1. Error handling: Implement more specific error types for different failure scenarios, rather than using a generic ModelNotFit error.
  2. Performance: Evaluate if creating a new TrainingData struct for each prediction can be optimized or pre-computed.
  3. Readability: Break down the method into smaller, well-named helper functions to improve clarity and maintainability.

These changes could enhance the method's robustness, performance, and readability.

Would you like assistance in implementing these suggestions?

Comment on lines +207 to +229
pub fn make_future_dataframe(
&self,
horizon: NonZeroU32,
include_history: IncludeHistory,
) -> Result<PredictionData, Error> {
let Some(Preprocessed { history_dates, .. }) = &self.processed else {
return Err(Error::ModelNotFit);
};
let freq = Self::infer_freq(history_dates)?;
let last_date = *history_dates.last().ok_or(Error::NotEnoughData)?;
let n = (horizon.get() as u64 + 1) as TimestampSeconds;
let dates = (last_date..last_date + n * freq)
.step_by(freq as usize)
.filter(|ds| *ds > last_date)
.take(horizon.get() as usize);

let ds = if include_history == IncludeHistory::Yes {
history_dates.iter().copied().chain(dates).collect()
} else {
dates.collect()
};
Ok(PredictionData::new(ds))
}

🛠️ Refactor suggestion

Enhance input validation and error handling in make_future_dataframe.

The make_future_dataframe method is well-implemented, including the frequency inference. Consider the following improvements:

  1. Input validation: Add checks to ensure that horizon is reasonable (e.g., not too large) to prevent potential issues with very long forecasts.
  2. Error handling: The infer_freq method could potentially return an error. Consider how to handle this case explicitly.

These enhancements would improve the robustness of the method.
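For the first point, a minimal sketch (the cap and the reuse of Error::TooManyDataPoints are placeholders, not part of the existing API):

// Reject unreasonably long horizons before generating the date range.
const MAX_HORIZON: u32 = 100_000;
if horizon.get() > MAX_HORIZON {
    return Err(Error::TooManyDataPoints(horizon.get() as usize));
}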

Would you like assistance in implementing these improvements?

Comment on lines +200 to +283
pub(super) fn preprocess(&mut self, mut data: TrainingData) -> Result<Preprocessed, Error> {
let n = data.ds.len();
if n != data.y.len() {
return Err(Error::MismatchedLengths {
a: n,
a_name: "ds".to_string(),
b: data.y.len(),
b_name: "y".to_string(),
});
}
if n < 2 {
return Err(Error::NotEnoughData);
}
(data.ds, data.y) = data
.ds
.into_iter()
.zip(data.y)
.filter(|(_, y)| !y.is_nan())
.unzip();

let mut history_dates = data.ds.clone();
history_dates.sort_unstable();

let (history, scales) = self.setup_dataframe(data, None)?;
self.scales = Some(scales);
self.set_auto_seasonalities(&history)?;
let Features {
features,
prior_scales,
modes,
component_columns,
holiday_names,
} = self.make_all_features(&history)?;
self.component_modes = Some(modes);
self.train_holiday_names = Some(holiday_names);
self.train_component_columns = Some(component_columns.clone());

let (changepoints, changepoints_t) = self.get_changepoints(&history.ds)?;
self.changepoints = Some(changepoints);
self.changepoints_t = Some(changepoints_t.clone());

let cap = if self.opts.growth == GrowthType::Logistic {
history.cap_scaled.clone().ok_or(Error::MissingCap)?
} else {
vec![0.0; n]
};

// Transpose X; we store it column-major but Stan expects a contiguous
// array in row-major format.
#[allow(non_snake_case)]
let mut X = vec![0.0; features.data.len() * features.data[0].len()];
for (i, row) in features.data.iter().enumerate() {
for (j, val) in row.iter().enumerate() {
X[i + features.data.len() * j] = *val;
}
}

let data = Data {
T: history
.ds
.len()
.try_into()
.map_err(|_| Error::TooManyDataPoints(n))?,
S: changepoints_t.len() as i32,
K: features.names.len() as i32,
tau: self.opts.changepoint_prior_scale,
trend_indicator: self.opts.growth.into(),
y: history.y_scaled.clone(),
t: history.t.clone(),
t_change: changepoints_t,
X,
sigmas: prior_scales,
s_a: component_columns.additive,
s_m: component_columns.multiplicative,
cap,
};

Ok(Preprocessed {
data,
history,
history_dates,
})
}

🛠️ Refactor suggestion

Consider refactoring the preprocess method for improved readability

The preprocess method is comprehensive but quite long. To improve readability and maintainability, consider breaking it down into smaller, more focused helper methods.

Here's a suggested refactoring approach:

  1. Create a helper method for data validation.
  2. Create a helper method for setting up the dataframe and scales.
  3. Create a helper method for setting up seasonalities and features.
  4. Create a helper method for setting up changepoints.

This could look something like:

pub(super) fn preprocess(&mut self, mut data: TrainingData) -> Result<Preprocessed, Error> {
    self.validate_data(&data)?;
    let (history, scales) = self.setup_dataframe_and_scales(data)?;
    self.setup_seasonalities_and_features(&history)?;
    let (changepoints, changepoints_t) = self.setup_changepoints(&history)?;
    
    // ... rest of the method ...
}

fn validate_data(&self, data: &TrainingData) -> Result<(), Error> {
    // ... validation logic ...
}

fn setup_dataframe_and_scales(&mut self, data: TrainingData) -> Result<(ProcessedData, Scales), Error> {
    // ... setup logic ...
}

fn setup_seasonalities_and_features(&mut self, history: &ProcessedData) -> Result<Features, Error> {
    // ... seasonality and feature setup logic ...
}

fn setup_changepoints(&self, history: &ProcessedData) -> Result<(Vec<TimestampSeconds>, Vec<f64>), Error> {
    // ... changepoint setup logic ...
}

This refactoring would make the preprocess method more modular and easier to understand and maintain.

Comment on lines +285 to +391
mut seasonality_conditions,
mut x,
floor,
cap,
}: TrainingData,
scales: Option<Scales>,
) -> Result<(ProcessedData, Scales), Error> {
if y.iter().any(|y| y.is_infinite()) {
return Err(Error::InfiniteValue {
column: "y".to_string(),
});
}
for name in self.regressors.keys() {
if !x.contains_key(name) {
return Err(Error::MissingRegressor(name.clone()));
}
// No need to check lengths or inf, we do that in [`TrainingData::with_regressors`].
}
for Seasonality { condition_name, .. } in self.seasonalities.values() {
if let Some(condition_name) = condition_name {
if !seasonality_conditions.contains_key(condition_name) {
return Err(Error::MissingSeasonalityCondition(condition_name.clone()));
}
// No need to check lengths or inf, we do that in [`TrainingData::with_regressors`].
}
}

// Sort everything by date.
let mut sort_indices = (0..n).collect_vec();
sort_indices.sort_by_key(|i| ds[*i]);
ds.sort();
// y isn't provided for predictions.
if !y.is_empty() {
y = sort_indices.iter().map(|i| y[*i]).collect();
}
for condition in seasonality_conditions.values_mut() {
*condition = sort_indices.iter().map(|i| condition[*i]).collect();
}
for regressor in x.values_mut() {
*regressor = sort_indices.iter().map(|i| regressor[*i]).collect();
}

let scales = scales
.map(Ok)
.unwrap_or_else(|| self.initialize_scales(&ds, &y, &x, &floor, &cap))?;

let floor = if scales.logistic_floor {
floor.ok_or(Error::MissingFloor)?
} else if self.opts.scaling == Scaling::AbsMax {
vec![0.0; ds.len()]
} else {
vec![scales.y_min; ds.len()]
};
let cap_scaled = if self.opts.growth == GrowthType::Logistic {
let cap = cap.as_ref().ok_or(Error::MissingCap)?;
let mut cap_scaled = Vec::with_capacity(ds.len());
for (cap, floor) in cap.iter().zip(&floor) {
let cs = (cap - floor) / scales.y_scale;
if cs <= 0.0 {
return Err(Error::CapNotGreaterThanFloor);
}
cap_scaled.push(cs);
}
Some(cap_scaled)
} else {
None
};

let t = ds
.iter()
.map(|ds| (ds - scales.start) as f64 / scales.t_scale)
.collect();

let y_scaled = y
.iter()
.zip(&floor)
.map(|(y, floor)| (y - floor) / scales.y_scale)
.collect();

for (name, regressor) in &scales.regressors {
let col = x
.get_mut(name)
.ok_or_else(|| Error::MissingRegressor(name.clone()))?;
for x in col {
*x = (*x - regressor.mu) / regressor.std;
}
}

let data = ProcessedData {
ds,
t,
y_scaled,
cap,
cap_scaled,
floor,
regressors: x,
seasonality_conditions,
};
Ok((data, scales))
}

🛠️ Refactor suggestion

⚠️ Potential issue

Refactor setup_dataframe for improved readability and error handling

The setup_dataframe method is quite long and complex. Consider breaking it down into smaller, more focused helper methods to improve readability and maintainability.

Here's a suggested refactoring approach:

  1. Create a helper method for validating input data.
  2. Create a helper method for sorting data.
  3. Create a helper method for scaling data.
  4. Create a helper method for handling regressors.

This could look something like:

pub(super) fn setup_dataframe(&self, data: TrainingData, scales: Option<Scales>) -> Result<(ProcessedData, Scales), Error> {
    self.validate_input_data(&data)?;
    let sorted_data = self.sort_data(data);
    let scales = scales.unwrap_or_else(|| self.initialize_scales(&sorted_data)?);
    let scaled_data = self.scale_data(sorted_data, &scales)?;
    let processed_data = self.handle_regressors(scaled_data, &scales)?;
    Ok((processed_data, scales))
}

fn validate_input_data(&self, data: &TrainingData) -> Result<(), Error> {
    // ... validation logic ...
}

fn sort_data(&self, mut data: TrainingData) -> TrainingData {
    // ... sorting logic ...
}

fn scale_data(&self, data: TrainingData, scales: &Scales) -> Result<ProcessedData, Error> {
    // ... scaling logic ...
}

fn handle_regressors(&self, mut data: ProcessedData, scales: &Scales) -> Result<ProcessedData, Error> {
    // ... regressor handling logic ...
}

This refactoring would make the setup_dataframe method more modular and easier to understand and maintain.

Additionally, there's an issue with error handling for infinite values:

The current check for infinite values only considers the y values:

if y.iter().any(|y| y.is_infinite()) {
    return Err(Error::InfiniteValue {
        column: "y".to_string(),
    });
}

Consider extending this check to include NaN values and also check other relevant columns like regressors:

fn check_infinite_or_nan<T: Float>(values: &[T], column: &str) -> Result<(), Error> {
    if values.iter().any(|v| v.is_infinite() || v.is_nan()) {
        return Err(Error::InfiniteOrNaNValue {
            column: column.to_string(),
        });
    }
    Ok(())
}

// In setup_dataframe:
check_infinite_or_nan(&y, "y")?;
for (name, regressor) in &x {
    check_infinite_or_nan(regressor, &format!("regressor `{}`", name))?;
}

This will provide more comprehensive error checking and prevent potential issues with infinite or NaN values in the data.

Comment on lines +393 to +474
fn initialize_scales(
&self,
ds: &[TimestampSeconds],
y: &[f64],
regressors: &HashMap<String, Vec<f64>>,
floor: &Option<Vec<f64>>,
cap: &Option<Vec<f64>>,
) -> Result<Scales, Error> {
let mut scales = Scales::default();
match floor.as_ref() {
Some(floor) if self.opts.growth == GrowthType::Logistic => {
scales.logistic_floor = true;
match (self.opts.scaling, cap) {
(Scaling::AbsMax, _) => {
let MinMaxResult::MinMax(y_min, y_scale) = y
.iter()
.zip(floor)
.map(|(y, floor)| (y - floor).abs())
.minmax_by(|a, b| {
a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal)
})
else {
return Err(Error::AbsMaxScalingFailed);
};
scales.y_min = y_min;
scales.y_scale = y_scale;
}
(Scaling::MinMax, Some(cap)) => {
scales.y_min = floor.iter().copied().nanmin(true);
scales.y_scale = cap.iter().copied().nanmax(true) - scales.y_min;
}
_ => {
return Err(Error::MissingCap);
}
}
}
Some(_) | None => match self.opts.scaling {
Scaling::AbsMax => {
scales.y_min = 0.0;
scales.y_scale = y.iter().map(|y| y.abs()).nanmax(true);
}
Scaling::MinMax => {
scales.y_min = y.iter().copied().nanmin(true);
scales.y_scale = y.iter().copied().nanmax(true) - scales.y_min;
}
},
};
if scales.y_scale == 0.0 {
scales.y_scale = 1.0;
}

scales.start = *ds.first().ok_or(Error::NotEnoughData)?;
scales.t_scale = (*ds.last().ok_or(Error::NotEnoughData)? - scales.start) as f64;

for (name, regressor) in self.regressors.iter() {
// Standardize if requested.
let col = regressors
.get(name)
.ok_or(Error::MissingRegressor(name.clone()))?;
// If there are 2 or fewer unique values, don't standardize.
let mut vals = col.to_vec();
vals.sort_unstable_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
vals.dedup();
let mut regressor_scale = RegressorScale::default();
if vals.len() >= 2 {
if regressor.standardize == Standardize::Auto {
regressor_scale.standardize =
!(vals.len() == 2 && vals[0] == 0.0 && vals[1] == 1.0);
}
if regressor_scale.standardize {
let mean = vals.iter().sum::<f64>() / vals.len() as f64;
let variance =
vals.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / vals.len() as f64;
let std = variance.sqrt();
regressor_scale.mu = mean;
regressor_scale.std = std;
}
}
scales.regressors.insert(name.clone(), regressor_scale);
}
Ok(scales)
}

🛠️ Refactor suggestion

⚠️ Potential issue

Refactor initialize_scales for improved readability and correct standard deviation calculation

The initialize_scales method is complex and could benefit from refactoring to improve readability. Additionally, there's an issue with the standard deviation calculation for regressors.

Consider breaking down the method into smaller, more focused helper methods. For example:

  1. Create a helper method for initializing y-scale parameters.
  2. Create a helper method for initializing t-scale parameters.
  3. Create a helper method for handling regressor scaling.

This could look like:

fn initialize_scales(&self, ds: &[TimestampSeconds], y: &[f64], regressors: &HashMap<String, Vec<f64>>, floor: &Option<Vec<f64>>, cap: &Option<Vec<f64>>) -> Result<Scales, Error> {
    let mut scales = Scales::default();
    self.initialize_y_scale(&mut scales, y, floor, cap)?;
    self.initialize_t_scale(&mut scales, ds)?;
    self.initialize_regressor_scales(&mut scales, regressors)?;
    Ok(scales)
}

fn initialize_y_scale(&self, scales: &mut Scales, y: &[f64], floor: &Option<Vec<f64>>, cap: &Option<Vec<f64>>) -> Result<(), Error> {
    // ... y-scale initialization logic ...
}

fn initialize_t_scale(&self, scales: &mut Scales, ds: &[TimestampSeconds]) -> Result<(), Error> {
    // ... t-scale initialization logic ...
}

fn initialize_regressor_scales(&self, scales: &mut Scales, regressors: &HashMap<String, Vec<f64>>) -> Result<(), Error> {
    // ... regressor scaling logic ...
}

There's an issue with the standard deviation calculation for regressors:

let mean = vals.iter().sum::<f64>() / vals.len() as f64;
let variance = vals.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / vals.len() as f64;
let std = variance.sqrt();

This calculation uses the population standard deviation formula (dividing by n) instead of the sample standard deviation formula (dividing by n-1). For most statistical applications, the sample standard deviation is more appropriate. Consider updating the calculation to:

let n = vals.len();
let mean = vals.iter().sum::<f64>() / n as f64;
let variance = vals.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1) as f64;
let std = variance.sqrt();

This change will provide a more accurate estimate of the standard deviation, especially for smaller sample sizes.

Comment on lines +887 to +941
/// Get the changepoints for the model.
///
/// Returns a tuple of (changepoints, changepoints_t) where
/// `changepoints` is a vector of dates and `changepoints_t` is a
/// vector of times.
fn get_changepoints(
    &self,
    ds: &[TimestampSeconds],
) -> Result<(Vec<TimestampSeconds>, Vec<f64>), Error> {
    let first_date = ds.first().ok_or(Error::NotEnoughData)?;
    let last_date = ds.last().ok_or(Error::NotEnoughData)?;
    let changepoints = if let Some(changepoints) = self.opts.changepoints.as_ref() {
        if !changepoints.is_empty() {
            let too_low = changepoints.iter().any(|x| x < first_date);
            let too_high = changepoints.iter().any(|x| x > last_date);
            if too_low || too_high {
                return Err(Error::ChangepointsOutOfRange);
            }
        }
        changepoints.clone()
    } else {
        let hist_size = (ds.len() as f64 * *self.opts.changepoint_range).floor() as usize;
        let mut n_changepoints = self.opts.n_changepoints as usize;
        if n_changepoints + 1 > hist_size {
            n_changepoints = hist_size - 1;
        }
        if n_changepoints > 0 {
            // Place changepoints evenly through the first `changepoint_range`
            // percent of the history.
            // let step = ((hist_size - 1) as f64 / (n_changepoints + 1) as f64).round();
            // let cp_indices = (0..(hist_size - 1)).step_by(step as usize);
            let cp_indices = (0..(n_changepoints + 1)).map(|i| {
                // Note: don't add one, since we don't include the last point.
                let num_steps = n_changepoints as f64;
                (i as f64 / num_steps * (hist_size - 1) as f64).round() as usize
            });
            cp_indices.map(|i| ds[i]).skip(1).collect()
        } else {
            vec![]
        }
    };

    let scales = self
        .scales
        .as_ref()
        .expect("Scales not initialized when setting changepoints; this is a bug");
    let changepoints_t = if changepoints.is_empty() {
        vec![0.0] // Dummy changepoint.
    } else {
        changepoints
            .iter()
            .map(|ds| (ds - scales.start) as f64 / scales.t_scale)
            .collect()
    };
    Ok((changepoints, changepoints_t))
}

🛠️ Refactor suggestion

Consider improving changepoint distribution logic

The get_changepoints method generally seems well-implemented, but there's a potential issue in the logic for automatically generating changepoints:

let cp_indices = (0..(n_changepoints + 1)).map(|i| {
    let num_steps = n_changepoints as f64;
    (i as f64 / num_steps * (hist_size - 1) as f64).round() as usize
});
cp_indices.map(|i| ds[i]).skip(1).collect()

This calculation might not distribute the changepoints evenly through the specified range of the history. The current implementation:

  1. Includes the start of the range (index 0) in the calculation.
  2. Skips the first calculated index.
  3. Doesn't guarantee that the last changepoint is at the end of the specified range.

Consider updating the logic to ensure even distribution:

let cp_indices = (1..=n_changepoints).map(|i| {
    let fraction = i as f64 / (n_changepoints + 1) as f64;
    (fraction * hist_size as f64).round() as usize - 1
});
cp_indices.map(|i| ds[i]).collect()

This change ensures that:

  1. Changepoints are evenly distributed between the start and end of the specified range.
  2. The first changepoint is not at the very beginning of the range.
  3. The last changepoint is close to, but not exactly at, the end of the range.

This adjustment aligns better with the intention of placing changepoints evenly through the first changepoint_range percent of the history.
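
To double-check the arithmetic, the proposed indexing can be exercised in isolation; this standalone sketch uses arbitrary values for hist_size and n_changepoints:

fn main() {
    let hist_size: usize = 10;
    let n_changepoints: usize = 3;
    let cp_indices: Vec<usize> = (1..=n_changepoints)
        .map(|i| {
            let fraction = i as f64 / (n_changepoints + 1) as f64;
            // Because the surrounding code clamps n_changepoints to
            // hist_size - 1, fraction * hist_size rounds to at least 1,
            // so the subtraction cannot underflow.
            (fraction * hist_size as f64).round() as usize - 1
        })
        .collect();
    // Fractions 0.25, 0.5, 0.75 of 10 round to 3, 5, 8, giving indices 2, 4, 7.
    assert_eq!(cp_indices, vec![2, 4, 7]);
    println!("{:?}", cp_indices);
}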

Comment on lines 1070 to 1486
);
}

#[test]
fn test_component_columns() {
    let components = [
        (0, ComponentName::Seasonality("weekly".to_string())),
        (1, ComponentName::Seasonality("weekly".to_string())),
        (2, ComponentName::Seasonality("weekly".to_string())),
        (3, ComponentName::Seasonality("weekly".to_string())),
        (4, ComponentName::Seasonality("weekly".to_string())),
        (5, ComponentName::Seasonality("weekly".to_string())),
        (6, ComponentName::Holiday("birthday".to_string())),
        (7, ComponentName::Holiday("birthday".to_string())),
        (8, ComponentName::Regressor("binary_feature".to_string())),
        (9, ComponentName::Regressor("numeric_feature".to_string())),
        (10, ComponentName::Regressor("numeric_feature2".to_string())),
        (11, ComponentName::Regressor("binary_feature2".to_string())),
        (6, ComponentName::Holidays),
        (7, ComponentName::Holidays),
        (0, ComponentName::AdditiveTerms),
        (1, ComponentName::AdditiveTerms),
        (2, ComponentName::AdditiveTerms),
        (3, ComponentName::AdditiveTerms),
        (4, ComponentName::AdditiveTerms),
        (5, ComponentName::AdditiveTerms),
        (8, ComponentName::AdditiveTerms),
        (9, ComponentName::AdditiveTerms),
        (11, ComponentName::AdditiveTerms),
        (8, ComponentName::RegressorsAdditive),
        (9, ComponentName::RegressorsAdditive),
        (11, ComponentName::RegressorsAdditive),
        (6, ComponentName::MultiplicativeTerms),
        (7, ComponentName::MultiplicativeTerms),
        (10, ComponentName::MultiplicativeTerms),
        (10, ComponentName::RegressorsMultiplicative),
    ];
    let cols = ComponentColumns::new(&components);
    assert_eq!(cols.additive, vec![1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1]);
    assert_eq!(
        cols.multiplicative,
        vec![0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0]
    );
    assert_eq!(cols.all_holidays, vec![0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]);
    assert_eq!(
        cols.regressors_additive,
        vec![0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
    );
    assert_eq!(
        cols.regressors_multiplicative,
        vec![0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
    );
    assert_eq!(cols.seasonalities.len(), 1);
    assert_eq!(cols.holidays.len(), 1);
    assert_eq!(cols.regressors.len(), 4);
    assert_eq!(
        cols.seasonalities["weekly"],
        &[1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    );
    assert_eq!(
        cols.holidays["birthday"],
        &[0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
    );
    assert_eq!(
        cols.regressors["binary_feature"],
        &[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
    );
    assert_eq!(
        cols.regressors["numeric_feature"],
        &[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
    );
    assert_eq!(
        cols.regressors["numeric_feature2"],
        &[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
    );
    assert_eq!(
        cols.regressors["binary_feature2"],
        &[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    );
}

#[test]
fn constant_regressor() {
    let data = daily_univariate_ts();
    let n = data.len();
    let data = data
        .with_regressors(
            HashMap::from([("constant_feature".to_string(), vec![0.0; n])])
                .into_iter()
                .collect(),
        )
        .unwrap();
    let mut prophet = Prophet::new(ProphetOptions::default(), MockOptimizer::new());
    prophet.add_regressor("constant_feature".to_string(), Regressor::additive());
    prophet.fit(data, Default::default()).unwrap();
    let reg_scales = &prophet.scales.unwrap().regressors["constant_feature"];
    assert_approx_eq!(reg_scales.mu, 0.0);
    assert_approx_eq!(reg_scales.std, 1.0);
}
}

🛠️ Refactor suggestion

Enhance test coverage and organization

The test module provides good coverage for various functionalities of the Prophet implementation. However, there are some areas where the tests could be improved:

  1. Test organization: Consider grouping related tests into nested modules for better organization. For example:

    mod tests {
        mod dataframe_setup {
            // Tests related to dataframe setup
        }
    
        mod growth_models {
            // Tests for linear, logistic, and flat growth
        }
    
        mod seasonality {
            // Tests for seasonality-related functions
        }
    
        // ... other groupings ...
    }
  2. Error cases: Add more tests for error cases to ensure robust error handling. For example, test the behavior when invalid data is provided (see the sketch at the end of this comment).

  3. Edge cases: Include tests for edge cases, such as empty input data, extreme values, or boundary conditions.

  4. Parameterized tests: Use parameterized tests for functions that should behave similarly with different inputs. This can reduce code duplication in tests.

  5. Mocking: Consider using mocks for external dependencies to isolate the units under test.

  6. Benchmarks: Add benchmarks for performance-critical functions to track performance over time.

  7. Documentation tests: Add documentation tests for public APIs to ensure examples in the documentation remain valid.

Example of a parameterized test, using the rstest crate (some_growth_init_function is a placeholder for whichever initialization method is under test):

use rstest::rstest;

#[rstest]
#[case(GrowthType::Linear, 1.0, 0.0)]
#[case(GrowthType::Flat, 0.0, 0.5)]
#[case(GrowthType::Logistic, 0.5, 0.5)]
fn test_growth_init(
    #[case] growth_type: GrowthType,
    #[case] expected_k: f64,
    #[case] expected_m: f64,
) {
    let prophet = Prophet::new(
        ProphetOptions { growth: growth_type, ..Default::default() },
        MockOptimizer::new(),
    );
    // `some_growth_init_function` stands in for the actual
    // initialization method being tested.
    let (k, m) = prophet.some_growth_init_function();
    assert_approx_eq!(k, expected_k);
    assert_approx_eq!(m, expected_m);
}
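
And as a starting point for item 2, an error-case test could be as small as the sketch below; TrainingData::new is a stand-in for however training data is actually constructed here, and the exact error variant should be confirmed against the crate's Error enum:

#[test]
fn fit_with_empty_data_returns_error() {
    // Hypothetical constructor: substitute the crate's real way of
    // building training data from timestamps and values.
    let data = TrainingData::new(Vec::new(), Vec::new()).unwrap();
    let mut prophet = Prophet::new(ProphetOptions::default(), MockOptimizer::new());
    // Fitting with no observations should surface a clear error rather
    // than panicking; NotEnoughData is the variant used elsewhere in
    // this PR for empty inputs.
    let result = prophet.fit(data, Default::default());
    assert!(matches!(result, Err(Error::NotEnoughData)));
}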

Implementing these suggestions would improve the overall quality and maintainability of the test suite.

@sd2k sd2k enabled auto-merge (squash) October 10, 2024 14:26
@sd2k sd2k merged commit d9f79a7 into main Oct 10, 2024
19 checks passed
@sd2k sd2k deleted the prophet branch October 10, 2024 14:30
@sd2k sd2k restored the prophet branch October 10, 2024 14:31