Add Prophet algorithm in `augurs-prophet` crate #118

Conversation
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (3)
crates/augurs-prophet/Cargo.toml (1)

`11-21`: LGTM: Dependencies are well-chosen and configured.

The dependencies are appropriate for a forecasting library and are correctly configured:

- Regular dependencies (itertools, num-traits, rand, statrs, thiserror) are suitable for the project's needs.
- Dev-dependencies (augurs-testing, chrono, pretty_assertions) are well-chosen for testing purposes.
- Using workspace-level version specifications for all dependencies ensures consistency across the project.

Consider adding a comment explaining the purpose of each dependency, especially for less common ones like `statrs` or `thiserror`. This can help other developers understand why each dependency is necessary.

Cargo.toml (2)
`38-45`: LGTM: Addition of new dependencies for Prophet implementation

The new dependencies (chrono, num-traits, rand, statrs) and the update to itertools are appropriate for implementing the Prophet algorithm and performing time series analysis. The specified versions are recent, which is good for security and feature availability.

Consider using version ranges (e.g., `^0.4.38` instead of `0.4.38`) for non-workspace dependencies to allow for compatible updates without changing the Cargo.toml file. This can help keep dependencies up to date with bug fixes and minor improvements.
`54-54`: LGTM: Addition of pretty_assertions for improved test output

The addition of the `pretty_assertions` crate is a good choice for enhancing the readability of test failure messages. This will improve the developer experience when working with tests.

Consider moving `pretty_assertions` to a `[dev-dependencies]` section in the Cargo.toml file. This would more clearly separate it from runtime dependencies and potentially improve build times for release builds.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
- Cargo.toml (2 hunks)
- crates/augurs-prophet/Cargo.toml (1 hunks)
- crates/augurs-prophet/src/lib.rs (1 hunks)
- crates/augurs-prophet/src/prophet/predict.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/augurs-prophet/src/lib.rs
🧰 Additional context used
🔇 Additional comments (12)
crates/augurs-prophet/Cargo.toml (1)

`1-10`: LGTM: Package metadata is well-configured.

The package metadata is correctly set up:

- The package name "augurs-prophet" is appropriate and follows naming conventions.
- Using workspace-level configurations for common fields (license, authors, documentation, repository, version, edition, and keywords) is a good practice for maintaining consistency across multiple crates in a workspace.
Cargo.toml (1)
`34-34`: LGTM: Addition of augurs-prophet dependency

The addition of the `augurs-prophet` dependency is consistent with the PR objective of implementing the Prophet algorithm. The version (0.3.1) matches the workspace package version, and the local path is correctly specified.

crates/augurs-prophet/src/prophet/predict.rs (10)
`1-103`: Well-structured and documented code

The initial segment of the file, including imports and struct definitions, is well-organized and properly documented. The use of `pub(super)` for internal structs shows good encapsulation practices.
`122-155`: LGTM: Robust trend prediction implementation

The `predict_trend` method effectively handles different growth types (Linear, Logistic, and Flat) with proper error handling. The implementation is well-structured and easy to understand.
`157-198`: Efficient piecewise linear implementation

The `piecewise_linear` method is well-implemented using iterators and vector operations, which is efficient and idiomatic Rust. The code is also well-commented, explaining each step of the algorithm.
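For readers unfamiliar with Prophet's trend model, the following is a minimal illustrative sketch of a piecewise linear trend, not the crate's actual code; the names `k`, `m`, `deltas`, and `changepoints_t` follow the Prophet paper's notation and are assumptions here. The slope is adjusted by the cumulative changepoint deltas, while the intercept offset keeps the trend continuous at each changepoint.

```rust
/// Illustrative piecewise linear trend following the Prophet paper:
/// g(t) = (k + a(t)·δ)·t + (m + a(t)·γ), where γ_j = -s_j·δ_j keeps
/// the trend continuous at each changepoint s_j.
fn piecewise_linear(t: &[f64], deltas: &[f64], k: f64, m: f64, changepoints_t: &[f64]) -> Vec<f64> {
    t.iter()
        .map(|&ti| {
            // Accumulate slope and offset adjustments from all
            // changepoints at or before time ti.
            let (slope_adj, offset_adj) = changepoints_t
                .iter()
                .copied()
                .zip(deltas.iter().copied())
                .filter(|&(cp, _)| ti >= cp)
                .fold((0.0, 0.0), |(s, o), (cp, d)| (s + d, o - cp * d));
            (k + slope_adj) * ti + (m + offset_adj)
        })
        .collect()
}
```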
`200-235`: Well-implemented piecewise logistic function

The `piecewise_logistic` method correctly implements the piecewise logistic function for trend prediction. The code is well-structured, efficient, and follows the expected algorithm for logistic growth.
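For intuition, the logistic variant bounds the trend by a (possibly time-varying) capacity. Below is a simplified sketch with the changepoint adjustments to `k` and `m` omitted for brevity, so it is not the full piecewise computation the crate performs:

```rust
/// Simplified logistic growth: g(t) = cap(t) / (1 + exp(-k·(t - m))).
/// Prophet additionally adjusts `k` and `m` at each changepoint.
fn logistic_trend(t: &[f64], cap: &[f64], k: f64, m: f64) -> Vec<f64> {
    t.iter()
        .zip(cap)
        .map(|(&ti, &c)| c / (1.0 + (-k * (ti - m)).exp()))
        .collect()
}
```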
`238-240`: Concise and correct flat trend implementation

The `flat_trend` method efficiently implements a flat trend using `std::iter::repeat`. The implementation is straightforward and correct.
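The pattern praised here is simply repetition of a constant value; as a rough sketch (not the crate's exact signature):

```rust
/// A flat trend is the same value repeated for every time point.
fn flat_trend(value: f64, n: usize) -> Vec<f64> {
    std::iter::repeat(value).take(n).collect()
}
```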
`333-361`: Efficient feature prediction implementation

The `predict_feature` method efficiently predicts a single feature using matrix multiplication. The use of `izip!` for simultaneous iteration over multiple vectors is a good practice. The method correctly handles both additive and multiplicative features.
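For context, `itertools::izip!` zips several iterators in lockstep, which suits the row-wise weighted sums involved in this kind of feature prediction. A minimal sketch with illustrative names (`col_a`, `col_b`, and `weighted_sum` are not the crate's API):

```rust
use itertools::izip;

/// One row of a small matrix-vector product: a weighted sum of two
/// feature columns, evaluated for each time point.
fn weighted_sum(col_a: &[f64], col_b: &[f64], w_a: f64, w_b: f64) -> Vec<f64> {
    izip!(col_a, col_b).map(|(a, b)| a * w_a + b * w_b).collect()
}
```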
`364-402`: Robust uncertainty prediction implementation

The `predict_uncertainty` method correctly calculates uncertainty intervals for the forecast. It uses appropriate methods for sampling and percentile calculation. The handling of both yhat and trend uncertainty is well-implemented.
`466-504`: Well-implemented model simulation

The `sample_model` method correctly simulates observations from the extrapolated model. It appropriately handles additive and multiplicative components separately and adds noise using a Normal distribution, which is standard practice in time series forecasting.
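Since `statrs` and `rand` are among the crate's dependencies, this kind of observation noise can be sketched as follows. This is an illustrative example rather than the crate's actual sampling code; `sigma_obs` is an assumed parameter name:

```rust
use rand::distributions::Distribution;
use statrs::distribution::Normal;

/// Add iid Gaussian noise with standard deviation `sigma_obs` to a
/// deterministic forecast, simulating one draw of the observations.
fn add_observation_noise(yhat: &[f64], sigma_obs: f64) -> Vec<f64> {
    let mut rng = rand::thread_rng();
    let noise = Normal::new(0.0, sigma_obs).expect("sigma_obs must be positive and finite");
    yhat.iter().map(|&y| y + noise.sample(&mut rng)).collect()
}
```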
`506-577`: Robust predictive trend sampling

The `sample_predictive_trend` method effectively samples new changepoints and deltas to predict the trend. It uses appropriate probability distributions (Poisson and Laplace) and handles edge cases correctly. The implementation is robust and follows best practices for predictive trend sampling in time series forecasting.
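For intuition: Prophet draws the number of future changepoints from a Poisson distribution (rate matching the historical changepoint frequency) and each new delta from a Laplace distribution fitted to the historical deltas. A hedged sketch using `statrs`, with illustrative parameter names rather than the crate's actual signatures:

```rust
use rand::distributions::Distribution;
use statrs::distribution::{Laplace, Poisson};

/// Sample deltas for future changepoints: the count is Poisson
/// distributed, and each delta is Laplace distributed around zero,
/// with scale typically set to the mean absolute historical delta.
fn sample_future_deltas(rate: f64, laplace_scale: f64) -> Vec<f64> {
    let mut rng = rand::thread_rng();
    let n = Poisson::new(rate).expect("rate must be positive").sample(&mut rng) as usize;
    let laplace = Laplace::new(0.0, laplace_scale).expect("scale must be positive");
    (0..n).map(|_| laplace.sample(&mut rng)).collect()
}
```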
`582-602`: Reliable percentile calculation

The `percentile_of_sorted` function, adapted from the Rust compiler's test suite, correctly calculates percentiles of sorted samples. It handles all edge cases properly and is a reliable implementation for this statistical operation.
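The technique is linear interpolation between the two samples at the closest ranks. A self-contained sketch of this standard approach (the referenced test-suite implementation follows the same idea):

```rust
/// Percentile of an already-sorted, non-empty slice using linear
/// interpolation between closest ranks. `pct` must be in [0, 100].
fn percentile_of_sorted(sorted: &[f64], pct: f64) -> f64 {
    assert!(!sorted.is_empty() && (0.0..=100.0).contains(&pct));
    if sorted.len() == 1 {
        return sorted[0];
    }
    if pct == 100.0 {
        return sorted[sorted.len() - 1];
    }
    // Fractional rank of the requested percentile.
    let rank = (pct / 100.0) * (sorted.len() - 1) as f64;
    let lower = rank.floor();
    let idx = lower as usize;
    // Interpolate between the two surrounding samples.
    sorted[idx] + (sorted[idx + 1] - sorted[idx]) * (rank - lower)
}
```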
@coderabbitai resolve
Actionable comments posted: 8
🧹 Outside diff range and nitpick comments (2)
crates/augurs-prophet/src/error.rs (1)

`58-60`: Consider adding more context to the `Scaling` variant.

The `Scaling` variant's error message is quite brief. Consider adding more context or the ability to include a reason for the scaling failure.

You could modify the variant to include a reason:

```rust
#[error("Scaling failed: {0}")]
Scaling(String),
```

crates/augurs-prophet/src/prophet/options.rs (1)
`32-82`: LGTM: Well-defined enums with appropriate documentation.

The SeasonalityOption, Scaling, and EstimationMode enums are clearly defined and well-documented. The `non_exhaustive` attribute on EstimationMode is a good choice for future extensibility.

Consider implementing Default for Scaling and EstimationMode for consistency with SeasonalityOption:

```rust
impl Default for Scaling {
    fn default() -> Self {
        Self::AbsMax
    }
}

impl Default for EstimationMode {
    fn default() -> Self {
        Self::Mle
    }
}
```

This aligns with the defaults specified in the ProphetOptions struct and improves consistency across the module.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- crates/augurs-prophet/src/error.rs (1 hunks)
- crates/augurs-prophet/src/prophet/options.rs (1 hunks)
- crates/augurs-prophet/src/prophet/predict.rs (1 hunks)
🧰 Additional context used
🔇 Additional comments (25)
crates/augurs-prophet/src/error.rs (17)
`1-6`: LGTM: Well-structured error enum declaration.

The `Error` enum is well-defined with appropriate visibility, derives, and attributes. The use of `thiserror` and `#[non_exhaustive]` demonstrates good practices for error handling in a library crate.

`7-18`: LGTM: Well-structured `MismatchedLengths` variant.

The `MismatchedLengths` variant provides comprehensive information about mismatched column lengths, including the lengths and names of both columns involved.

`19-21`: LGTM: Clear `NotEnoughData` variant.

The `NotEnoughData` variant has a concise and clear error message.

`22-24`: LGTM: Informative `OptimizationFailed` variant.

The `OptimizationFailed` variant allows for a custom error message, which is useful for providing specific details about the optimization failure.

`25-29`: LGTM: Well-documented `InvalidIntervalWidth` variant.

The `InvalidIntervalWidth` variant includes a helpful comment explaining the valid range and provides the invalid value in the error message.

`33-35`: LGTM: Clear `TimestampsAreConstant` variant.

The `TimestampsAreConstant` variant provides the constant timestamp value, which is helpful for debugging.

`36-38`: LGTM: Informative `DuplicateSeasonality` variant.

The `DuplicateSeasonality` variant includes the name of the duplicate seasonality, which is useful for identifying the issue.

`39-44`: LGTM: Well-structured `InfiniteValue` variant.

The `InfiniteValue` variant provides the name of the column containing the infinite value, which is helpful for debugging.

`45-50`: LGTM: Well-structured `NaNValue` variant.

The `NaNValue` variant provides the name of the column containing the NaN value, which is helpful for debugging.

`51-53`: LGTM: Clear `MissingRegressor` variant.

The `MissingRegressor` variant includes the name of the missing regressor, which is useful for identifying the issue.

`54-57`: LGTM: Informative `MissingSeasonalityCondition` variant.

The `MissingSeasonalityCondition` variant includes the name of the missing condition, which is helpful for debugging.

`61-63`: LGTM: Clear `MissingCap` variant.

The `MissingCap` variant has a concise and clear error message.

`64-66`: LGTM: Clear `MissingFloor` variant.

The `MissingFloor` variant has a concise and clear error message.

`67-69`: LGTM: Informative `CapNotGreaterThanFloor` variant.

The `CapNotGreaterThanFloor` variant provides a clear explanation of the issue, including the default floor value.

`70-73`: LGTM: Clear `ChangepointsOutOfRange` variant.

The `ChangepointsOutOfRange` variant has a concise and clear error message.

`74-78`: LGTM: Well-documented `ModelNotFit` variant.

The `ModelNotFit` variant includes a helpful comment directing users to the appropriate method for fitting the model.

`1-85`: Overall, the `Error` enum is well-designed and comprehensive.

The `Error` enum provides a robust error handling mechanism for the Prophet forecasting algorithm. It covers a wide range of potential error scenarios with clear and informative error messages. Most variants are well-structured and include helpful associated data where appropriate.

There are a few minor issues to address:

- Fix the syntax in the `TooManyDataPoints` error message.
- Consider adding more context to the `Scaling` variant.
- Improve the `UnableToInferFrequency` variant to avoid potentially large error messages.

Addressing these points will further enhance the quality and usability of this error handling mechanism.
crates/augurs-prophet/src/prophet/options.rs (6)
`1-9`: LGTM: Clear module documentation and appropriate imports.

The file header provides a concise explanation of the module's purpose, and the imports are relevant to the implementation of Prophet model options.

`11-30`: LGTM: Well-defined GrowthType enum with appropriate conversion.

The GrowthType enum is clearly defined with three variants, and the conversion to TrendIndicator is implemented correctly.
`84-131`: LGTM: Well-implemented IntervalWidth struct with appropriate constraints.

The IntervalWidth struct is correctly implemented as a newtype wrapper around f64, with proper validation in the `try_new` method. The various trait implementations provide a good API for working with IntervalWidth.

This implementation addresses the previous TODO comment about creating a newtype wrapper for `interval_width`, ensuring that only valid values are used and enhancing the robustness of the code.
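As background on the validated-newtype pattern praised here, a minimal sketch — not the crate's actual definition; the error type and exact bounds are illustrative assumptions:

```rust
/// An interval width, guaranteed to lie in the open interval (0, 1).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct IntervalWidth(f64);

impl IntervalWidth {
    /// Validate on construction so invalid widths are unrepresentable.
    pub fn try_new(value: f64) -> Result<Self, String> {
        if value > 0.0 && value < 1.0 {
            Ok(Self(value))
        } else {
            Err(format!("interval width must be in (0, 1), got {value}"))
        }
    }
}

impl TryFrom<f64> for IntervalWidth {
    type Error = String;
    fn try_from(value: f64) -> Result<Self, Self::Error> {
        Self::try_new(value)
    }
}
```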
`265-406`: LGTM: Well-defined ProphetOptions struct with appropriate defaults.

The ProphetOptions struct is clearly defined with all necessary options for the Prophet model. The Default implementation provides sensible default values that align with the Python implementation of Prophet. The documentation for each field is comprehensive and helpful.

`1-406`: Overall: Well-implemented Prophet options with room for minor improvements.

This file provides a comprehensive and well-structured implementation of options for the Prophet model in Rust. The code is generally clear, well-documented, and follows Rust best practices. The suggested improvements, such as implementing Default for some enums and simplifying the From implementation, will further enhance the code's quality and consistency.

Great job on creating a robust foundation for configuring Prophet models in Rust!
`133-135`: Consider the implications of removing `OptProphetOptions`.

The TODO comment suggests removing this struct. While `OptProphetOptions` provides flexibility in specifying Prophet options, its removal could simplify the API. However, this decision depends on how the struct is used throughout the codebase and by external users.

Before making a decision, it would be helpful to analyze the usage of `OptProphetOptions`. Consider the impact on API users and whether the flexibility provided by `OptProphetOptions` is necessary for your use cases.
crates/augurs-prophet/src/prophet/predict.rs (2)

`1-118`: Well-structured imports and data definitions

The imports are appropriate for the functionality, and the data structures (FeaturePrediction, FeaturePredictions, Predictions, IncludeHistory, and PosteriorPredictiveSamples) are well-defined and documented. This provides a solid foundation for the Prophet forecasting model implementation.
`1-821`: Summary of Prophet predict.rs review

Overall, the implementation of the Prophet forecasting model in this file is well-structured and comprehensive. Key strengths include:

- Clear and well-documented data structures
- Comprehensive implementation of various prediction methods
- Robust test coverage

Areas for improvement include:

- Addressing TODO comments, particularly in `piecewise_linear` and `sample_posterior_predictive`
- Optimizing matrix multiplication in `predict_feature`
- Enhancing test coverage with property-based tests and edge cases

Addressing these points will further improve the reliability and performance of the Prophet model implementation.
✅ Actions performed: Comments resolved.
@coderabbitai full review

✅ Actions performed: Full review triggered.
Actionable comments posted: 23
🧹 Outside diff range and nitpick comments (36)
.gitignore (1)

`3-3`: LGTM! Consider sorting entries for consistency.

The addition of `.bacon-locations` to the `.gitignore` file is appropriate and helps keep the repository clean of unnecessary files. This is consistent with good version control practices.

For improved readability and consistency, consider sorting the entries alphabetically. Here's a suggested order:

```
.bacon-locations
.vscode
/Cargo.lock
/target
```

This minor change would make it easier to scan the file for specific entries in the future.
crates/augurs-outlier/Cargo.toml (1)

Line range hint `15-16`: Consider using workspace versions for `rayon` and `rustc-hash`.

While most dependencies have been updated to use workspace versions, `rayon` and `rustc-hash` still use specific versions:

```toml
rayon = { version = "1.10.0", optional = true }
rustc-hash = "2.0.0"
```

For consistency, consider moving these to workspace versions as well, if appropriate for your project structure.

If there's a specific reason these need to remain as they are, it might be helpful to add a comment explaining why.

Also applies to: 20-21
crates/augurs-ets/Cargo.toml (1)

Consistency in dependency configurations

The verification confirms that some dependencies (e.g., `lstsq`, `nalgebra`, `rand_distr`) still specify exact versions instead of using workspace configurations. To maintain consistency and simplify dependency management across the workspace, consider migrating these dependencies to use workspace-level configurations where feasible.

🔗 Analysis chain

Line range hint `1-45`: Consider using workspace configurations for all dependencies

While most dependencies now use workspace-level configurations, a few (lstsq, nalgebra, rand_distr) still specify exact versions. To maintain consistency and simplify future updates, consider moving these remaining dependencies to workspace configurations as well, if possible.

To verify the current state of dependency management across the workspace, you can run the following script:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

```bash
#!/bin/bash
# Description: Check for inconsistencies in dependency versions across the workspace
# Test: List all Cargo.toml files and their dependency declarations
fd Cargo.toml | xargs -I {} sh -c 'echo "File: {}"; sed -n "/^\[dependencies\]/,/^\[/p" {} | grep -v "^\[dependencies\]" | grep -v "^\[" | sort'
# Note: Review the output to identify any inconsistencies in version specifications or usage of workspace configurations
```

Length of output: 4705
crates/augurs/src/lib.rs (1)

`31-32`: LGTM! Consider adding a brief doc comment.

The new import for `augurs_prophet` is well-placed and follows the existing pattern in the file. It's correctly feature-gated and re-exported, which is consistent with other modules.

Consider adding a brief doc comment above this import, similar to the other modules, to provide a quick description of what the prophet module does. For example:

```rust
/// Time series forecasting using the Prophet algorithm
#[cfg(feature = "prophet")]
pub use augurs_prophet as prophet;
```

crates/augurs/Cargo.toml (1)
`37-37`: LGTM: New feature correctly added, consider including in `full` feature

The new `prophet` feature is correctly defined and follows the project's naming conventions. It properly enables the `augurs-prophet` dependency when activated.

Consider adding the `prophet` feature to the `full` feature list on line 33 if it should be included when all features are enabled. If this exclusion is intentional, you may want to add a comment explaining why.

Cargo.toml (4)
`38-38`: LGTM: Addition of chrono dependency

The addition of the `chrono` dependency is appropriate for handling time-related operations, which are crucial for time series forecasting in the Prophet model.

Consider using a caret (`^`) version requirement (e.g., `^0.4.38`) to allow for compatible updates within the 0.4.x range. This can help keep the dependency up to date with bug fixes and minor improvements.

`41-42`: LGTM: Addition of num-traits, rand, and statrs dependencies

The addition of `num-traits`, `rand`, and `statrs` dependencies is appropriate for implementing the Prophet forecasting model:

- `num-traits` provides useful traits for generic numeric operations.
- `rand` is necessary for sampling operations in the model.
- `statrs` offers statistical computing capabilities essential for the implementation.

Consider using caret (`^`) version requirements for these dependencies as well (e.g., `^0.2.19`, `^0.8.5`, `^0.17.1`) to allow for compatible updates within their respective major versions. This approach can help keep the dependencies up to date with bug fixes and minor improvements while maintaining compatibility.

Also applies to: 45-45
`54-54`: LGTM: Addition of pretty_assertions dependency

The addition of the `pretty_assertions` dependency is a good choice for improving the readability of test failure messages. Its placement in the dev-dependencies section is correct, as it's only needed for testing.

Consider using a caret (`^`) version requirement (e.g., `^1.4.1`) to allow for compatible updates within the 1.x.x range. This can help keep the dependency up to date with bug fixes and minor improvements while maintaining compatibility.
Line range hint `34-54`: Summary: Dependencies added for Prophet algorithm implementation

The changes to `Cargo.toml` align well with the PR objective of implementing the Prophet forecasting model in the `augurs-prophet` crate. The new dependencies (`augurs-prophet`, `chrono`, `num-traits`, `rand`, `statrs`, and `pretty_assertions`) provide the necessary tools for time series handling, statistical computations, and improved testing.

These additions enhance the project's capabilities for time series forecasting and are consistent with the implementation of a Prophet-like model in Rust.

To ensure long-term maintainability, consider implementing a dependency management strategy that includes:

- Regular updates of dependencies to benefit from bug fixes and performance improvements.
- A process for reviewing and testing dependency updates before merging them into the main branch.
- Using lockfiles (Cargo.lock) to ensure reproducible builds across different environments.
crates/augurs-prophet/README.md (4)
`10-20`: LGTM: Good overview of optimizers with a minor suggestion

This section effectively introduces the concept of optimizers and their role in the Prophet library. The explanation of the `Optimizer` and `Sampler` traits provides a clear picture of the crate's architecture.

Consider adding a brief explanation of why optimization is crucial for time series forecasting. This could help readers who are not familiar with the concept to understand its importance in the context of Prophet.
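To make the abstraction concrete, a trait of roughly this shape decouples the model from the backend that fits it. This is an illustrative sketch only — the real trait lives in the crate's `optimizer` module, and the names used here (`Params`, `Error`, the `optimize` signature) are assumptions:

```rust
/// Illustrative optimizer abstraction: the Prophet model hands a
/// backend the initial parameters and data, and receives fitted
/// parameters back. Any backend (linked Stan binary, WASM component,
/// FFI bridge) can plug in by implementing the trait.
trait Optimizer {
    type Params;
    type Error;

    fn optimize(&self, init: Self::Params, data: &[f64]) -> Result<Self::Params, Self::Error>;
}
```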
`22-61`: LGTM: Comprehensive overview of optimization approaches

This section provides an excellent breakdown of potential optimization strategies, demonstrating a thorough consideration of various environments and use cases. The explanations are clear and highlight the trade-offs of each approach.

To improve clarity, consider adding a brief summary table at the end of this section that compares the four approaches based on key factors such as ease of implementation, compatibility with different environments, and potential performance. This could help readers quickly grasp the pros and cons of each method.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~42-~42: Possible missing comma found.
Context: "...pass data directly to it from Rust. In theory this should work OK for any target whic..." (AI_HYDRA_LEO_MISSING_COMMA)
`69-72`: LGTM: Useful reference links provided

The links section provides valuable resources for readers to learn more about the technologies mentioned in the README.

Consider adding a brief description next to each link to give readers a quick overview of what they can expect to find at each resource. For example:

```markdown
[Prophet]: https://facebook.github.io/prophet/ "Facebook's time series forecasting tool"
[Stan]: https://mc-stan.org/ "Statistical modeling and high-performance statistical computation"
[cxx]: https://cxx.rs/ "Safe interop between Rust and C++"
[WASM component]: https://component-model.bytecodealliance.org/ "WebAssembly Component Model"
```
`42-42`: Optional: Consider adding a comma for improved readability

You might consider adding a comma after "In theory" to slightly improve readability:

```diff
-In theory this should work OK for any target which Stan can compile to.
+In theory, this should work OK for any target which Stan can compile to.
```

This is a minor stylistic suggestion and is entirely optional.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~42-~42: Possible missing comma found.
Context: "...pass data directly to it from Rust. In theory this should work OK for any target whic..." (AI_HYDRA_LEO_MISSING_COMMA)
bacon.toml (4)

`1-10`: LGTM! Consider adding a brief description of the project's specific usage.

The header comments and default configuration are well-structured and informative. They provide useful links for further information about the bacon tool.

Consider adding a brief comment describing how this configuration is specifically used in your project, which could help new contributors understand its purpose more quickly.

`48-68`: LGTM! Consider adding a comment explaining package exclusions in doc-test.

The test and doc-test configurations are well-structured and use appropriate tools and options.

For the doc-test job, consider adding a brief comment explaining why the `augurs-js` and `pyaugurs` packages are excluded. This would help other developers understand the reasoning behind these exclusions.

Example:

```toml
[jobs.doc-test]
command = [
    "cargo", "test", "--doc",
    "--color", "always",
    "--all-features",
    "--workspace",
    # Exclude non-Rust packages from doc tests
    "--exclude", "augurs-js",
    "--exclude", "pyaugurs",
]
```

`80-105`: LGTM! Consider periodic checks with warnings disabled.

The run and ex job configurations are flexible and well-suited for development purposes.

While allowing warnings is useful during active development, consider adding a separate job or a periodic check where warnings are treated as errors. This can help catch and address warnings before they accumulate.

Example additional job:

```toml
[jobs.run-strict]
command = [
    "cargo", "run",
    "--color", "always",
]
need_stdout = true
allow_warnings = false
background = true
```

This allows for a stricter check when needed, while maintaining the flexibility of the current configuration.

`106-113`: LGTM! Consider adding a keybinding for the 'test' job.

The keybindings section is well-commented and provides useful shortcuts for common tasks.

For consistency and improved workflow, consider adding a keybinding for the 'test' job. This would complement the existing bindings for clippy and doc-test. For example:

```toml
[keybindings]
c = "job:clippy-all"
d = "job:doc-test"
t = "job:test" # Add this line
```

This addition would provide quick access to all main verification tasks: linting, doc-tests, and regular tests.
README.md (1)

`38-43`: LGTM! New "Developing" section added.

The new section provides valuable information for developers about the project's development tools. The use of reference-style links is consistent with the document's style.

However, there's a minor improvement that can be made. Consider updating the mention of `bacon` to use the reference link:

```diff
-Some of the tasks require `bacon`, which will also need to be installed separately.
+Some of the tasks require [`bacon`], which will also need to be installed separately.
```

This change will make the `bacon` text a clickable link, consistent with how `just` is referenced.

crates/augurs-prophet/src/util.rs (1)
`3-72`: Enhance documentation with usage examples

The methods `nanmin`, `nanmax`, and `nanmean` have clear documentation comments. Including code examples demonstrating their usage would improve the usability of the API and assist users in understanding how to implement these methods effectively.
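As a sketch of the kind of example that could be added — assuming the methods behave like NaN-ignoring folds, which is the conventional semantics of these names but not confirmed from the crate's source:

```rust
/// NaN-ignoring minimum, in the spirit of `nanmin`: NaN values are
/// skipped rather than propagated.
fn nanmin(values: &[f64]) -> f64 {
    values
        .iter()
        .copied()
        .filter(|v| !v.is_nan())
        .fold(f64::INFINITY, f64::min)
}

fn main() {
    let data = [1.0, f64::NAN, -2.5, 3.0];
    assert_eq!(nanmin(&data), -2.5);
}
```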
crates/augurs-prophet/src/features.rs (3)

`17-23`: Consider deriving `PartialEq` and `Eq` for `Holiday`

To enable equality comparisons of `Holiday` instances, consider deriving `PartialEq` and `Eq`. This can be useful for testing and when using `Holiday` in collections that require these traits.

Apply this diff:

```diff
-#[derive(Debug, Clone)]
+#[derive(Debug, Clone, PartialEq, Eq)]
```

`138-143`: Consider deriving `PartialEq` and `Eq` for `Regressor`

To enable equality comparisons of `Regressor` instances, consider deriving `PartialEq` and `Eq`.

Apply this diff:

```diff
-#[derive(Debug, Clone, Default)]
+#[derive(Debug, Clone, Default, PartialEq, Eq)]
```

`179-186`: Consider deriving `PartialEq` and `Eq` for `Seasonality`

To allow equality comparisons of `Seasonality` instances, consider deriving `PartialEq` and `Eq`. This can be helpful for testing and managing collections.

Apply this diff:

```diff
-#[derive(Debug, Clone)]
+#[derive(Debug, Clone, PartialEq, Eq)]
```

crates/augurs-prophet/src/optimizer.rs (3)
`208-208`: Remove leading underscore from `_opts` field in `OptimizeCall`

The field `_opts` in `OptimizeCall` is prefixed with an underscore, which suggests it's unused. Since this field stores optimization options and may be used in future tests or assertions, consider renaming it to `opts` for clarity.

Apply this diff:

```diff
 pub(crate) struct OptimizeCall {
     pub init: InitialParams,
     pub data: Data,
-    pub _opts: OptimizeOpts,
+    pub opts: OptimizeOpts,
 }
```

`205-215`: Restrict visibility of internal fields for better encapsulation

The `call` field in `MockOptimizer` and the fields in `OptimizeCall` are publicly accessible. If these fields are intended for internal use within tests, consider making them private to prevent unintended external access and to encapsulate internal state.

Apply this diff:

```diff
 pub(crate) struct OptimizeCall {
-    pub init: InitialParams,
-    pub data: Data,
-    pub opts: OptimizeOpts,
+    init: InitialParams,
+    data: Data,
+    opts: OptimizeOpts,
 }

 #[derive(Clone, Debug)]
 pub(crate) struct MockOptimizer {
-    pub call: RefCell<Option<OptimizeCall>>,
+    call: RefCell<Option<OptimizeCall>>,
 }
```

Provide public getter methods if external access is necessary for testing purposes.

`97-127`: Enhance documentation for fields in `OptimizeOpts`

Some fields in `OptimizeOpts` lack detailed documentation, which may lead to confusion about their purpose or acceptable values. Providing comprehensive descriptions will improve the usability of the API.

For example, consider expanding the documentation for the following fields:

- `jacobian`: Explain when to set this option to `true` and its impact on optimization.
- `history_size`: Specify the default value and how it affects the L-BFGS algorithm.
- `iter`: Clarify the default iteration limit and how it interacts with convergence tolerances.

crates/augurs-prophet/src/data.rs (2)
`55-61`: Simplify error message construction in length checks.

In length validation, you manually create `String` objects for `a_name` and `b_name`. Since these are static strings, you can use string literals to avoid unnecessary allocations.

Update the error construction:

```rust
return Err(Error::MismatchedLengths {
    a_name: "ds",
    a: self.ds.len(),
    b_name: "cap",
    b: cap.len(),
});
```

Also applies to: 73-79, 221-227, 239-245

`49-65`: Add examples to documentation for `with_cap` methods.

While the `with_cap` methods have documentation, adding usage examples can help users understand how to use these methods effectively.

For example:

```rust
/// Add the cap for logistic growth.
///
/// # Errors
///
/// Returns an error if the lengths of `ds` and `cap` differ.
///
/// # Examples
///
/// ```rust
/// let training_data = TrainingData::new(ds, y)
///     .expect("Failed to create TrainingData")
///     .with_cap(cap)
///     .expect("Failed to add cap");
/// ```
```

The same can be applied to the `with_floor` methods and others.

Also applies to: 215-231
crates/augurs-prophet/src/prophet/options.rs (3)
`136-138`: Address the TODO comment regarding `OptProphetOptions`

The TODO suggests reconsidering the existence of `OptProphetOptions`. If it's deemed unnecessary or if its usage can be simplified, consider refactoring or removing it to streamline the codebase.

Would you like assistance in evaluating the necessity of `OptProphetOptions` or proposing a refactor?

`393-393`: Handle potential errors without using `unwrap()` in defaults

Using `unwrap()` on conversions like `0.8.try_into().unwrap()` can lead to panics if the conversion fails. Although the values are constants and seem safe, it's considered best practice to handle potential errors explicitly to ensure robustness.

Consider using `expect` with a clear error message or handling the `Result`:

```diff
- changepoint_range: 0.8.try_into().unwrap(),
+ changepoint_range: PositiveFloat::try_from(0.8).expect("Valid positive float for changepoint_range"),
```

`259-264`: Initialize `holidays_mode` with a default value

Currently, `holidays_mode` is set to `None`, which may lead to ambiguity. Since it's intended to default to the value of `seasonality_mode`, consider initializing it explicitly to enhance clarity and prevent potential issues.

Apply this diff to set the default value:

```diff
- holidays_mode: None,
+ holidays_mode: Some(self.seasonality_mode),
```

crates/augurs-prophet/src/prophet.rs (1)
`271-274`: Implement MCMC estimation in `fit` method (as per TODO)

There is a TODO comment to implement MCMC estimation by calling `sample` when `self.opts.estimation == EstimationMode::Mcmc`. Implementing this feature would expand the model's capabilities to include MCMC estimation methods.

Would you like assistance in implementing this feature or opening a GitHub issue to track this task?

crates/augurs-prophet/src/prophet/predict.rs (1)
crates/augurs-prophet/src/prophet/predict.rs (1)
810-812
: Clarify Assertion Messages in TestsIn the
predict_absmax
test, the assertion messages can be improved for clarity. The current messages don't accurately reflect the conditions being checked.At lines 810-812:
assert!( lower_bound <= point_estimate, "Lower bound should be less than the point estimate" );Since the assertion is
lower_bound <= point_estimate
, the message should state "Lower bound should be less than or equal to the point estimate."Similarly, at lines 814-816:
assert!( upper_bound >= point_estimate, "Upper bound should be greater than the point estimate" );It should state "Upper bound should be greater than or equal to the point estimate."
Apply the following diff to correct the assertion messages:
assert!( lower_bound <= point_estimate, - "Lower bound should be less than the point estimate" + "Lower bound should be less than or equal to the point estimate" ); // ... assert!( upper_bound >= point_estimate, - "Upper bound should be greater than the point estimate" + "Upper bound should be greater than or equal to the point estimate" );Also applies to: 814-816
crates/augurs-prophet/src/prophet/prep.rs (4)
`552-555`: Clarify transposition in Fourier series computation

The comment mentions that the function computes the transpose of the function in the Python code for simplicity. This might be confusing for readers who are not familiar with the Python implementation.

Consider rephrasing the comment for clarity:

```diff
-/// Note: this computes the transpose of the function in the Python
-/// code for simplicity, since we need it in a columnar format anyway.
+/// Note: this implementation computes the Fourier series components in a columnar format,
+/// as opposed to the row-wise format used in the original Python implementation,
+/// to align with how the data is stored in Rust.
```

`767-768`: Add an explanatory comment for adding a dummy feature

When no features are present, a dummy feature is added to prevent an empty features matrix. Adding a brief comment explaining the rationale can enhance code readability.

Add the following comment:

```rust
// Add a dummy feature to avoid an empty features matrix when no other features are present.
```

`472-477`: Simplify `handle_seasonality_opt` match arms

The match expression in `handle_seasonality_opt` can be simplified by combining the first two arms, as they return the same value.

Apply this diff to simplify the match expression:

```diff
 match opt {
-    SeasonalityOption::Auto if self.seasonalities.contains_key(name) => None,
-    SeasonalityOption::Auto if auto_disable => None,
+    SeasonalityOption::Auto if self.seasonalities.contains_key(name) || auto_disable => None,
     SeasonalityOption::Auto | SeasonalityOption::Manual(true) => Some(default_order),
```

Line range hint `1692-1694`: Remove unreachable code

In the test module, the code at lines 1692-1694 seems unreachable or redundant. Removing such code can clean up the tests and prevent confusion.

Review and remove any unreachable or redundant code in the test module.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
- `crates/augurs-prophet/data/daily_univariate_ts.csv` is excluded by `!**/*.csv`
📒 Files selected for processing (26)
- .gitignore (1 hunks)
- Cargo.toml (2 hunks)
- README.md (2 hunks)
- bacon.toml (1 hunks)
- crates/augurs-ets/Cargo.toml (1 hunks)
- crates/augurs-outlier/Cargo.toml (1 hunks)
- crates/augurs-prophet/Cargo.toml (1 hunks)
- crates/augurs-prophet/LICENSE-APACHE (1 hunks)
- crates/augurs-prophet/LICENSE-MIT (1 hunks)
- crates/augurs-prophet/README.md (1 hunks)
- crates/augurs-prophet/src/data.rs (1 hunks)
- crates/augurs-prophet/src/error.rs (1 hunks)
- crates/augurs-prophet/src/features.rs (1 hunks)
- crates/augurs-prophet/src/lib.rs (1 hunks)
- crates/augurs-prophet/src/optimizer.rs (1 hunks)
- crates/augurs-prophet/src/positive_float.rs (1 hunks)
- crates/augurs-prophet/src/prophet.rs (1 hunks)
- crates/augurs-prophet/src/prophet/options.rs (1 hunks)
- crates/augurs-prophet/src/prophet/predict.rs (1 hunks)
- crates/augurs-prophet/src/prophet/prep.rs (1 hunks)
- crates/augurs-prophet/src/testdata.rs (1 hunks)
- crates/augurs-prophet/src/util.rs (1 hunks)
- crates/augurs-seasons/Cargo.toml (1 hunks)
- crates/augurs/Cargo.toml (2 hunks)
- crates/augurs/src/lib.rs (1 hunks)
- justfile (1 hunks)
🧰 Additional context used
📓 Learnings (3)
crates/augurs-prophet/src/data.rs (1)
Learnt from: sd2k PR: grafana/augurs#118 File: crates/augurs-prophet/src/data.rs:114-131 Timestamp: 2024-10-08T16:34:48.833Z Learning: In `crates/augurs-prophet/src/data.rs`, test methods such as `head` and `tail` do not require validation checks on their parameters, as they are used for testing purposes.
crates/augurs-prophet/src/lib.rs (1)
Learnt from: sd2k PR: grafana/augurs#118 File: crates/augurs-prophet/src/lib.rs:14-14 Timestamp: 2024-10-08T16:07:31.990Z Learning: The `optimizer` module in `crates/augurs-prophet/src/optimizer.rs` already has inline module-level documentation.
crates/augurs-prophet/src/testdata.rs (2)
Learnt from: sd2k PR: grafana/augurs#118 File: crates/augurs-prophet/src/testdata.rs:39-44 Timestamp: 2024-10-08T16:09:10.904Z Learning: In Rust test code with hardcoded data, such as in the `load_csv` function in `crates/augurs-prophet/src/testdata.rs`, it's acceptable to use `unwrap()` methods, and handling errors gracefully is less critical.
Learnt from: sd2k PR: grafana/augurs#118 File: crates/augurs-prophet/src/testdata.rs:14-19 Timestamp: 2024-10-08T16:08:23.652Z Learning: When using `NaiveDate::and_hms_opt(0, 0, 0)` with hardcoded values for hour, minute, and second, it's safe to use `unwrap()` because the method will always return `Some`.
🪛 LanguageTool
crates/augurs-prophet/README.md
[uncategorized] ~42-~42: Possible missing comma found.
Context: ...pass data directly to it from Rust. In theory this should work OK for any target whic...(AI_HYDRA_LEO_MISSING_COMMA)
🔇 Additional comments (42)
crates/augurs-prophet/Cargo.toml (3)
`1-10`: LGTM: Package metadata is well-configured.

The package metadata is correctly set up, using workspace-level configurations for most fields. This approach ensures consistency across the workspace and simplifies maintenance.

`18-21`: Dev-dependencies are well-configured.

The dev-dependencies are appropriately set up using workspace-level version specifications, maintaining consistency with the main dependencies. The chosen libraries are suitable for testing purposes:
- augurs-testing: likely a custom testing framework for the augurs project
- chrono: useful for handling dates and times in tests
- pretty_assertions: improves readability of test failure messages
`11-16`: Dependencies look appropriate, but verify necessity.

The dependencies are well-configured using workspace-level version specifications, which is good for consistency. The chosen libraries seem appropriate for a statistical forecasting implementation.

However, it's always a good practice to verify that each dependency is necessary:
- itertools: for advanced iterator operations
- num-traits: for numeric type traits
- rand: for random number generation
- statrs: for statistical distributions and functions
- thiserror: for simplified error handling
Please confirm that all these dependencies are actively used in the implementation. You can use the following script to check for unused dependencies:
✅ Verification successful
All listed dependencies are actively used and appropriately configured.
The verification confirms that `itertools`, `num-traits`, `rand`, `statrs`, and `thiserror` are all utilized within the codebase, ensuring their necessity.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

```bash
#!/bin/bash
# Description: Check for potentially unused dependencies
# Test: Search for usage of each dependency
echo "Checking for usage of dependencies:"
for dep in itertools num-traits rand statrs thiserror; do
  echo "Searching for $dep usage:"
  rg --type rust "\b$dep::" crates/augurs-prophet/src
  echo
done
```

Length of output: 2047
crates/augurs-seasons/Cargo.toml (2)
Line range hint `2-9`: Approve: Good use of workspace declarations for package metadata and dependencies

The changes to use workspace declarations for package metadata (version, authors, documentation, repository, license, edition, keywords) and dependencies (num-traits) are beneficial for the project. This approach:

- Centralizes version control for dependencies and metadata across the workspace.
- Ensures consistency across different crates in the project.
- Simplifies maintenance by allowing updates to be made in one place.

These changes align well with Rust best practices for workspace management.

Also applies to: 17-17
`17-17`: Verify the impact of the num-traits dependency change

The `num-traits` dependency has been changed from a specific version (`"0.2.18"`) to a workspace declaration (`workspace = true`). While this is generally a good practice for consistency, it's important to ensure that this change doesn't introduce any compatibility issues with other parts of the project that might depend on a specific version of `num-traits`.

To verify this change, please run the following script. This script will help ensure that the change to use the workspace declaration for `num-traits` is consistent across the project and doesn't introduce any conflicts.

justfile (1)
`21-22`: Add a comment explaining the `watch` target and address documentation concerns.

Thank you for adding the `watch` target. To improve clarity and maintainability:

- Please add a comment above the `watch` target explaining its purpose and what the `bacon` command does.
- Ensure that the concerns raised in the previous review comment about documenting the `bacon` tool are addressed, either by updating the project's README or by adding setup instructions for the tool.

Example comment:

```
# Watch for changes and automatically rebuild (requires 'bacon' to be installed)
watch:
    bacon
```

crates/augurs-outlier/Cargo.toml (2)
`22-22`: Clarify the configuration for the `rv` dependency.

The `rv` dependency is configured with a specific version and has `default-features = false`:

```toml
rv = { version = "0.17.0", default-features = false }
```

Could you provide some context on why this dependency isn't using a workspace version and why the default features are disabled? This information would be helpful for maintaining consistency and understanding the project's requirements.

To check if this configuration is consistent across the workspace, we can run:

```bash
#!/bin/bash
# Description: Check for rv dependency configuration across the workspace
# Test: Search for rv dependency specifications
rg --type toml '^\s*rv\s*=' crates/*/Cargo.toml
```

This will help us understand if this configuration is unique to this crate or consistent across the workspace.
Line range hint `1-28`: LGTM! Excellent use of workspace dependencies.

The changes to use workspace dependencies for most of the package's dependencies and metadata fields are a great improvement. This approach ensures consistency across the workspace and simplifies dependency management. It's particularly beneficial for larger projects with multiple crates.
To ensure all relevant dependencies are using workspace versions, let's run the following check:
This will help identify any remaining hardcoded versions that might benefit from being moved to workspace dependencies.
crates/augurs-ets/Cargo.toml (2)
Line range hint `1-45`: LGTM: Well-structured Cargo.toml file with consistent use of workspace configurations

The overall structure of the Cargo.toml file is well-organized and follows Rust package management best practices. The consistent use of workspace-level configurations for most fields and dependencies is a good practice that can improve project-wide consistency and ease maintenance.

`19-19`: LGTM: Consistent update of rand dependency to use workspace configuration

The change to use `rand.workspace = true` aligns with the project's approach to centralized dependency management. This update ensures version consistency for the `rand` crate across the project and simplifies future updates.
Line range hint
2-20
: LGTM: MIT License addedThe addition of the MIT License file is appropriate and follows best practices for open-source projects. This license grants users the necessary permissions to use, modify, and distribute the software while providing protection for the authors.
1-1
: Clarify the purpose of the first lineThe first line
../../LICENSE-MIT
seems to reference a license file two directories up from the current location. This is unusual in a license file and may cause confusion.To verify the purpose of this line and ensure it's not causing any issues, please run the following script:
Based on the results, consider removing or commenting out this line if it's not serving a specific purpose.
✅ Verification successful
Verification Successful
The
crates/augurs-prophet/LICENSE-MIT
file does not reference../../LICENSE-MIT
, and no such file exists. The license file is correctly formatted without any confusing references.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check if there's another LICENSE-MIT file two directories up and compare its content with this file. # Test 1: Check if the referenced file exists if [ -f ../../LICENSE-MIT ]; then echo "The file ../../LICENSE-MIT exists." # Test 2: Compare the content of both files if diff ../../LICENSE-MIT crates/augurs-prophet/LICENSE-MIT > /dev/null; then echo "The content of both files is identical." else echo "The content of the files differs." echo "Differences:" diff ../../LICENSE-MIT crates/augurs-prophet/LICENSE-MIT fi else echo "The file ../../LICENSE-MIT does not exist." fi # Test 3: Check if this line is causing any issues in the license file if grep -q "^../../LICENSE-MIT$" crates/augurs-prophet/LICENSE-MIT; then echo "The line '../../LICENSE-MIT' is present at the beginning of the file." echo "This might cause issues with license recognition tools." else echo "The line '../../LICENSE-MIT' is not present at the beginning of the file." fiLength of output: 354
crates/augurs/Cargo.toml (1)
`20-20`: LGTM: New dependency correctly added

The new optional dependency `augurs-prophet` is correctly added and follows the project's conventions. It's marked as optional and uses the workspace version, which is consistent with other dependencies in this file.

Cargo.toml (1)
`34-34`: LGTM: Addition of augurs-prophet dependency

The addition of the `augurs-prophet` dependency is consistent with the PR's objective of implementing the Prophet forecasting model. The version (0.3.1) matches the workspace package version, and the path correctly points to a local crate.

crates/augurs-prophet/README.md (2)
`1-8`: LGTM: Clear introduction and goal statement

The introduction effectively communicates the purpose of the crate and its goal of being low-dependency. This aligns well with the PR objectives and provides valuable information for potential users.

`63-67`: LGTM: Appropriate credits and implementation notes

The Credits section properly acknowledges the original Prophet Python package while highlighting the Rust-specific adaptations. This transparency is valuable for users familiar with the Python version and sets appropriate expectations for the Rust implementation.
bacon.toml (4)
`69-79`: LGTM! Documentation job configurations are well-structured.

The doc and doc-open job configurations are appropriate and well-thought-out:
- Generating documentation without dependencies is a good practice for focused project docs.
- The doc-open job's "back" behavior on success prevents unnecessary browser openings, which is user-friendly.
`1-113`: Overall, the bacon.toml configuration is well-structured and comprehensive.

This configuration file provides a robust setup for the bacon tool, covering various aspects of the development workflow including checking, linting, testing, documentation, and running the application. The file is well-organized and includes helpful comments throughout.
Key strengths:
- Clear separation of different job types (check, clippy, test, doc, run).
- Use of appropriate tools like nextest for improved test performance.
- Thoughtful configuration of documentation jobs.
- Flexible setup for running the application and examples.
- Custom keybindings for quick access to common tasks.
The suggestions provided in the review comments are minor improvements that could further enhance the configuration's clarity and effectiveness. These include considering stdout handling, documenting lint customization processes, explaining package exclusions, and adding a keybinding for the test job.
Overall, this bacon.toml file serves as a solid foundation for managing the project's development workflow.
`11-18`: LGTM! Consider the implications of suppressing stdout.

The "check" and "check-all" job configurations are well-defined and use appropriate cargo commands.
While suppressing stdout can reduce noise, it might hide useful information. Consider if there are scenarios where stdout output would be beneficial. You can verify the impact with this script:
`19-46`: LGTM! Consider enabling stdout for Clippy and documenting lint customization process.

The Clippy job configurations are well-structured and provide options for running on default and all targets.

- Consider enabling stdout for Clippy jobs to capture all warnings and suggestions:

```diff
-need_stdout = false
+need_stdout = true
```
- The commented-out examples for disabling lints are helpful. Consider adding a brief comment on when and how to customize lints for your project, or link to a lint policy document if one exists.
To ensure Clippy isn't suppressing important information, run this script:
README.md (3)
`32-32`: LGTM! New crate description added.

The addition of the `augurs-prophet` crate description is well-formatted and consistent with other entries in the table. The "alpha" status is appropriate for a new crate.

`84-84`: LGTM! Reference link for `augurs-prophet` added.

The new reference link for `augurs-prophet` is correctly formatted and consistent with other crate reference links in the document. It points to the appropriate crates.io page for the new crate.

`88-89`: LGTM! Reference links for `just` and `bacon` added.

The new reference links for `just` and `bacon` are correctly formatted and point to the appropriate documentation pages for each tool.

Note: As mentioned in a previous review comment, the `bacon` reference is currently unused in the document. Please refer to the earlier suggestion to update the mention of `bacon` in the "Developing" section to use this reference link.

crates/augurs-prophet/LICENSE-APACHE (1)
Line range hint `1-220`: LGTM: Apache License 2.0 added to the project.

The inclusion of the Apache License 2.0 is appropriate for open-source projects. It clearly defines the terms and conditions for using, modifying, and distributing the software, which is crucial for legal compliance and user understanding.
crates/augurs-prophet/src/lib.rs (4)
`1-7`: Crate-level documentation and compiler warnings are well-configured

Including the README as crate-level documentation and setting up compiler warnings helps maintain code quality and provides users with essential information.

`8-18`: Module organization is clean and extensible

The modules are well-organized, and publicly exporting the `optimizer` module allows users to implement custom optimizers, enhancing the flexibility of the library.

`20-21`: Type alias `TimestampSeconds` is properly documented

Defining `TimestampSeconds` with clear documentation improves code readability and consistency when handling timestamps.

`23-37`: Re-exporting key types simplifies the library interface

Re-exporting essential types and structures at the root level ensures that users have convenient access without navigating through the module hierarchy.
crates/augurs-prophet/src/testdata.rs (2)
`16-16`: Confirm `and_hms_opt(0, 0, 0).unwrap()` is safe in this context

In the `timestamp` method, using `unwrap()` on `and_hms_opt(0, 0, 0)` assumes that the operation will always succeed. Given that the hours, minutes, and seconds are hardcoded to valid values, this should be safe. However, if there's any possibility of `NaiveDate` being invalid or manipulated, consider handling the `None` case to prevent potential panics.

You mentioned in previous learnings that it's safe to use `unwrap()` here due to hardcoded values. If you're confident this function will not receive invalid dates, no action is required.

`33-46`: Optional: Handle potential errors when loading CSV data

The `load_csv` function uses multiple `unwrap()` calls that could cause panics if the file is missing or the data is malformed. Since this is test code with hardcoded data, and based on previous learnings, it's acceptable in this context. However, if you plan to use this function with external or dynamically provided data in the future, consider adding proper error handling.

crates/augurs-prophet/src/error.rs (3)
`31-32`: Fix the syntax in the error message for `TooManyDataPoints`.

The `#[error]` macro cannot evaluate expressions like `i32::MAX` inside format strings. This issue was identified in a previous review and still needs to be addressed.
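For reference, one standard way around this — a sketch, relying on thiserror's documented support for extra format arguments after the format string — is to pass the constant as a named format argument rather than embedding the expression in the string (the variant's exact fields here are illustrative):

```rust
use thiserror::Error;

#[derive(Debug, Error)]
pub enum Error {
    /// Extra format args are evaluated as expressions, so constants
    /// like `i32::MAX` can be interpolated via a named argument.
    #[error("too many data points: {0}, maximum is {max}", max = i32::MAX)]
    TooManyDataPoints(usize),
}
```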
`83-84`: Limit the amount of data included in the `UnableToInferFrequency` error message.

Including the entire vector of timestamps (`{0:?}`) in the error message could lead to excessively large messages for large datasets. Consider summarizing the timestamps or displaying a range instead.

`6-85`: Well-defined error variants and messages.

The `Error` enum is well-structured, and the error messages are clear and informative, providing helpful context for users of the `Prophet` forecasting algorithm.
1-2
: 🛠️ Refactor suggestionVerify Necessity of
FromPrimitive
in Trait BoundsIt's worth verifying whether the
FromPrimitive
trait is strictly necessary in the trait bounds, as this may already be covered by theFloat
trait.To check if
FromPrimitive
is required, you can search for usages ofFromPrimitive
methods:If
FromPrimitive
is only used implicitly through theFloat
trait, you might be able to simplify the trait bounds by removing it.✅ Verification successful
Confirmed Use of
FromPrimitive
in Trait BoundsThe
FromPrimitive
trait is explicitly used in the trait bounds withincrates/augurs-prophet/src/util.rs
, ensuring necessary functionality that is not covered by theFloat
trait alone.
- File: crates/augurs-prophet/src/util.rs
- Trait
FloatIterExt
and its implementation includeFromPrimitive
alongsideFloat
.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for direct usages of `FromPrimitive` methods in the codebase. rg --type rust 'FromPrimitive' crates/augurs-prophet/src/Length of output: 356
crates/augurs-prophet/src/features.rs (2)
42-53
: Consider adding an overload method to accept a singlei32
forwith_lower_window
A previous review comment suggested adding an overload method that accepts a single
i32
value forwith_lower_window
. This allows users to set the samelower_window
value for all dates without providing aVec<i32>
matching the length ofds
. This comment is still valid and applicable.
61-72
: Consider adding an overload method to accept a singlei32
forwith_upper_window
A previous review comment suggested adding an overload method that accepts a single
i32
value forwith_upper_window
. This allows users to set the sameupper_window
value for all dates without providing aVec<i32>
matching the length ofds
. This comment is still valid and applicable.crates/augurs-prophet/src/data.rs (1)
`115-133`: 🛠️ **Refactor suggestion: Check for infinite values in regressor data.**

Currently, the `with_regressors` methods check for `NaN` values but do not check for infinite values (`f64::INFINITY` or `f64::NEG_INFINITY`). To ensure data integrity, consider adding checks for infinite values as well. Update the condition to check for both `NaN` and infinite values:

```rust
if reg.iter().any(|x| x.is_nan() || x.is_infinite()) {
    return Err(Error::InvalidValue {
        column: name.clone(),
    });
}
```

You'll also need to update the `Error` enum to reflect a more general invalid value:

```diff
 pub enum Error {
     // existing variants...
-    NaNValue {
+    InvalidValue {
         column: String,
     },
     // ...
 }
```

Also applies to: 281-299
**crates/augurs-prophet/src/prophet/options.rs (1)**

`72-85`: **Good use of `#[non_exhaustive]` for future-proofing `EstimationMode`**

Marking the `EstimationMode` enum as `non_exhaustive` anticipates future additions such as MCMC sampling. This is a proactive approach that enhances the extensibility of the code.

**crates/augurs-prophet/src/prophet/predict.rs (1)**
`555-576`: ⚠️ **Potential issue: Potential Licensing Issue with Copied Code in `percentile_of_sorted` Function**

At lines 555-556, the function `percentile_of_sorted` is copied from the Rust compiler's test suite:

```rust
// Taken from the Rust compiler's test suite:
// https://github.com/rust-lang/rust/blob/917b0b6c70f078cb08bbb0080c9379e4487353c3/library/test/src/stats.rs#L258-L280.
```

Copying code from external projects may introduce licensing issues, depending on the licenses of the Rust project and your project's license. The Rust compiler is dual-licensed under the Apache 2.0 and MIT licenses. Please ensure that including this code complies with your project's licensing requirements.

To verify the licensing terms of the Rust compiler's test suite, you can run the following script. It clones the Rust repository and displays the `LICENSE-APACHE` and `LICENSE-MIT` files; review these licenses to ensure compliance. If needed, I can help rewrite the `percentile_of_sorted` function to avoid potential licensing issues.

✅ Verification successful

**Licensing Compliance Verified**

Including the `percentile_of_sorted` function from the Rust compiler's test suite complies with the project's licensing requirements under both the Apache License 2.0 and the MIT License.

🏁 Scripts executed

The following scripts were executed for the analysis:

```bash
#!/bin/bash
# Description: Check the licensing terms of the Rust compiler's test suite.

# Clone the Rust repository (depth limited to save time).
git clone https://github.com/rust-lang/rust.git --depth 1 rust_repo

# Display the LICENSE files.
echo "Apache License 2.0:"
cat rust_repo/LICENSE-APACHE
echo "MIT License:"
cat rust_repo/LICENSE-MIT
```

Length of output: 13960
**crates/augurs-prophet/src/prophet/prep.rs (3)**

`248-265`: **Consider handling the case when `history.y_scaled` is empty**

In the data preparation for modeling, `history.y_scaled` is used directly. If `y_scaled` is empty, which might occur in certain edge cases, this could lead to issues during model fitting. Ensure that `history.y_scaled` is checked for non-emptiness before proceeding. If necessary, add error handling to manage cases with insufficient data.
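A minimal sketch of such a guard, reusing the crate's existing `NotEnoughData` error variant:

```rust
// Bail out early rather than fitting against an empty series.
if history.y_scaled.is_empty() {
    return Err(Error::NotEnoughData);
}
```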
`338-343`: **Check for division by zero when calculating `cs`**

In the loop where `cs` is calculated, there is potential for division by zero if `scales.y_scale` is zero. Although `scales.y_scale` is adjusted earlier, it's safer to ensure that division by zero cannot occur here.

Run the following script to confirm that `scales.y_scale` is never zero before this calculation:

✅ Verification successful

**Division by zero for `scales.y_scale` is properly handled**

The code includes a check that sets `scales.y_scale` to `1.0` if it is zero before performing the division, effectively preventing any division by zero errors.

🏁 Scripts executed

The following scripts were executed for the analysis:

```bash
#!/bin/bash
# Description: Verify that scales.y_scale is not zero when calculating cs.

# Test: Search for assignments to scales.y_scale and ensure it's not zero.
rg --type rust --pcre2 "scales\.y_scale\s*=\s*[^;]+" -A 3 crates/augurs-prophet/src/prophet/prep.rs
```

Length of output: 971
`308-322`: **Ensure regressors and conditions are sorted correctly**

While sorting the data, `ds` and `y` are sorted based on `ds`, but the sorting of regressors and seasonality conditions relies on the order of `sort_indices`. It's crucial to ensure that all related data structures are correctly aligned after sorting.

Run the following script to verify that all regressors and conditions are sorted consistently with `ds`:

✅ Verification successful

**Regressors and conditions are sorted correctly**

Verified that regressors and seasonality conditions are sorted consistently with `ds`.

🏁 Scripts executed

The following scripts were executed for the analysis:

```bash
#!/bin/bash
# Description: Verify that all regressors and conditions are sorted according to ds.

# Test: Check for usage of sort_indices in sorting regressors and conditions.
rg --type rust "for .* in .*\.values_mut\(\)" -A 5 crates/augurs-prophet/src/prophet/prep.rs
```

Length of output: 427
```rust
fn infer_freq(history_dates: &[TimestampSeconds]) -> Result<TimestampSeconds, Error> {
    const INFER_N: usize = 5;
    let get_tried = || {
        history_dates
            .iter()
            .rev()
            .take(INFER_N)
            .copied()
            .collect_vec()
    };
    // Calculate diffs between the last 5 dates in the history, and
    // create a map from diffs to counts.
    let diff_counts = history_dates
        .iter()
        .rev()
        .take(INFER_N)
        .tuple_windows()
        .map(|(a, b)| a - b)
        .counts();
    // Find the max count, and return the corresponding diff, provided there
    // is exactly one diff with that count.
    let max = diff_counts
        .values()
        .copied()
        .max()
        .ok_or_else(|| Error::UnableToInferFrequency(get_tried()))?;
    diff_counts
        .into_iter()
        .filter(|(_, v)| *v == max)
        .map(|(k, _)| k)
        .exactly_one()
        .map_err(|_| Error::UnableToInferFrequency(get_tried()))
}
```
**Add input validation for sufficient data points in `infer_freq`**

The `infer_freq` function may fail or return incorrect results when `history_dates` has fewer than two data points. Consider adding input validation to ensure that the dataset has enough points to infer the frequency reliably.

Apply this diff to add an explicit check:

```diff
 fn infer_freq(history_dates: &[TimestampSeconds]) -> Result<TimestampSeconds, Error> {
+    if history_dates.len() < 2 {
+        return Err(Error::NotEnoughData);
+    }
     const INFER_N: usize = 5;
     let get_tried = || {
         history_dates
```
```rust
// prediction data.
if let Some(train_holidays) = &self.train_holiday_names {
    all_holidays.retain(|name, _| train_holidays.contains(name));
    let holidays_to_add = train_holidays
        .iter()
        .filter(|name| !all_holidays.contains_key(name.as_str()))
        .collect_vec();
    all_holidays.extend(
        holidays_to_add
            .into_iter()
            .map(|name| (name.clone(), Holiday::new(vec![]))),
    );
}
```
**Ensure consistent handling of holidays during prediction**

In `construct_holidays`, when predicting, holidays not seen during training are removed, and holidays seen in training but not in prediction data are added with empty dates. This could lead to unexpected behavior if the holiday definitions differ between training and prediction. Consider adding checks to ensure that holiday definitions are consistent between training and prediction, and provide warnings or errors if discrepancies are found.
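One possible shape for such a check (hedged: `Error::InconsistentHoliday` is a hypothetical variant, and this assumes `Holiday` can be compared with `PartialEq`):

```rust
use std::collections::HashMap;

// Hypothetical guard: reject predictions whose holiday definitions
// differ from the ones seen during training.
fn check_holiday_consistency(
    train: &HashMap<String, Holiday>,
    predict: &HashMap<String, Holiday>,
) -> Result<(), Error> {
    for (name, predicted) in predict {
        if let Some(trained) = train.get(name) {
            if trained != predicted {
                return Err(Error::InconsistentHoliday(name.clone()));
            }
        }
    }
    Ok(())
}
```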
```rust
    new_features: impl Iterator<Item = Vec<f64>>,
) {
    self.names.extend(names);
    self.data.extend(new_features);
}
```
**Validate alignment of feature names and data**

In the `extend` method of `FeaturesFrame`, it's important to ensure that the iterators `names` and `new_features` produce the same number of items, to maintain alignment between feature names and their corresponding data. Consider adding a check to verify that both iterators have the same length, or handle cases where they might differ.

```rust
fn extend(
    &mut self,
    names: impl Iterator<Item = FeatureName>,
    new_features: impl Iterator<Item = Vec<f64>>,
) {
    let names_vec: Vec<_> = names.collect();
    let features_vec: Vec<_> = new_features.collect();
    assert_eq!(
        names_vec.len(),
        features_vec.len(),
        "Mismatch in number of feature names and data columns"
    );
    self.names.extend(names_vec);
    self.data.extend(features_vec);
}
```
```rust
if scales.y_scale == 0.0 {
    scales.y_scale = 1.0;
}
```
**Handle zero `y_scale` explicitly**

In the `initialize_scales` method, when `scales.y_scale` is zero, it's being set to `1.0`. This could mask underlying data issues and lead to incorrect scaling of `y` values. Instead, consider returning an error to handle this case explicitly.

Apply this diff to return an error when `scales.y_scale` is zero:

```diff
 if scales.y_scale == 0.0 {
-    scales.y_scale = 1.0;
+    return Err(Error::ZeroYScale);
 }
```
```rust
// Handle conditions by setting feature values to zero where the condition is false.
if let Some(condition_name) = &seasonality.condition_name {
    let col = history
        .seasonality_conditions
        .get(condition_name)
        .ok_or_else(|| Error::MissingSeasonalityCondition(condition_name.clone()))?;
    let false_indices = col
        .iter()
        .enumerate()
        .filter_map(|(i, b)| if *b { None } else { Some(i) })
        .collect_vec();
    for feature in new_features.iter_mut() {
        for idx in false_indices.iter().copied() {
            feature[idx] = 0.0;
        }
    }
}
```
🛠️ **Refactor suggestion: Optimize condition handling in seasonality features**

In `make_seasonality_features`, the loops used to set feature values to zero based on conditions can be streamlined, especially for large datasets. Instead of collecting the false indices into an intermediate vector, consider zipping each feature directly with the condition column.

Apply this diff to refactor the condition handling:

```diff
 if let Some(condition_name) = &seasonality.condition_name {
     let col = history
         .seasonality_conditions
         .get(condition_name)
         .ok_or_else(|| Error::MissingSeasonalityCondition(condition_name.clone()))?;
-    let false_indices = col
-        .iter()
-        .enumerate()
-        .filter_map(|(i, b)| if *b { None } else { Some(i) })
-        .collect_vec();
-    for feature in new_features.iter_mut() {
-        for idx in false_indices.iter().copied() {
-            feature[idx] = 0.0;
-        }
-    }
+    for feature in new_features.iter_mut() {
+        for (value, &include) in feature.iter_mut().zip(col) {
+            if !include {
+                *value = 0.0;
+            }
+        }
+    }
 }
```
Committable suggestion was skipped due to low confidence.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
This makes it easier to pass by reference, since otherwise we're using Vec<Vec<T>> which derefs to &[Vec<T>] not &[&[T]], and they have different representations so can't be transmuted.
To make it easier to cast slices of them.
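A quick standalone sketch of the layout issue these commits work around (not code from this PR):

```rust
// `Vec<T>` is a (ptr, capacity, length) triple while `&[T]` is a (ptr, length)
// pair, so `&[Vec<T>]` and `&[&[T]]` have different element layouts and one
// cannot be reinterpreted (transmuted) as the other.
fn total_len(rows: &[&[f64]]) -> usize {
    rows.iter().map(|r| r.len()).sum()
}

fn main() {
    let owned: Vec<Vec<f64>> = vec![vec![1.0, 2.0], vec![3.0]];
    // `total_len(&owned)` would not compile: `&owned[..]` is `&[Vec<f64>]`.
    // A new vector of borrows has to be built instead:
    let borrowed: Vec<&[f64]> = owned.iter().map(|v| v.as_slice()).collect();
    assert_eq!(total_len(&borrowed), 3);
}
```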
Actionable comments posted: 12
🧹 Outside diff range and nitpick comments (8)
**crates/augurs-prophet/Cargo.toml (3)**

`11-18`: **LGTM: Dependencies are well-defined and use workspace versions.**

The dependencies are correctly specified and use workspace-level versions, which is a good practice for maintaining consistency across the workspace. The optional `bytemuck` dependency with the "derive" feature is appropriately configured. Consider adding brief comments explaining the purpose of each dependency, especially for less common ones like `statrs` or `bytemuck`. This can help other developers understand the role of each dependency in the project.

`25-26`: **LGTM: Feature for optional dependency is correctly defined.**

The "bytemuck" feature is properly defined, allowing users to opt in to this functionality. This is consistent with the optional `bytemuck` dependency and follows Rust best practices for managing optional dependencies. Consider adding a brief comment explaining the purpose of the "bytemuck" feature and when users might want to enable it. This can help users of your crate make informed decisions about which features to enable.

`1-26`: **Overall, excellent Cargo.toml configuration for the augurs-prophet crate.**

The Cargo.toml file is well-structured, follows Rust best practices, and makes good use of workspace-level definitions. The dependencies, dev-dependencies, and features are appropriately defined. The optional `bytemuck` dependency and corresponding feature demonstrate good project organization. As the project grows, consider:

1. Documenting the purpose of each dependency and feature in comments.
2. Regularly reviewing and updating dependencies to ensure security and performance.
3. If not already in place, setting up a workspace-wide version management strategy (e.g., using a tool like `cargo-edit` or `cargo-update`) to simplify dependency updates across all crates in the workspace.

**crates/augurs-prophet/src/optimizer.rs (3)**

`1-23`: **Enhance module-level documentation for clarity and completeness**

The current documentation provides a good overview, but consider the following improvements:

1. Add a brief explanation of what the Prophet model is and its purpose.
2. Clarify the current state of implementation (e.g., which features are currently available).
3. Consider moving TODOs to a separate section or issue tracker for better visibility and tracking.

These changes will help users and contributors better understand the module's current state and future direction.

`52-84`: **Adhere to Rust naming conventions for struct field names**

The `Data` struct contains fields like `T`, `S`, and `K` that are capitalized. According to Rust naming conventions, field names should be in `snake_case`. Consider renaming these fields to improve code consistency and readability. Example:

```rust
pub struct Data {
    pub num_time_periods: u32, // instead of T
    pub num_changepoints: u32, // instead of S
    pub num_regressors: u32,   // instead of K
    // ... other fields ...
}
```

Also, remove the `#[allow(non_snake_case)]` attribute once the field names are updated.

`185-196`: **Enhance documentation for the Optimizer trait**

While the `Optimizer` trait is well-defined, consider adding more detailed documentation to improve its usability:

1. Provide a brief explanation of what the optimization process does in the context of the Prophet model.
2. Document the expected behavior when optimization fails or encounters issues.
3. Add examples of how to implement this trait for different optimization algorithms.

Example:

```rust
/// A type that can run maximum likelihood estimation optimization
/// for the Prophet model.
///
/// Implementations of this trait should provide methods to find
/// the optimal parameters for the Prophet model given initial
/// parameters and data.
///
/// # Examples
///
/// ```
/// // Add an example of how to implement and use this trait
/// ```
pub trait Optimizer: std::fmt::Debug {
    // ... existing method ...
}
```

This additional documentation will help users understand how to implement and use the `Optimizer` trait effectively.

**crates/augurs-prophet/src/prophet.rs (1)**

`271-274`: **Consider creating an issue to track MCMC sampling implementation.**

The TODO comment suggests implementing MCMC sampling for the `EstimationMode::Mcmc` case. This is an important enhancement that would expand the model's capabilities. Consider creating a GitHub issue to track this task and ensure it's not overlooked in future development.

Would you like me to create a GitHub issue for tracking the MCMC sampling implementation?

**crates/augurs-prophet/src/prophet/prep.rs (1)**

`21-120`: **Consider adding more documentation to structs and enums**

While the structs and enums are well-defined, adding more comprehensive documentation would improve the code's readability and maintainability. Consider adding doc comments (starting with `///`) to explain the purpose and usage of each struct and enum, especially for complex types like `ComponentColumns` and `FeatureName`.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
- Cargo.toml (2 hunks)
- crates/augurs-prophet/Cargo.toml (1 hunks)
- crates/augurs-prophet/src/optimizer.rs (1 hunks)
- crates/augurs-prophet/src/positive_float.rs (1 hunks)
- crates/augurs-prophet/src/prophet.rs (1 hunks)
- crates/augurs-prophet/src/prophet/prep.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/augurs-prophet/src/positive_float.rs
🧰 Additional context used
🔇 Additional comments (10)
**crates/augurs-prophet/Cargo.toml (2)**

`1-10`: **LGTM: Package metadata is well-defined and follows best practices.**

The package metadata is correctly set up, using workspace-level definitions for most fields. This approach ensures consistency across the workspace and simplifies maintenance.

`20-23`: **LGTM: Dev-dependencies are appropriate and use workspace versions.**

The dev-dependencies are well-chosen for a forecasting library, including testing utilities and a date/time library. Using workspace-level versions maintains consistency across the project.

**Cargo.toml (2)**

`34-34`: **LGTM: Addition of augurs-prophet dependency**

The addition of the `augurs-prophet` dependency is consistent with the PR objective of implementing the Prophet forecasting model. The version (0.3.1) matches the workspace version, and the path correctly points to a local crate.

`56-56`: **Great addition: pretty_assertions for improved testing**

The addition of `pretty_assertions = "1.4.1"` is a positive change. This library provides more readable output for assert comparisons in tests, which can significantly improve the debugging experience when tests fail. This addition demonstrates a commitment to maintaining and improving the project's test suite, which is commendable.

**crates/augurs-prophet/src/prophet.rs (4)**

`1-18`: **LGTM: Well-structured module and import declarations.**

The module structure and imports are well-organized, providing a clear separation of concerns for the Prophet time series forecasting implementation.

`20-68`: **Acknowledge the existing TODO for struct refactoring.**

The TODO comment on lines 32-35 suggests moving several fields into a separate struct. This refactoring could indeed help minimize the number of optional fields and reduce the risk of invalid states, improving the overall structure and maintainability of the code.

`290-813`: **LGTM: Comprehensive test coverage.**

The test modules provide extensive coverage of various aspects of the Prophet implementation, including trend initialization, seasonality, holidays, and fitting. The use of mock optimizers for testing is a good practice, allowing for isolated testing of the Prophet implementation.

`231-263`: 🛠️ **Refactor suggestion: Add input validation for sufficient data points in `infer_freq`.**

The `infer_freq` function may fail or return incorrect results when `history_dates` has fewer than two data points. Consider adding input validation to ensure that the dataset has enough points to infer the frequency reliably. Apply this diff to add an explicit check:

```diff
 fn infer_freq(history_dates: &[TimestampSeconds]) -> Result<TimestampSeconds, Error> {
+    if history_dates.len() < 2 {
+        return Err(Error::NotEnoughData);
+    }
     const INFER_N: usize = 5;
     let get_tried = || {
         history_dates
```

Likely invalid or redundant comment.

**crates/augurs-prophet/src/prophet/prep.rs (2)**

`1-20`: **Imports and constants look good!**

The imports are well-organized, and the constants for time periods are clearly defined with appropriate names.

`1-1486`: **Overall assessment: Good implementation with room for improvement**

The `prep.rs` file provides a comprehensive implementation of preprocessing logic for the Prophet forecasting model. The code is generally well-structured and follows Rust best practices. However, there are several areas where improvements can be made:

1. Code organization: Some methods, particularly `preprocess`, `setup_dataframe`, and `initialize_scales`, could benefit from being broken down into smaller, more focused helper methods.
2. Error handling: Consider enhancing error checking, especially for infinite and NaN values in input data.
3. Documentation: Add more comprehensive documentation to structs, enums, and complex methods to improve code readability and maintainability.
4. Test coverage: Enhance the test suite with more organized test modules, additional error cases, edge cases, and parameterized tests.
5. Performance: Some calculations, like the standard deviation for regressors, could be optimized for better accuracy.

Implementing these suggestions would significantly improve the overall quality, maintainability, and robustness of the code.
```rust
#[cfg(test)]
pub mod mock_optimizer {
    use std::cell::RefCell;

    use super::*;

    #[derive(Debug, Clone)]
    pub(crate) struct OptimizeCall {
        pub init: InitialParams,
        pub data: Data,
        pub _opts: OptimizeOpts,
    }

    /// A mock optimizer that records the optimization call.
    #[derive(Clone, Debug)]
    pub(crate) struct MockOptimizer {
        /// The optimization call.
        ///
        /// This will be updated by the mock optimizer when
        /// [`Optimizer::optimize`] is called.
        // [`Optimizer::optimize`] takes self by reference,
        // so we need to store the call in a RefCell.
        pub call: RefCell<Option<OptimizeCall>>,
    }

    impl MockOptimizer {
        /// Create a new mock optimizer.
        pub(crate) fn new() -> Self {
            Self {
                call: RefCell::new(None),
            }
        }

        /// Take the optimization call out of the mock.
        pub(crate) fn take_call(&self) -> Option<OptimizeCall> {
            self.call.borrow_mut().take()
        }
    }

    impl Optimizer for MockOptimizer {
        fn optimize(
            &self,
            init: &InitialParams,
            data: &Data,
            opts: &OptimizeOpts,
        ) -> Result<OptimizedParams, Error> {
            *self.call.borrow_mut() = Some(OptimizeCall {
                init: init.clone(),
                data: data.clone(),
                _opts: opts.clone(),
            });
            Ok(OptimizedParams {
                k: init.k,
                m: init.m,
                sigma_obs: init.sigma_obs,
                delta: init.delta.clone(),
                beta: init.beta.clone(),
                trend: Vec::new(),
            })
        }
    }
}
```
🛠️ **Refactor suggestion: Enhance MockOptimizer for more realistic testing scenarios**

The current `MockOptimizer` implementation is a good start, but consider the following improvements to make it more versatile for testing:

1. Add an option to simulate optimization failure by returning an `Error` in certain conditions.
2. Implement a way to set predefined `OptimizedParams` to be returned, allowing for more controlled test scenarios.
3. Consider adding a delay option to simulate long-running optimizations.

Example enhancement:

```rust
impl MockOptimizer {
    pub fn with_result(result: Result<OptimizedParams, Error>) -> Self {
        Self {
            call: RefCell::new(None),
            result: RefCell::new(result),
        }
    }
    // ... other methods ...
}

impl Optimizer for MockOptimizer {
    fn optimize(&self, /* ... */) -> Result<OptimizedParams, Error> {
        // ... existing code ...
        self.result.borrow().clone()
    }
}
```

These enhancements will allow for more comprehensive testing of code that uses the `Optimizer` trait.
```rust
pub fn predict(&self, data: impl Into<Option<PredictionData>>) -> Result<Predictions, Error> {
    let Self {
        processed: Some(processed),
        optimized: Some(params),
        changepoints_t: Some(changepoints_t),
        scales: Some(scales),
        ..
    } = self
    else {
        return Err(Error::ModelNotFit);
    };
    let data = data.into();
    let df = data
        .map(|data| {
            let training_data = TrainingData {
                n: data.n,
                ds: data.ds.clone(),
                y: vec![],
                cap: data.cap.clone(),
                floor: data.floor.clone(),
                seasonality_conditions: data.seasonality_conditions.clone(),
                x: data.x.clone(),
            };
            self.setup_dataframe(training_data, Some(scales.clone()))
                .map(|(df, _)| df)
        })
        .transpose()?
        .unwrap_or_else(|| processed.history.clone());

    let mut trend = self.predict_trend(
        &df.t,
        &df.cap,
        &df.floor,
        changepoints_t,
        params,
        scales.y_scale,
    )?;
    let features = self.make_all_features(&df)?;
    let seasonal_components = self.predict_features(&features, params, scales.y_scale)?;

    let yhat_point = izip!(
        &trend.point,
        &seasonal_components.additive.point,
        &seasonal_components.multiplicative.point
    )
    .map(|(t, a, m)| t * (1.0 + m) + a)
    .collect();
    let mut yhat = FeaturePrediction {
        point: yhat_point,
        lower: None,
        upper: None,
    };

    if self.opts.uncertainty_samples > 0 {
        self.predict_uncertainty(
            &df,
            &features,
            params,
            changepoints_t,
            &mut yhat,
            &mut trend,
            scales.y_scale,
        )?;
    }

    Ok(Predictions {
        ds: df.ds,
        yhat,
        trend,
        cap: df.cap,
        floor: scales.logistic_floor.then_some(df.floor),
        additive: seasonal_components.additive,
        multiplicative: seasonal_components.multiplicative,
        holidays: seasonal_components.holidays,
        seasonalities: seasonal_components.seasonalities,
        regressors: seasonal_components.regressors,
    })
}
```
🛠️ **Refactor suggestion: Consider refactoring the `predict` method for improved maintainability and performance.**

The `predict` method is well-structured but quite complex. Consider the following improvements:

1. Error handling: Implement more specific error types for different failure scenarios, rather than using a generic `ModelNotFit` error.
2. Performance: Evaluate if creating a new `TrainingData` struct for each prediction can be optimized or pre-computed.
3. Readability: Break down the method into smaller, well-named helper functions to improve clarity and maintainability.

These changes could enhance the method's robustness, performance, and readability.

Would you like assistance in implementing these suggestions?
```rust
pub fn make_future_dataframe(
    &self,
    horizon: NonZeroU32,
    include_history: IncludeHistory,
) -> Result<PredictionData, Error> {
    let Some(Preprocessed { history_dates, .. }) = &self.processed else {
        return Err(Error::ModelNotFit);
    };
    let freq = Self::infer_freq(history_dates)?;
    let last_date = *history_dates.last().ok_or(Error::NotEnoughData)?;
    let n = (horizon.get() as u64 + 1) as TimestampSeconds;
    let dates = (last_date..last_date + n * freq)
        .step_by(freq as usize)
        .filter(|ds| *ds > last_date)
        .take(horizon.get() as usize);

    let ds = if include_history == IncludeHistory::Yes {
        history_dates.iter().copied().chain(dates).collect()
    } else {
        dates.collect()
    };
    Ok(PredictionData::new(ds))
}
```
🛠️ **Refactor suggestion: Enhance input validation and error handling in `make_future_dataframe`.**

The `make_future_dataframe` method is well-implemented, including the frequency inference. Consider the following improvements:

1. Input validation: Add checks to ensure that `horizon` is reasonable (e.g., not too large) to prevent potential issues with very long forecasts (see the sketch below).
2. Error handling: The `infer_freq` method could potentially return an error. Consider how to handle this case explicitly.

These enhancements would improve the robustness of the method.

Would you like assistance in implementing these improvements?
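For the first point, a minimal sketch of a bound (the `MAX_HORIZON` constant and the `Error::HorizonTooLarge` variant are hypothetical, not part of this PR):

```rust
// Reject absurdly long forecast horizons up front.
const MAX_HORIZON: u32 = 10_000;

if horizon.get() > MAX_HORIZON {
    return Err(Error::HorizonTooLarge(horizon.get()));
}
```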
```rust
pub(super) fn preprocess(&mut self, mut data: TrainingData) -> Result<Preprocessed, Error> {
    let n = data.ds.len();
    if n != data.y.len() {
        return Err(Error::MismatchedLengths {
            a: n,
            a_name: "ds".to_string(),
            b: data.y.len(),
            b_name: "y".to_string(),
        });
    }
    if n < 2 {
        return Err(Error::NotEnoughData);
    }
    (data.ds, data.y) = data
        .ds
        .into_iter()
        .zip(data.y)
        .filter(|(_, y)| !y.is_nan())
        .unzip();

    let mut history_dates = data.ds.clone();
    history_dates.sort_unstable();

    let (history, scales) = self.setup_dataframe(data, None)?;
    self.scales = Some(scales);
    self.set_auto_seasonalities(&history)?;
    let Features {
        features,
        prior_scales,
        modes,
        component_columns,
        holiday_names,
    } = self.make_all_features(&history)?;
    self.component_modes = Some(modes);
    self.train_holiday_names = Some(holiday_names);
    self.train_component_columns = Some(component_columns.clone());

    let (changepoints, changepoints_t) = self.get_changepoints(&history.ds)?;
    self.changepoints = Some(changepoints);
    self.changepoints_t = Some(changepoints_t.clone());

    let cap = if self.opts.growth == GrowthType::Logistic {
        history.cap_scaled.clone().ok_or(Error::MissingCap)?
    } else {
        vec![0.0; n]
    };

    // Transpose X; we store it column-major but Stan expects it as a
    // contiguous array in row-major format.
    #[allow(non_snake_case)]
    let mut X = vec![0.0; features.data.len() * features.data[0].len()];
    for (i, row) in features.data.iter().enumerate() {
        for (j, val) in row.iter().enumerate() {
            X[i + features.data.len() * j] = *val;
        }
    }

    let data = Data {
        T: history
            .ds
            .len()
            .try_into()
            .map_err(|_| Error::TooManyDataPoints(n))?,
        S: changepoints_t.len() as i32,
        K: features.names.len() as i32,
        tau: self.opts.changepoint_prior_scale,
        trend_indicator: self.opts.growth.into(),
        y: history.y_scaled.clone(),
        t: history.t.clone(),
        t_change: changepoints_t,
        X,
        sigmas: prior_scales,
        s_a: component_columns.additive,
        s_m: component_columns.multiplicative,
        cap,
    };

    Ok(Preprocessed {
        data,
        history,
        history_dates,
    })
}
```
🛠️ **Refactor suggestion: Consider refactoring the `preprocess` method for improved readability**

The `preprocess` method is comprehensive but quite long. To improve readability and maintainability, consider breaking it down into smaller, more focused helper methods.

Here's a suggested refactoring approach:

1. Create a helper method for data validation.
2. Create a helper method for setting up the dataframe and scales.
3. Create a helper method for setting up seasonalities and features.
4. Create a helper method for setting up changepoints.

This could look something like:

```rust
pub(super) fn preprocess(&mut self, mut data: TrainingData) -> Result<Preprocessed, Error> {
    self.validate_data(&data)?;
    let (history, scales) = self.setup_dataframe_and_scales(data)?;
    self.setup_seasonalities_and_features(&history)?;
    let (changepoints, changepoints_t) = self.setup_changepoints(&history)?;
    // ... rest of the method ...
}

fn validate_data(&self, data: &TrainingData) -> Result<(), Error> {
    // ... validation logic ...
}

fn setup_dataframe_and_scales(&mut self, data: TrainingData) -> Result<(ProcessedData, Scales), Error> {
    // ... setup logic ...
}

fn setup_seasonalities_and_features(&mut self, history: &ProcessedData) -> Result<Features, Error> {
    // ... seasonality and feature setup logic ...
}

fn setup_changepoints(&self, history: &ProcessedData) -> Result<(Vec<TimestampSeconds>, Vec<f64>), Error> {
    // ... changepoint setup logic ...
}
```

This refactoring would make the `preprocess` method more modular and easier to understand and maintain.
```rust
        mut seasonality_conditions,
        mut x,
        floor,
        cap,
    }: TrainingData,
    scales: Option<Scales>,
) -> Result<(ProcessedData, Scales), Error> {
    if y.iter().any(|y| y.is_infinite()) {
        return Err(Error::InfiniteValue {
            column: "y".to_string(),
        });
    }
    for name in self.regressors.keys() {
        if !x.contains_key(name) {
            return Err(Error::MissingRegressor(name.clone()));
        }
        // No need to check lengths or inf, we do that in [`TrainingData::with_regressors`].
    }
    for Seasonality { condition_name, .. } in self.seasonalities.values() {
        if let Some(condition_name) = condition_name {
            if !seasonality_conditions.contains_key(condition_name) {
                return Err(Error::MissingSeasonalityCondition(condition_name.clone()));
            }
            // No need to check lengths or inf, we do that in [`TrainingData::with_regressors`].
        }
    }

    // Sort everything by date.
    let mut sort_indices = (0..n).collect_vec();
    sort_indices.sort_by_key(|i| ds[*i]);
    ds.sort();
    // y isn't provided for predictions.
    if !y.is_empty() {
        y = sort_indices.iter().map(|i| y[*i]).collect();
    }
    for condition in seasonality_conditions.values_mut() {
        *condition = sort_indices.iter().map(|i| condition[*i]).collect();
    }
    for regressor in x.values_mut() {
        *regressor = sort_indices.iter().map(|i| regressor[*i]).collect();
    }

    let scales = scales
        .map(Ok)
        .unwrap_or_else(|| self.initialize_scales(&ds, &y, &x, &floor, &cap))?;

    let floor = if scales.logistic_floor {
        floor.ok_or(Error::MissingFloor)?
    } else if self.opts.scaling == Scaling::AbsMax {
        vec![0.0; ds.len()]
    } else {
        vec![scales.y_min; ds.len()]
    };
    let cap_scaled = if self.opts.growth == GrowthType::Logistic {
        let cap = cap.as_ref().ok_or(Error::MissingCap)?;
        let mut cap_scaled = Vec::with_capacity(ds.len());
        for (cap, floor) in cap.iter().zip(&floor) {
            let cs = (cap - floor) / scales.y_scale;
            if cs <= 0.0 {
                return Err(Error::CapNotGreaterThanFloor);
            }
            cap_scaled.push(cs);
        }
        Some(cap_scaled)
    } else {
        None
    };

    let t = ds
        .iter()
        .map(|ds| (ds - scales.start) as f64 / scales.t_scale)
        .collect();

    let y_scaled = y
        .iter()
        .zip(&floor)
        .map(|(y, floor)| (y - floor) / scales.y_scale)
        .collect();

    for (name, regressor) in &scales.regressors {
        let col = x
            .get_mut(name)
            .ok_or_else(|| Error::MissingRegressor(name.clone()))?;
        for x in col {
            *x = (*x - regressor.mu) / regressor.std;
        }
    }

    let data = ProcessedData {
        ds,
        t,
        y_scaled,
        cap,
        cap_scaled,
        floor,
        regressors: x,
        seasonality_conditions,
    };
    Ok((data, scales))
}
```
🛠️ **Refactor suggestion: Refactor `setup_dataframe` for improved readability and error handling**

The `setup_dataframe` method is quite long and complex. Consider breaking it down into smaller, more focused helper methods to improve readability and maintainability.

Here's a suggested refactoring approach:

1. Create a helper method for validating input data.
2. Create a helper method for sorting data.
3. Create a helper method for scaling data.
4. Create a helper method for handling regressors.

This could look something like:

```rust
pub(super) fn setup_dataframe(
    &self,
    data: TrainingData,
    scales: Option<Scales>,
) -> Result<(ProcessedData, Scales), Error> {
    self.validate_input_data(&data)?;
    let sorted_data = self.sort_data(data);
    let scales = scales.unwrap_or_else(|| self.initialize_scales(&sorted_data)?);
    let scaled_data = self.scale_data(sorted_data, &scales)?;
    let processed_data = self.handle_regressors(scaled_data, &scales)?;
    Ok((processed_data, scales))
}

fn validate_input_data(&self, data: &TrainingData) -> Result<(), Error> {
    // ... validation logic ...
}

fn sort_data(&self, mut data: TrainingData) -> TrainingData {
    // ... sorting logic ...
}

fn scale_data(&self, data: TrainingData, scales: &Scales) -> Result<ProcessedData, Error> {
    // ... scaling logic ...
}

fn handle_regressors(&self, mut data: ProcessedData, scales: &Scales) -> Result<ProcessedData, Error> {
    // ... regressor handling logic ...
}
```

This refactoring would make the `setup_dataframe` method more modular and easier to understand and maintain.

Additionally, there's an issue with error handling for infinite values. The current check for infinite values only considers the `y` values:

```rust
if y.iter().any(|y| y.is_infinite()) {
    return Err(Error::InfiniteValue {
        column: "y".to_string(),
    });
}
```

Consider extending this check to include `NaN` values and also check other relevant columns like regressors:

```rust
fn check_infinite_or_nan<T: Float>(values: &[T], column: &str) -> Result<(), Error> {
    if values.iter().any(|v| v.is_infinite() || v.is_nan()) {
        return Err(Error::InfiniteOrNaNValue {
            column: column.to_string(),
        });
    }
    Ok(())
}

// In setup_dataframe:
check_infinite_or_nan(&y, "y")?;
for (name, regressor) in &x {
    check_infinite_or_nan(regressor, &format!("regressor `{}`", name))?;
}
```

This will provide more comprehensive error checking and prevent potential issues with infinite or NaN values in the data.
```rust
fn initialize_scales(
    &self,
    ds: &[TimestampSeconds],
    y: &[f64],
    regressors: &HashMap<String, Vec<f64>>,
    floor: &Option<Vec<f64>>,
    cap: &Option<Vec<f64>>,
) -> Result<Scales, Error> {
    let mut scales = Scales::default();
    match floor.as_ref() {
        Some(floor) if self.opts.growth == GrowthType::Logistic => {
            scales.logistic_floor = true;
            match (self.opts.scaling, cap) {
                (Scaling::AbsMax, _) => {
                    let MinMaxResult::MinMax(y_min, y_scale) = y
                        .iter()
                        .zip(floor)
                        .map(|(y, floor)| (y - floor).abs())
                        .minmax_by(|a, b| {
                            a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal)
                        })
                    else {
                        return Err(Error::AbsMaxScalingFailed);
                    };
                    scales.y_min = y_min;
                    scales.y_scale = y_scale;
                }
                (Scaling::MinMax, Some(cap)) => {
                    scales.y_min = floor.iter().copied().nanmin(true);
                    scales.y_scale = cap.iter().copied().nanmax(true) - scales.y_min;
                }
                _ => {
                    return Err(Error::MissingCap);
                }
            }
        }
        Some(_) | None => match self.opts.scaling {
            Scaling::AbsMax => {
                scales.y_min = 0.0;
                scales.y_scale = y.iter().map(|y| y.abs()).nanmax(true);
            }
            Scaling::MinMax => {
                scales.y_min = y.iter().copied().nanmin(true);
                scales.y_scale = y.iter().copied().nanmax(true) - scales.y_min;
            }
        },
    };
    if scales.y_scale == 0.0 {
        scales.y_scale = 1.0;
    }

    scales.start = *ds.first().ok_or(Error::NotEnoughData)?;
    scales.t_scale = (*ds.last().ok_or(Error::NotEnoughData)? - scales.start) as f64;

    for (name, regressor) in self.regressors.iter() {
        // Standardize if requested.
        let col = regressors
            .get(name)
            .ok_or(Error::MissingRegressor(name.clone()))?;
        // If there are 2 or fewer unique values, don't standardize.
        let mut vals = col.to_vec();
        vals.sort_unstable_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
        vals.dedup();
        let mut regressor_scale = RegressorScale::default();
        if vals.len() >= 2 {
            if regressor.standardize == Standardize::Auto {
                regressor_scale.standardize =
                    !(vals.len() == 2 && vals[0] == 0.0 && vals[1] == 1.0);
            }
            if regressor_scale.standardize {
                let mean = vals.iter().sum::<f64>() / vals.len() as f64;
                let variance =
                    vals.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / vals.len() as f64;
                let std = variance.sqrt();
                regressor_scale.mu = mean;
                regressor_scale.std = std;
            }
        }
        scales.regressors.insert(name.clone(), regressor_scale);
    }
    Ok(scales)
}
```
🛠️ **Refactor suggestion: Refactor `initialize_scales` for improved readability and correct standard deviation calculation**

The `initialize_scales` method is complex and could benefit from refactoring to improve readability. Additionally, there's an issue with the standard deviation calculation for regressors.

Consider breaking down the method into smaller, more focused helper methods. For example:

1. Create a helper method for initializing y-scale parameters.
2. Create a helper method for initializing t-scale parameters.
3. Create a helper method for handling regressor scaling.

This could look like:

```rust
fn initialize_scales(
    &self,
    ds: &[TimestampSeconds],
    y: &[f64],
    regressors: &HashMap<String, Vec<f64>>,
    floor: &Option<Vec<f64>>,
    cap: &Option<Vec<f64>>,
) -> Result<Scales, Error> {
    let mut scales = Scales::default();
    self.initialize_y_scale(&mut scales, y, floor, cap)?;
    self.initialize_t_scale(&mut scales, ds)?;
    self.initialize_regressor_scales(&mut scales, regressors)?;
    Ok(scales)
}

fn initialize_y_scale(&self, scales: &mut Scales, y: &[f64], floor: &Option<Vec<f64>>, cap: &Option<Vec<f64>>) -> Result<(), Error> {
    // ... y-scale initialization logic ...
}

fn initialize_t_scale(&self, scales: &mut Scales, ds: &[TimestampSeconds]) -> Result<(), Error> {
    // ... t-scale initialization logic ...
}

fn initialize_regressor_scales(&self, scales: &mut Scales, regressors: &HashMap<String, Vec<f64>>) -> Result<(), Error> {
    // ... regressor scaling logic ...
}
```

There's an issue with the standard deviation calculation for regressors:

```rust
let mean = vals.iter().sum::<f64>() / vals.len() as f64;
let variance = vals.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / vals.len() as f64;
let std = variance.sqrt();
```

This calculation uses the population standard deviation formula (dividing by n) instead of the sample standard deviation formula (dividing by n-1). For most statistical applications, the sample standard deviation is more appropriate. Consider updating the calculation to:

```rust
let n = vals.len();
let mean = vals.iter().sum::<f64>() / n as f64;
let variance = vals.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1) as f64;
let std = variance.sqrt();
```

This change will provide a more accurate estimate of the standard deviation, especially for smaller sample sizes.
```rust
/// Get the changepoints for the model.
///
/// Returns a tuple of (changepoints, changepoints_t) where
/// `changepoints` is a vector of dates and `changepoints_t` is a
/// vector of times.
fn get_changepoints(
    &self,
    ds: &[TimestampSeconds],
) -> Result<(Vec<TimestampSeconds>, Vec<f64>), Error> {
    let first_date = ds.first().ok_or(Error::NotEnoughData)?;
    let last_date = ds.last().ok_or(Error::NotEnoughData)?;
    let changepoints = if let Some(changepoints) = self.opts.changepoints.as_ref() {
        if !changepoints.is_empty() {
            let too_low = changepoints.iter().any(|x| x < first_date);
            let too_high = changepoints.iter().any(|x| x > last_date);
            if too_low || too_high {
                return Err(Error::ChangepointsOutOfRange);
            }
        }
        changepoints.clone()
    } else {
        let hist_size = (ds.len() as f64 * *self.opts.changepoint_range).floor() as usize;
        let mut n_changepoints = self.opts.n_changepoints as usize;
        if n_changepoints + 1 > hist_size {
            n_changepoints = hist_size - 1;
        }
        if n_changepoints > 0 {
            // Place changepoints evenly through the first `changepoint_range` percent of the history.
            // let step = ((hist_size - 1) as f64 / (n_changepoints + 1) as f64).round();
            // let cp_indices = (0..(hist_size - 1)).step_by(step as usize);
            let cp_indices = (0..(n_changepoints + 1)).map(|i| {
                let num_steps = n_changepoints as f64; // note: don't add one since we don't include
                                                       // the last point.
                (i as f64 / num_steps * (hist_size - 1) as f64).round() as usize
            });
            cp_indices.map(|i| ds[i]).skip(1).collect()
        } else {
            vec![]
        }
    };

    let scales = self
        .scales
        .as_ref()
        .expect("Scales not initialized when setting changepoints; this is a bug");
    let changepoints_t = if changepoints.is_empty() {
        vec![0.0] // Dummy changepoint.
    } else {
        changepoints
            .iter()
            .map(|ds| (ds - scales.start) as f64 / scales.t_scale)
            .collect()
    };
    Ok((changepoints, changepoints_t))
}
```
🛠️ **Refactor suggestion: Consider improving changepoint distribution logic**

The `get_changepoints` method generally seems well-implemented, but there's a potential issue in the logic for automatically generating changepoints:

```rust
let cp_indices = (0..(n_changepoints + 1)).map(|i| {
    let num_steps = n_changepoints as f64;
    (i as f64 / num_steps * (hist_size - 1) as f64).round() as usize
});
cp_indices.map(|i| ds[i]).skip(1).collect()
```

This calculation might not distribute the changepoints evenly through the specified range of the history. The current implementation:

1. Includes the start of the range (index 0) in the calculation.
2. Skips the first calculated index.
3. Doesn't guarantee that the last changepoint is at the end of the specified range.

Consider updating the logic to ensure even distribution:

```rust
let cp_indices = (1..=n_changepoints).map(|i| {
    let fraction = i as f64 / (n_changepoints + 1) as f64;
    (fraction * hist_size as f64).round() as usize - 1
});
cp_indices.map(|i| ds[i]).collect()
```

This change ensures that:

1. Changepoints are evenly distributed between the start and end of the specified range.
2. The first changepoint is not at the very beginning of the range.
3. The last changepoint is close to, but not exactly at, the end of the range.

This adjustment aligns better with the intention of placing changepoints evenly through the first `changepoint_range` percent of the history.
```rust
        );
    }

    #[test]
    fn test_component_columns() {
        let components = [
            (0, ComponentName::Seasonality("weekly".to_string())),
            (1, ComponentName::Seasonality("weekly".to_string())),
            (2, ComponentName::Seasonality("weekly".to_string())),
            (3, ComponentName::Seasonality("weekly".to_string())),
            (4, ComponentName::Seasonality("weekly".to_string())),
            (5, ComponentName::Seasonality("weekly".to_string())),
            (6, ComponentName::Holiday("birthday".to_string())),
            (7, ComponentName::Holiday("birthday".to_string())),
            (8, ComponentName::Regressor("binary_feature".to_string())),
            (9, ComponentName::Regressor("numeric_feature".to_string())),
            (10, ComponentName::Regressor("numeric_feature2".to_string())),
            (11, ComponentName::Regressor("binary_feature2".to_string())),
            (6, ComponentName::Holidays),
            (7, ComponentName::Holidays),
            (0, ComponentName::AdditiveTerms),
            (1, ComponentName::AdditiveTerms),
            (2, ComponentName::AdditiveTerms),
            (3, ComponentName::AdditiveTerms),
            (4, ComponentName::AdditiveTerms),
            (5, ComponentName::AdditiveTerms),
            (8, ComponentName::AdditiveTerms),
            (9, ComponentName::AdditiveTerms),
            (11, ComponentName::AdditiveTerms),
            (8, ComponentName::RegressorsAdditive),
            (9, ComponentName::RegressorsAdditive),
            (11, ComponentName::RegressorsAdditive),
            (6, ComponentName::MultiplicativeTerms),
            (7, ComponentName::MultiplicativeTerms),
            (10, ComponentName::MultiplicativeTerms),
            (10, ComponentName::RegressorsMultiplicative),
        ];
        let cols = ComponentColumns::new(&components);
        assert_eq!(cols.additive, vec![1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1]);
        assert_eq!(
            cols.multiplicative,
            vec![0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0]
        );
        assert_eq!(cols.all_holidays, vec![0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]);
        assert_eq!(
            cols.regressors_additive,
            vec![0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
        );
        assert_eq!(
            cols.regressors_multiplicative,
            vec![0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
        );
        assert_eq!(cols.seasonalities.len(), 1);
        assert_eq!(cols.holidays.len(), 1);
        assert_eq!(cols.regressors.len(), 4);
        assert_eq!(
            cols.seasonalities["weekly"],
            &[1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
        );
        assert_eq!(
            cols.holidays["birthday"],
            &[0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
        );
        assert_eq!(
            cols.regressors["binary_feature"],
            &[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
        );
        assert_eq!(
            cols.regressors["numeric_feature"],
            &[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
        );
        assert_eq!(
            cols.regressors["numeric_feature2"],
            &[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
        );
        assert_eq!(
            cols.regressors["binary_feature2"],
            &[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
        );
    }

    #[test]
    fn constant_regressor() {
        let data = daily_univariate_ts();
        let n = data.len();
        let data = data
            .with_regressors(
                HashMap::from([("constant_feature".to_string(), vec![0.0; n])])
                    .into_iter()
                    .collect(),
            )
            .unwrap();
        let mut prophet = Prophet::new(ProphetOptions::default(), MockOptimizer::new());
        prophet.add_regressor("constant_feature".to_string(), Regressor::additive());
        prophet.fit(data, Default::default()).unwrap();
        let reg_scales = &prophet.scales.unwrap().regressors["constant_feature"];
        assert_approx_eq!(reg_scales.mu, 0.0);
        assert_approx_eq!(reg_scales.std, 1.0);
    }
}
```
🛠️ **Refactor suggestion: Enhance test coverage and organization**

The test module provides good coverage for various functionalities of the `Prophet` implementation. However, there are some areas where the tests could be improved:

1. Test organization: Consider grouping related tests into nested modules for better organization. For example:

```rust
mod tests {
    mod dataframe_setup {
        // Tests related to dataframe setup
    }
    mod growth_models {
        // Tests for linear, logistic, and flat growth
    }
    mod seasonality {
        // Tests for seasonality-related functions
    }
    // ... other groupings ...
}
```

2. Error cases: Add more tests for error cases to ensure robust error handling. For example, test the behavior when invalid data is provided.

3. Edge cases: Include tests for edge cases, such as empty input data, extreme values, or boundary conditions.

4. Parameterized tests: Use parameterized tests for functions that should behave similarly with different inputs. This can reduce code duplication in tests.

5. Mocking: Consider using mocks for external dependencies to isolate the units under test.

6. Benchmarks: Add benchmarks for performance-critical functions to track performance over time.

7. Documentation tests: Add documentation tests for public APIs to ensure examples in the documentation remain valid.

Example of a parameterized test:

```rust
use rstest::rstest;

#[rstest]
#[case(GrowthType::Linear, 1.0, 0.0)]
#[case(GrowthType::Flat, 0.0, 0.5)]
#[case(GrowthType::Logistic, 0.5, 0.5)]
fn test_growth_init(
    #[case] growth_type: GrowthType,
    #[case] expected_k: f64,
    #[case] expected_m: f64,
) {
    let prophet = Prophet::new(
        ProphetOptions {
            growth: growth_type,
            ..Default::default()
        },
        MockOptimizer::new(),
    );
    let (k, m) = prophet.some_growth_init_function();
    assert_approx_eq!(k, expected_k);
    assert_approx_eq!(m, expected_m);
}
```

Implementing these suggestions would improve the overall quality and maintainability of the test suite.
This PR adds an implementation of the Prophet forecasting model. It handles:

The `Optimizer` trait (for which there are no implementations yet) allows the model to work without a dependency on any specific optimizer. The Python and R libraries have (in the past) offered bindings using Stan libraries and the cmdstan interface; cmdstan is the only one still supported. A similar interface could be added here with relative ease.
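As a rough illustration of how such a binding could slot in, here is a sketch against the `Optimizer` trait signature shown in the review above (the `CmdstanOptimizer` type and its internals are hypothetical):

```rust
#[derive(Debug)]
struct CmdstanOptimizer {
    // e.g. path to a compiled Prophet Stan model.
}

impl Optimizer for CmdstanOptimizer {
    fn optimize(
        &self,
        init: &InitialParams,
        data: &Data,
        opts: &OptimizeOpts,
    ) -> Result<OptimizedParams, Error> {
        // A real implementation would serialize `init`, `data`, and `opts`,
        // shell out to cmdstan, and parse the optimized parameters back.
        todo!("cmdstan bindings are not part of this PR")
    }
}
```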
Summary by CodeRabbit

- **New Features**
  - Introduced a `bacon.toml` configuration file for enhanced task management in Rust projects.
  - Added new dependencies: `augurs-prophet`, `chrono`, `num-traits`, `rand`, and `pretty_assertions`.
  - Implemented the `Prophet` forecasting model with various configuration options and prediction capabilities.
  - Added a `README.md` detailing the `augurs-prophet` package and its development requirements.
  - Added new types: `FeatureMode`, `Holiday`, `Regressor`, `Seasonality`, and `Standardize`.
- **Bug Fixes**
  - Updated versioning for `rand` to use workspace specifications.
- **Documentation**
  - Added a `README.md` for the `augurs-prophet` package explaining its functionality and usage.
  - Documented the `Prophet` model.
- **Tests**
  - Added tests covering the `Prophet` model and its components.
- **Chores**
  - Updated `.gitignore` to exclude `.bacon-locations`.
  - Added a `watch` target in the `justfile` for monitoring project changes.