Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add power transformation logic to forecaster transforms #185

Merged
merged 34 commits into from
Dec 11, 2024

Conversation

edwardcqian
Copy link
Contributor

@edwardcqian edwardcqian commented Dec 5, 2024

  • add boxcox and inverse boxcox to transforms
  • add power_transforms logic which searches for the optimum lambda to use for boxcox

Summary by CodeRabbit

  • New Features

    • Introduced a new module for power transformations, including Box-Cox and Yeo-Johnson transformation functionalities.
    • Added functions for optimizing lambda parameters for Box-Cox and Yeo-Johnson transformations.
  • Improvements

    • Updated visibility of key types and methods to be publicly accessible.
    • Enhanced the Transforms struct with new transformation capabilities.
  • Tests

    • Included new tests to validate the Box-Cox and Yeo-Johnson transformations and their inverses.
    • Added tests for the Forecaster module using power transformations.

Copy link
Contributor

coderabbitai bot commented Dec 5, 2024

Walkthrough

The changes in this pull request involve modifications to the Cargo.toml file and the addition of new functionalities in the augurs-forecaster package. A new dependency argmin was added, and the augurs dependency was updated with specific features. Additionally, a new module power_transforms was introduced, which includes functions for optimizing the Box-Cox and Yeo-Johnson transformations. The visibility of several types and functions was updated to public, enhancing accessibility. The Transforms struct and Transform enum were also modified to support the new transformation functionalities.

Changes

File Path Change Summary
crates/augurs-forecaster/Cargo.toml - Added dependency: argmin = "0.10.0"
- Updated augurs dependency to include features: ["mstl", "ets", "forecaster"]
crates/augurs-forecaster/src/lib.rs - Added new module: power_transforms
- Updated visibility of Transforms type from pub(crate) to pub
crates/augurs-forecaster/src/power_transforms.rs - Added functions: box_cox_log_likelihood, yeo_johnson_log_likelihood, optimize_box_cox_lambda, optimize_yeo_johnson_lambda
- Added structs: BoxCoxProblem, YeoJohnsonProblem
- Implemented CostFunction for both structs
- Included error handling and validation for transformations
crates/augurs-forecaster/src/transforms.rs - Added new enum variants: BoxCox, YeoJohnson
- Added methods: yeo_johnson, boxcox, power_transform
- Added functions: box_cox, yeo_johnson
- Added iterator adapters: BoxCox<T>, InverseBoxCox<T>, YeoJohnson<T>, InverseYeoJohnson<T>
crates/augurs-forecaster/src/forecaster.rs - Added new test functions: test_forecaster_power_positive, test_forecaster_power_non_positive to validate power transformation functionality in Forecaster

Possibly related PRs

  • chore: add benchmark for Prophet #140: This PR adds benchmarks for the Prophet model, which is directly related to the changes made in the main PR that introduced new functionalities and optimizations in the Prophet model.
  • feat: add Forecaster wrapper for Prophet #191: This PR introduces a wrapper for the Prophet model, enhancing its integration with the existing forecasting system, which aligns with the modifications made in the main PR regarding the Prophet struct and its functionalities.

🐇 In the code, I hop and play,
New features bloom, bright as day.
With Box-Cox and Yeo-Johnson's might,
Transforming data, oh what a sight!
Dependencies added, visibility bright,
In the world of Rust, we take flight! 🐇✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Experiment)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (3)
crates/augurs-forecaster/src/power_transforms.rs (1)

49-62: Expand test coverage to include edge cases and invalid inputs

The current tests cover standard scenarios but do not account for edge cases such as non-positive data values or extreme lambda values. Testing these cases ensures the robustness of the functions.

Consider adding test cases for:

  • Data containing zero or negative values.
  • Empty or very large datasets.
  • Extreme lambda values, including negative values and zero.

This will help identify potential issues with input validation and function stability.

crates/augurs-forecaster/src/transforms.rs (2)

15-15: Review the necessity of making Transforms publicly accessible

Changing the visibility of the Transforms struct to pub exposes its internal structure to external users. Ensure that this is intentional and doesn't compromise encapsulation or future flexibility.

If only certain functionalities need to be public, consider exposing them through public methods while keeping the struct fields private.


605-621: Add test cases for edge conditions in Box-Cox transformations

The tests currently cover basic functionality. Including edge cases enhances test coverage and reliability.

Suggested test cases:

  • Very small x values approaching zero.
  • Negative lambda values.
  • lambda equal to zero.
  • Non-positive x values to ensure error handling works as expected.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 84399ed and 0bcedb4.

📒 Files selected for processing (4)
  • crates/augurs-forecaster/Cargo.toml (1 hunks)
  • crates/augurs-forecaster/src/lib.rs (1 hunks)
  • crates/augurs-forecaster/src/power_transforms.rs (1 hunks)
  • crates/augurs-forecaster/src/transforms.rs (8 hunks)
🔇 Additional comments (5)
crates/augurs-forecaster/src/transforms.rs (1)

94-99: ⚠️ Potential issue

Validate input values in the Box-Cox transformation

The box_cox function assumes x > 0, but this is not enforced. Passing non-positive x values will result in math errors.

Add input validation to ensure x is positive:

 pub fn box_cox(x: f64, lambda: f64) -> f64 {
+    if x <= 0.0 {
+        panic!("Input x must be positive for Box-Cox transformation.");
+    }
     if lambda == 0.0 {
         x.ln()
     } else {
         (x.powf(lambda) - 1.0) / lambda
     }
 }

Alternatively, handle the error by returning a Result<f64, Error> or skipping invalid values.

Likely invalid or redundant comment.

crates/augurs-forecaster/src/lib.rs (2)

6-6: Verify integration of the power_transforms module

The addition of the power_transforms module enhances functionality. Ensure all components are correctly integrated and documented.


13-14: Confirm the public exposure of Transforms and optimize_lambda

Making Transforms and optimize_lambda public increases their accessibility. Verify that this aligns with the intended API design and that appropriate documentation is provided for external users.

crates/augurs-forecaster/Cargo.toml (2)

19-19: Include the argmin crate for optimization capabilities

Adding argmin = "0.10.0" as a dependency provides necessary optimization routines for the Box-Cox transformation logic.


22-22: Specify features for the augurs dependency

Updating augurs to include specific features ensures only required functionalities are compiled, potentially improving build times and reducing binary size.

crates/augurs-forecaster/src/power_transforms.rs Outdated Show resolved Hide resolved
crates/augurs-forecaster/src/power_transforms.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
crates/augurs-forecaster/src/power_transforms.rs (2)

38-40: Avoid unnecessary data copying by borrowing the data slice

In the optimize_lambda function, you're cloning the input data with data.to_vec(), which introduces unnecessary memory overhead. Consider borrowing the data slice in BoxCoxProblem to improve performance and reduce memory usage.

Modify BoxCoxProblem to borrow the data:

-pub struct BoxCoxProblem {
-    data: Vec<f64>,
+pub struct BoxCoxProblem<'a> {
+    data: &'a [f64],
 }

-impl CostFunction for BoxCoxProblem {
+impl<'a> CostFunction for BoxCoxProblem<'a> {
     // ...
 }

Update the optimize_lambda function accordingly:

 pub fn optimize_lambda(data: &[f64]) -> f64 {
     let cost = BoxCoxProblem {
-        data: data.to_vec(),
+        data,
     };
     // Rest of the function...
 }

37-49: Add documentation comments to public functions

The public function optimize_lambda lacks documentation. Adding /// comments explaining the purpose, parameters, and return value of the function will improve code readability and maintainability.

Consider adding documentation:

/// Optimize the lambda parameter for the Box-Cox transformation.
///
/// # Arguments
///
/// * `data` - A slice of positive real numbers.
///
/// # Returns
///
/// * The optimal lambda value that maximizes the log-likelihood function.
pub fn optimize_lambda(data: &[f64]) -> f64 {
    // Function implementation...
}
crates/augurs-forecaster/src/transforms.rs (1)

609-624: Enhance test coverage for edge cases

The test cases boxcox_test and inverse_boxcox_test currently use positive input values. Consider adding tests with zero and negative inputs to verify that the functions handle invalid inputs appropriately.

For example, add a test case for zero input:

#[test]
#[should_panic(expected = "Input x must be positive for Box-Cox transformation.")]
fn boxcox_zero_input() {
    let data = vec![0.0];
    let lambda = 0.5;
    let _ = data.into_iter().boxcox(lambda).collect::<Vec<_>>();
}

And a test case for negative input:

#[test]
#[should_panic(expected = "Input x must be positive for Box-Cox transformation.")]
fn boxcox_negative_input() {
    let data = vec![-1.0];
    let lambda = 0.5;
    let _ = data.into_iter().boxcox(lambda).collect::<Vec<_>>();
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0bcedb4 and 4408e2d.

📒 Files selected for processing (3)
  • crates/augurs-forecaster/src/lib.rs (1 hunks)
  • crates/augurs-forecaster/src/power_transforms.rs (1 hunks)
  • crates/augurs-forecaster/src/transforms.rs (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/augurs-forecaster/src/lib.rs
🔇 Additional comments (3)
crates/augurs-forecaster/src/power_transforms.rs (2)

5-19: Add input validation for data containing non-positive values

The function box_cox_log_likelihood assumes that all elements in data are positive because it uses logarithmic functions (x.ln()) which are undefined for non-positive values. If data contains zero or negative values, this will lead to runtime errors or invalid calculations.


44-48: Handle potential errors from the optimizer to prevent panics

Unwrapping the result of run() and best_param can lead to a panic if the optimization fails. It's safer to handle possible errors explicitly to prevent unexpected crashes.

crates/augurs-forecaster/src/transforms.rs (1)

436-443: Ensure valid computations in the inverse Box-Cox transformation

The inverse_box_cox function may encounter invalid operations if y * lambda + 1.0 is negative, especially when lambda < 0 and y has certain values. This can lead to undefined behavior or runtime errors.

crates/augurs-forecaster/src/transforms.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
crates/augurs-forecaster/src/transforms.rs (1)

611-627: Enhance test coverage for Box-Cox transformations

Consider adding tests for:

  1. Edge case: lambda = 0 (logarithmic case)
  2. Error cases: negative inputs for Box-Cox and invalid inputs for inverse Box-Cox

Example test cases to add:

#[test]
fn boxcox_edge_cases() {
    // Test lambda = 0 case (logarithmic)
    let data = vec![1.0, 2.0, 3.0];
    let lambda = 0.0;
    let expected = vec![0.0, 2.0_f64.ln(), 3.0_f64.ln()];
    let actual: Vec<_> = data.into_iter().boxcox(lambda).collect();
    assert_all_close(&expected, &actual);
}

#[test]
#[should_panic(expected = "Input x must be positive")]
fn boxcox_negative_input() {
    let data = vec![-1.0];
    let lambda = 0.5;
    let _: Vec<_> = data.into_iter().boxcox(lambda).collect();
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4408e2d and 834cf61.

📒 Files selected for processing (1)
  • crates/augurs-forecaster/src/transforms.rs (8 hunks)
🔇 Additional comments (5)
crates/augurs-forecaster/src/transforms.rs (5)

10-15: LGTM! Clear documentation and appropriate visibility.

The documentation clearly explains the purpose of the Transforms struct and its relationship with the Transform enum. The visibility change to pub is appropriate for the module's public API.


56-57: LGTM! Clean implementation of Box-Cox transform variant.

The BoxCox variant and its constructor method follow Rust's idiomatic patterns and are well-documented.

Also applies to: 93-98


Line range hint 100-111: LGTM! Consistent implementation of transform methods.

The Box-Cox transform handling in both transform and inverse_transform methods follows the established pattern of other transformations.

Also applies to: 114-125


408-478: LGTM! Well-structured iterator adapters.

The Box-Cox iterator adapters follow Rust's iterator patterns consistently and are well-documented.


439-446: ⚠️ Potential issue

Add validation for inverse Box-Cox transformation

The inverse_box_cox function needs validation to prevent invalid operations when y * lambda + 1.0 is negative.

Apply this fix:

 fn inverse_box_cox(y: f64, lambda: f64) -> f64 {
+    let value = y * lambda + 1.0;
+    if value <= 0.0 {
+        panic!("Invalid value encountered in inverse Box-Cox transformation.");
+    }
     if lambda == 0.0 {
         y.exp()
     } else {
-        (y * lambda + 1.0).powf(1.0 / lambda)
+        value.powf(1.0 / lambda)
     }
 }

Copy link
Collaborator

@sd2k sd2k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, would be very cool to see this in! Good excuse to finish off #184 too.

I've added loads of inline comments but this is awesome for your first Rust contribution 🥳

crates/augurs-forecaster/Cargo.toml Outdated Show resolved Hide resolved
crates/augurs-forecaster/src/power_transforms.rs Outdated Show resolved Hide resolved
crates/augurs-forecaster/src/power_transforms.rs Outdated Show resolved Hide resolved
crates/augurs-forecaster/src/power_transforms.rs Outdated Show resolved Hide resolved
crates/augurs-forecaster/src/transforms.rs Outdated Show resolved Hide resolved
crates/augurs-forecaster/src/transforms.rs Outdated Show resolved Hide resolved
crates/augurs-forecaster/src/transforms.rs Outdated Show resolved Hide resolved
crates/augurs-forecaster/src/transforms.rs Outdated Show resolved Hide resolved
crates/augurs-forecaster/src/transforms.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (5)
crates/augurs-forecaster/src/power_transforms.rs (1)

55-74: Enhance test coverage with edge cases

The current tests only cover happy path scenarios. Consider adding tests for:

  • Empty data array
  • Data containing zero or negative values
  • Data with identical values (zero variance)
  • Optimization failure cases

Example additional test:

#[test]
fn test_edge_cases() {
    // Empty data
    let data: &[f64] = &[];
    assert!(optimize_lambda(data).is_err());

    // Negative values
    let data = &[-0.1, 0.2, 0.3];
    let llf = box_cox_log_likelihood(data, 1.0);
    assert_eq!(llf, f64::NEG_INFINITY);

    // Identical values (zero variance)
    let data = &[0.5, 0.5, 0.5];
    let llf = box_cox_log_likelihood(data, 1.0);
    assert_eq!(llf, f64::NEG_INFINITY);
}
crates/augurs-forecaster/src/transforms.rs (4)

58-64: Convert BoxCox to a struct variant for better clarity.

Following the suggestion from the previous review, convert the BoxCox variant to a struct for better documentation and clarity of the lambda parameter.

-    BoxCox{
-        /// The lambda parameter for the Box-Cox transformation.
-        /// If lambda == 0, the transformation is equivalent to the natural logarithm.
-        /// Otherwise, the transformation is (x^lambda - 1) / lambda.
-        lambda: f64,
-    },
+    /// Box-Cox transformation with a configurable lambda parameter.
+    BoxCox(BoxCoxParams),

+ /// Parameters for the Box-Cox transformation.
+ #[derive(Debug, Clone)]
+ pub struct BoxCoxParams {
+     /// The lambda parameter for the Box-Cox transformation.
+     /// If lambda == 0, the transformation is equivalent to the natural logarithm.
+     /// Otherwise, the transformation is (x^lambda - 1) / lambda.
+     pub lambda: f64,
+ }

110-118: Improve documentation for power_transform method.

The documentation for power_transform is incomplete. Add details about:

  1. How lambda is optimized
  2. The constraints on input data
  3. The relationship between power transform and Box-Cox transform
     /// Create the power transform that optimizes the lambda parameter for the Box-Cox transformation.
     /// 
     /// This transform applies the Power transformation to each item.
-    /// The Power transformation is defined as:
+    /// The Power transformation is a Box-Cox transformation where lambda is optimized to
+    /// maximize the log-likelihood of the transformed data being normally distributed.
+    /// 
+    /// # Arguments
+    /// * `data` - The input data used to optimize lambda. Must be positive values.
+    /// 
+    /// # Returns
+    /// A Box-Cox transformation with the optimized lambda parameter.
     ///
     pub fn power_transform(data: &[f64]) -> Self {

417-424: Consider using Result instead of assert for input validation.

Replace the assertion with a Result return type for more graceful error handling.

-pub fn box_cox(x: f64, lambda: f64) -> f64 {
-    assert!(x > 0.0, "Input x must be positive for Box-Cox transformation.");
+pub fn box_cox(x: f64, lambda: f64) -> Result<f64, &'static str> {
+    if x <= 0.0 {
+        return Err("Input x must be positive for Box-Cox transformation");
+    }
     if lambda == 0.0 {
-        x.ln()
+        Ok(x.ln())
     } else {
-        (x.powf(lambda) - 1.0) / lambda
+        Ok((x.powf(lambda) - 1.0) / lambda)
     }
 }

Based on the verification results, I can now provide a final response:

Box-Cox transformation implementation needs additional error handling

The Box-Cox transformation has been properly integrated into the Transform enum and its match patterns, but there are some concerns:

  • The box_cox function only asserts for positive input values, but this error handling isn't propagated through the iterator adaptors
  • No test cases were found specifically for Box-Cox transformation error cases
  • The power_transform function that optimizes lambda doesn't include input validation

Suggested improvements:

  • Add proper error handling in BoxCox and InverseBoxCox iterator implementations
  • Add test cases for invalid inputs (non-positive values)
  • Add input validation in power_transform function
🔗 Analysis chain

Line range hint 1-645: Verify usage of Box-Cox transformation in the codebase.

Let's ensure all callers are properly updated to handle the new Box-Cox transformation.


Let me generate another verification script to check for the actual usage patterns and error handling in the codebase.


Let me try one more verification script to check the match patterns and error handling in the correct file path:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all usages of BoxCox transformation and direct calls to box_cox function

# Find all uses of BoxCox variant in Transform enum
echo "Checking BoxCox variant usage:"
rg "Transform::BoxCox" -A 2

# Find direct calls to box_cox function
echo "Checking direct box_cox function calls:"
rg "box_cox\(" -A 2

# Find potential places where error handling might be needed
echo "Checking for potential error handling sites:"
ast-grep --pattern 'match $expr {
  $$$
  Transform::BoxCox { $$ } => $$,
  $$$
}'

Length of output: 1657


Script:

#!/bin/bash
# Check for Transform enum match patterns and BoxCox usage in tests/examples

# Find all match expressions on Transform enum
echo "Checking Transform enum match patterns:"
rg "match.*self.*\{" -A 10 "transforms.rs"

# Find test cases or examples using BoxCox
echo "Checking test cases and examples:"
rg "BoxCox" --type rust -A 5

# Check for any error handling patterns around box_cox
echo "Checking error handling patterns:"
rg "assert!.*box_cox" --type rust

Length of output: 7679


Script:

#!/bin/bash
# Check for Transform enum match patterns and error handling

# Find all match expressions on Transform enum with correct path
echo "Checking Transform enum match patterns:"
rg "match.*self.*\{" -A 10 "crates/augurs-forecaster/src/transforms.rs"

# Check for any panic or error handling around BoxCox
echo "Checking error handling:"
rg "assert!|panic!|Error" -A 2 "crates/augurs-forecaster/src/transforms.rs"

# Check for any tests related to BoxCox transformation
echo "Checking BoxCox tests:"
rg "#\[test\].*box" -A 10 "crates/augurs-forecaster/src/transforms.rs"

Length of output: 1610

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 834cf61 and 5f7f8cc.

📒 Files selected for processing (3)
  • crates/augurs-forecaster/Cargo.toml (1 hunks)
  • crates/augurs-forecaster/src/power_transforms.rs (1 hunks)
  • crates/augurs-forecaster/src/transforms.rs (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/augurs-forecaster/Cargo.toml
🔇 Additional comments (4)
crates/augurs-forecaster/src/power_transforms.rs (3)

5-20: ⚠️ Potential issue

Add input validation to prevent mathematical domain errors

The Box-Cox transformation requires strictly positive values, and the log-likelihood calculation has potential mathematical domain errors.

Apply these changes to handle edge cases:

 fn box_cox_log_likelihood(data: &[f64], lambda: f64) -> f64 {
     let n = data.len() as f64;
     assert!(n > 0.0, "Data must not be empty");
+    // Validate positive values
+    if data.iter().any(|&x| x <= 0.0) {
+        return f64::NEG_INFINITY;  // Invalid transformation
+    }
     let transformed_data: Vec<f64> = data.iter().map(|&x| box_cox(x, lambda)).collect();
     let mean_transformed: f64 = transformed_data.iter().copied().sum::<f64>() / n;
     let variance: f64 = transformed_data
         .iter()
         .map(|&x| (x - mean_transformed).powi(2))
         .sum::<f64>()
         / n;

+    // Handle zero variance case
+    if variance <= 0.0 {
+        return f64::NEG_INFINITY;  // All values are identical
+    }

     let log_likelihood =
         -0.5 * n * variance.ln() + (lambda - 1.0) * data.iter().map(|&x| x.ln()).sum::<f64>();
     log_likelihood
 }

22-35: 🛠️ Refactor suggestion

Optimize memory usage by using references

The struct can be more memory efficient by using a reference to the data instead of cloning it.

Apply this change:

 #[derive(Clone)]
-struct BoxCoxProblem {
-    data: Vec<f64>,
+struct BoxCoxProblem<'a> {
+    data: &'a [f64],
 }

-impl CostFunction for BoxCoxProblem {
+impl<'a> CostFunction for BoxCoxProblem<'a> {
     type Param = f64;
     type Output = f64;

     fn cost(&self, lambda: &Self::Param) -> Result<Self::Output, Error> {
         Ok(-box_cox_log_likelihood(self.data, *lambda))
     }
 }

37-53: ⚠️ Potential issue

Replace panic with proper error handling

The function should return a Result instead of panicking on optimization failure.

Apply these changes:

-pub fn optimize_lambda(data: &[f64]) -> f64 {
+pub fn optimize_lambda(data: &[f64]) -> Result<f64, Error> {
     let cost = BoxCoxProblem {
-        data: data.to_vec(),
+        data,
     };
     let init_param = 0.5;
     let solver = BrentOpt::new(-2.0, 2.0);

-    let result = Executor::new(cost, solver)
+    let result = Executor::new(cost, solver)
         .configure(|state| state.param(init_param).max_iters(100))
         .run();

-    match result {
-        Ok(result) => result.state().best_param.unwrap(),
-        Err(error) => panic!("Optimization failed: {}", error),
-    }
+    result.and_then(|res| {
+        res.state()
+            .best_param
+            .ok_or_else(|| Error::new(argmin::core::ErrorKind::NotConverged, "Failed to find optimal parameter"))
+    })
 }
crates/augurs-forecaster/src/transforms.rs (1)

458-464: ⚠️ Potential issue

Add input validation to inverse_box_cox.

The function needs to validate that y * lambda + 1.0 is positive when lambda != 0.0 to prevent invalid operations.

-fn inverse_box_cox(y: f64, lambda: f64) -> f64 {
+fn inverse_box_cox(y: f64, lambda: f64) -> Result<f64, &'static str> {
     if lambda == 0.0 {
-        y.exp()
+        Ok(y.exp())
     } else {
-        (y * lambda + 1.0).powf(1.0 / lambda)
+        let value = y * lambda + 1.0;
+        if value <= 0.0 {
+            return Err("Invalid value encountered in inverse Box-Cox transformation");
+        }
+        Ok(value.powf(1.0 / lambda))
     }
 }

Likely invalid or redundant comment.

crates/augurs-forecaster/src/transforms.rs Outdated Show resolved Hide resolved
crates/augurs-forecaster/src/transforms.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
crates/augurs-forecaster/src/power_transforms.rs (3)

37-38: Improve function documentation

Add documentation explaining the meaning of the returned lambda value and its valid range.

 /// Optimize the lambda parameter for the Box-Cox transformation
+/// 
+/// Returns the optimal lambda value that maximizes the log-likelihood of the Box-Cox transformation.
+/// The lambda value typically falls between -2 and 2, where:
+/// * λ = 1.0: No transformation needed
+/// * λ = 0.0: Log transformation
+/// * λ = 0.5: Square root transformation
+/// * λ = -1.0: Reciprocal transformation
 pub fn optimize_lambda(data: &[f64]) -> f64{

42-47: Consider making optimization parameters configurable

The optimization bounds (-2.0, 2.0) and maximum iterations (100) are currently hard-coded. Consider making these configurable for more flexibility.

+#[derive(Debug, Clone)]
+pub struct OptimizationConfig {
+    pub lower_bound: f64,
+    pub upper_bound: f64,
+    pub max_iterations: u64,
+    pub initial_guess: f64,
+}
+
+impl Default for OptimizationConfig {
+    fn default() -> Self {
+        Self {
+            lower_bound: -2.0,
+            upper_bound: 2.0,
+            max_iterations: 100,
+            initial_guess: 0.5,
+        }
+    }
+}
+
-pub fn optimize_lambda(data: &[f64]) -> Result<f64, Error> {
+pub fn optimize_lambda(data: &[f64], config: Option<OptimizationConfig>) -> Result<f64, Error> {
+    let config = config.unwrap_or_default();
     let cost = BoxCoxProblem {
         data: data,
     };
-    let init_param = 0.5;
-    let solver = BrentOpt::new(-2.0, 2.0);
+    let solver = BrentOpt::new(config.lower_bound, config.upper_bound);
 
     let result = Executor::new(cost, solver)
-        .configure(|state| state.param(init_param).max_iters(100))
+        .configure(|state| state.param(config.initial_guess).max_iters(config.max_iterations))

55-74: Add more comprehensive test cases

The current tests only cover happy paths. Consider adding tests for:

  1. Edge cases (empty data, single value)
  2. Error cases (negative values, zero values)
  3. Boundary conditions (very small/large values)
  4. Known transformations (log transform λ=0, no transform λ=1)
#[test]
fn test_edge_cases() {
    // Single value should return lambda=1 (no transformation needed)
    let data = &[1.0];
    assert_approx_eq!(optimize_lambda(data).unwrap(), 1.0);
}

#[test]
fn test_known_transformations() {
    // Data that should prefer log transformation (λ≈0)
    let data = &[1.0, 10.0, 100.0, 1000.0];
    assert_approx_eq!(optimize_lambda(data).unwrap(), 0.0, 0.1);
}

#[test]
fn test_error_cases() {
    // Negative values should return error
    let data = &[-1.0, 1.0, 2.0];
    assert!(optimize_lambda(data).is_err());

    // Zero values should return error
    let data = &[0.0, 1.0, 2.0];
    assert!(optimize_lambda(data).is_err());
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 5f7f8cc and c89d42e.

📒 Files selected for processing (1)
  • crates/augurs-forecaster/src/power_transforms.rs (1 hunks)
🔇 Additional comments (2)
crates/augurs-forecaster/src/power_transforms.rs (2)

1-4: LGTM!

The imports are appropriate for implementing Box-Cox transformation optimization.


5-20: ⚠️ Potential issue

Add comprehensive input validation and error handling

The log-likelihood calculation needs more robust error handling:

  1. Input data must be validated for positive values (Box-Cox requires x > 0)
  2. Variance calculation should be protected against zero/negative values
  3. NaN/infinite values should be handled

Consider implementing these safeguards:

-fn box_cox_log_likelihood(data: &[f64], lambda: f64) -> f64 {
+fn box_cox_log_likelihood(data: &[f64], lambda: f64) -> Result<f64, &'static str> {
     let n = data.len() as f64;
     assert!(n > 0.0, "Data must not be empty");
+    
+    // Validate input data
+    if data.iter().any(|&x| x <= 0.0) {
+        return Err("All data values must be positive for Box-Cox transformation");
+    }
+    if data.iter().any(|&x| x.is_nan() || x.is_infinite()) {
+        return Err("Data contains NaN or infinite values");
+    }
+
     let transformed_data: Vec<f64> = data.iter().map(|&x| box_cox(x, lambda)).collect();
     let mean_transformed: f64 = transformed_data.iter().copied().sum::<f64>() / n;
     let variance: f64 = transformed_data
         .iter()
         .map(|&x| (x - mean_transformed).powi(2))
         .sum::<f64>()
         / n;

+    // Ensure variance is valid
+    if variance <= 0.0 {
+        return Err("Variance must be positive");
+    }
+
     // Avoid log(0) by ensuring variance is positive
     let log_likelihood =
         -0.5 * n * variance.ln() + (lambda - 1.0) * data.iter().map(|&x| x.ln()).sum::<f64>();
-    log_likelihood
+    Ok(log_likelihood)
 }

Likely invalid or redundant comment.

crates/augurs-forecaster/src/power_transforms.rs Outdated Show resolved Hide resolved
crates/augurs-forecaster/src/power_transforms.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c89d42e and ebfd3c7.

📒 Files selected for processing (3)
  • crates/augurs-forecaster/src/lib.rs (1 hunks)
  • crates/augurs-forecaster/src/power_transforms.rs (1 hunks)
  • crates/augurs-forecaster/src/transforms.rs (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/augurs-forecaster/src/lib.rs
  • crates/augurs-forecaster/src/power_transforms.rs
🔇 Additional comments (9)
crates/augurs-forecaster/src/transforms.rs (9)

10-11: Ensure optimize_lambda is utilized correctly

The import of optimize_lambda suggests you are optimizing the lambda parameter for the Box-Cox transformation. Verify that this function is correctly implemented and used in the code.


12-15: Documentation enhancements are clear and informative

The added documentation for Transforms and Transform improves clarity on their roles and usage in the codebase.


58-64: Addition of BoxCox variant to Transform enum

The introduction of the BoxCox variant with detailed documentation properly extends the Transform enum to support Box-Cox transformations.


106-108: Method boxcox correctly creates a Box-Cox transform

The boxcox method provides a straightforward way to create a new Box-Cox transform with a specified lambda parameter.


121-121: Consider restricting the visibility of transform method

To reduce the public API surface and encapsulate implementation details, consider changing the visibility of the transform method from pub to pub(crate) if it is not intended for external use.


135-135: Consider restricting the visibility of inverse_transform method

Similar to the transform method, assess whether inverse_transform needs to be publicly accessible. If not, changing it to pub(crate) can enhance encapsulation.


149-149: Consider restricting the visibility of inverse_transform_forecast method

Evaluate if inverse_transform_forecast should be part of the public API. If it is used internally, consider changing its visibility to pub(crate) to maintain a cleaner public interface.


632-648: Expand test coverage for Box-Cox transformation

The current tests cover basic scenarios. To enhance robustness, consider adding:

  1. Edge case where lambda = 0: Test the logarithmic case of the Box-Cox transformation.
  2. Input validation tests: Ensure that the transformation functions handle non-positive inputs appropriately, either by panicking or returning an error.
  3. Inverse transformation accuracy: Add tests to verify that inverse_box_cox(box_cox(x, lambda), lambda) approximates x within an acceptable tolerance for various values of x and lambda.
  4. Negative lambda values: Include tests where lambda is negative to ensure correct behavior in these cases.

461-467: ⚠️ Potential issue

Ensure valid computations in the inverse Box-Cox transformation

The inverse_box_cox function may produce invalid results if y * lambda + 1.0 is non-positive when lambda < 0 and y has certain values. This results in attempting to calculate the root of a non-positive number, leading to a runtime error.

Add input validation to prevent invalid computations:

fn inverse_box_cox(y: f64, lambda: f64) -> f64 {
+    let value = y * lambda + 1.0;
+    if value <= 0.0 {
+        panic!("Invalid value encountered in inverse Box-Cox transformation: y * lambda + 1.0 must be positive.");
+    }
    if lambda == 0.0 {
        y.exp()
    } else {
-        (y * lambda + 1.0).powf(1.0 / lambda)
+        value.powf(1.0 / lambda)
    }
}

Alternatively, handle errors gracefully by returning a Result<f64, Error> instead of panicking.

crates/augurs-forecaster/src/transforms.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
crates/augurs-forecaster/src/transforms.rs (2)

58-64: Consider using a newtype pattern for lambda

The lambda parameter is a critical value that should be constrained to valid ranges. Consider wrapping it in a newtype with validation.

+#[derive(Debug, Clone, Copy)]
+pub struct BoxCoxLambda(f64);
+
+impl BoxCoxLambda {
+    pub fn new(lambda: f64) -> Self {
+        Self(lambda)
+    }
+
+    pub fn value(&self) -> f64 {
+        self.0
+    }
+}

 pub enum Transform {
-    BoxCox {
-        lambda: f64,
-    },
+    BoxCox(BoxCoxLambda),
 }

429-458: Consider implementing custom Debug for iterator adapters

The derived Debug implementation might not be ideal for the iterator adapters. Consider implementing a custom Debug that shows the lambda value.

-#[derive(Clone, Debug)]
+#[derive(Clone)]
 struct BoxCox<T> {
     inner: T,
     lambda: f64,
 }

+impl<T: fmt::Debug> fmt::Debug for BoxCox<T> {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        f.debug_struct("BoxCox")
+            .field("inner", &self.inner)
+            .field("lambda", &self.lambda)
+            .finish()
+    }
+}

Also applies to: 469-498

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ebfd3c7 and 214bee9.

📒 Files selected for processing (2)
  • crates/augurs-forecaster/src/power_transforms.rs (1 hunks)
  • crates/augurs-forecaster/src/transforms.rs (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/augurs-forecaster/src/power_transforms.rs
🔇 Additional comments (3)
crates/augurs-forecaster/src/transforms.rs (3)

10-11: LGTM: Import and dependency addition

The addition of optimize_lambda from the power transforms module is appropriate for the Box-Cox transformation implementation.


110-118: Add input validation in power_transform

The function should validate that all input data points are positive before attempting the transformation.


460-467: Add validation for inverse Box-Cox transformation

The inverse transformation should validate that y * lambda + 1.0 is positive when lambda != 0.

crates/augurs-forecaster/src/transforms.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
crates/augurs-forecaster/src/power_transforms.rs (3)

5-28: Consider improving numerical stability and performance

The implementation could benefit from the following improvements:

  1. Use a two-pass algorithm or Welford's online algorithm for variance calculation to improve numerical stability
  2. Use itertools' sum() for more efficient summation operations
 fn box_cox_log_likelihood(data: &[f64], lambda: f64) -> Result<f64, Error> {
     let n = data.len() as f64;
     if n == 0.0 {
         return Err(Error::msg("Data must not be empty"));
     }
     if data.iter().any(|&x| x <= 0.0) {
         return Err(Error::msg("All data must be greater than 0"));
     }
     let transformed_data: Vec<f64> = data.iter().map(|&x| box_cox(x, lambda)).collect();
-    let mean_transformed: f64 = transformed_data.iter().copied().sum::<f64>() / n;
-    let variance: f64 = transformed_data
-        .iter()
-        .map(|&x| (x - mean_transformed).powi(2))
-        .sum::<f64>()
-        / n;
+    // Use Welford's online algorithm for better numerical stability
+    let (mean_transformed, variance) = transformed_data.iter().fold(
+        (0.0, 0.0),
+        |(m, s2), &x| {
+            let delta = x - m;
+            let m_next = m + delta / n;
+            let s2_next = s2 + delta * (x - m_next);
+            (m_next, s2_next)
+        },
+    );
+    let variance = variance / n;

     // Avoid log(0) by ensuring variance is positive
     if variance <= 0.0 {
         return Err(Error::msg("Variance must be positive"));
     }
     let log_likelihood =
-        -0.5 * n * variance.ln() + (lambda - 1.0) * data.iter().map(|&x| x.ln()).sum::<f64>();
+        -0.5 * n * variance.ln() + (lambda - 1.0) * data.iter().map(|&x| x.ln()).sum();
     Ok(log_likelihood)
 }

45-61: Consider making optimization parameters configurable

The hardcoded optimization parameters (-2.0, 2.0 bounds, 0.5 initial value, 100 max iterations) might not be optimal for all datasets. Consider making these configurable through a struct.

+#[derive(Debug, Clone)]
+pub struct BoxCoxOptimizeParams {
+    pub lower_bound: f64,
+    pub upper_bound: f64,
+    pub init_param: f64,
+    pub max_iters: u64,
+}
+
+impl Default for BoxCoxOptimizeParams {
+    fn default() -> Self {
+        Self {
+            lower_bound: -2.0,
+            upper_bound: 2.0,
+            init_param: 0.5,
+            max_iters: 100,
+        }
+    }
+}

-pub(crate) fn optimize_lambda(data: &[f64]) -> Result<f64, Error> {
+pub(crate) fn optimize_lambda(data: &[f64], params: Option<BoxCoxOptimizeParams>) -> Result<f64, Error> {
+    let params = params.unwrap_or_default();
     let cost = BoxCoxProblem { data };
-    let init_param = 0.5;
-    let solver = BrentOpt::new(-2.0, 2.0);
+    let solver = BrentOpt::new(params.lower_bound, params.upper_bound);

     let result = Executor::new(cost, solver)
-        .configure(|state| state.param(init_param).max_iters(100))
+        .configure(|state| state.param(params.init_param).max_iters(params.max_iters))
         .run();

63-108: Consider adding more test cases

The current test coverage is good, but consider adding these edge cases:

  1. Data with very large values to test numerical stability
  2. Data with very small values (close to zero)
  3. Data that results in different optimal lambda values
  4. Test with max iterations reached
#[test]
fn test_boxcox_llf_large_values() {
    let data = &[1e10, 2e10, 3e10, 4e10, 5e10];
    let lambda = 1.0;
    let got = box_cox_log_likelihood(data, lambda);
    assert!(got.is_ok());
}

#[test]
fn test_boxcox_llf_small_values() {
    let data = &[1e-10, 2e-10, 3e-10, 4e-10, 5e-10];
    let lambda = 1.0;
    let got = box_cox_log_likelihood(data, lambda);
    assert!(got.is_ok());
}
crates/augurs-forecaster/src/forecaster.rs (1)

128-139: Improve test readability and coverage

The test could be improved in the following ways:

  1. Document why these specific expected values are correct
  2. Test additional properties of the power transformation
  3. Add assertions for the transformed data shape
 #[test]
 fn test_forecaster_power() {
     let data = &[1.0_f64, 2.0, 3.0, 4.0, 5.0];
     let transforms = vec![
         Transform::power_transform(data),
     ];
     let model = MSTLModel::new(vec![2], NaiveTrend::new());
     let mut forecaster = Forecaster::new(model).with_transforms(transforms);
     forecaster.fit(data).unwrap();
     let forecasts = forecaster.predict(4, None).unwrap();
+    // The expected values represent the forecasted values after applying
+    // the power transformation and its inverse. The alternating pattern
+    // is due to the seasonal component with period 2.
     assert_all_approx_eq(&forecasts.point, &[5.084499064884572, 5.000000030329821, 5.084499064884572, 5.000000030329821]);
+    
+    // Verify that the transformation preserves the data shape
+    assert_eq!(forecasts.point.len(), 4);
+    
+    // Verify that the transformed values are within reasonable bounds
+    for value in forecasts.point.iter() {
+        assert!(*value > 0.0, "Transformed values should be positive");
+        assert!(*value < 10.0, "Transformed values should be bounded");
+    }
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 214bee9 and 4460c58.

📒 Files selected for processing (3)
  • crates/augurs-forecaster/src/forecaster.rs (1 hunks)
  • crates/augurs-forecaster/src/power_transforms.rs (1 hunks)
  • crates/augurs-forecaster/src/transforms.rs (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/augurs-forecaster/src/transforms.rs
🔇 Additional comments (2)
crates/augurs-forecaster/src/power_transforms.rs (2)

30-33: LGTM!

The struct is well-designed with proper lifetime annotations for the borrowed data.


35-43: LGTM!

The implementation correctly maps the log-likelihood to its negative value for minimization and properly propagates errors.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (5)
crates/augurs-forecaster/src/power_transforms.rs (4)

6-34: Consider optimizing the transformed data collection

The implementation looks good with proper error handling. However, we can optimize the transformed data collection by avoiding the double iteration over the data.

-    let transformed_data: Result<Vec<f64>, _> = data.iter().map(|&x| box_cox(x, lambda)).collect();
-
-    let transformed_data = match transformed_data {
-        Ok(values) => values,
-        Err(e) => return Err(Error::msg(e)),
-    };
+    let mut transformed_data = Vec::with_capacity(data.len());
+    for &x in data {
+        transformed_data.push(box_cox(x, lambda).map_err(Error::msg)?);
+    }

36-66: Optimize data collection and remove excessive whitespace

Similar to the Box-Cox implementation, we can optimize the data collection. Also, there's unnecessary whitespace that can be removed.

 fn yeo_johnson_log_likelihood(data: &[f64], lambda: f64) -> Result<f64, Error> {
     let n = data.len() as f64;
-    
     if n == 0.0 {
         return Err(Error::msg("Data array is empty"));
     }
-    
-    let transformed_data: Result<Vec<f64>, _> = data.iter().map(|&x| yeo_johnson(x, lambda)).collect();
-
-    let transformed_data = match transformed_data {
-        Ok(values) => values,
-        Err(e) => return Err(Error::msg(e)),
-    };
+    let mut transformed_data = Vec::with_capacity(data.len());
+    for &x in data {
+        transformed_data.push(yeo_johnson(x, lambda).map_err(Error::msg)?);
+    }

98-114: Consider making optimization parameters configurable

The optimization parameters (bounds: -2.0 to 2.0, max_iters: 1000) are hardcoded. Consider making these configurable for different use cases.

pub struct BoxCoxOptimizationParams {
    lower_bound: f64,
    upper_bound: f64,
    max_iterations: u64,
    initial_param: f64,
}

impl Default for BoxCoxOptimizationParams {
    fn default() -> Self {
        Self {
            lower_bound: -2.0,
            upper_bound: 2.0,
            max_iterations: 1000,
            initial_param: 0.5,
        }
    }
}

133-197: Enhance test coverage with edge cases

Consider adding test cases for:

  1. Lambda values at optimization bounds (-2.0, 2.0)
  2. Data with extreme values
  3. Data with all identical values (zero variance)
  4. Property-based tests for inverse(transform(x)) ≈ x
#[test]
fn test_box_cox_edge_cases() {
    // Test lambda at bounds
    let data = vec![0.1, 0.2, 0.3];
    let lambda = -2.0;
    let result = box_cox_log_likelihood(&data, lambda);
    assert!(result.is_ok());
    
    // Test extreme values
    let data = vec![1e-10, 1e10];
    let lambda = 0.5;
    let result = box_cox_log_likelihood(&data, lambda);
    assert!(result.is_ok());
    
    // Test zero variance
    let data = vec![1.0, 1.0, 1.0];
    let lambda = 0.5;
    let result = box_cox_log_likelihood(&data, lambda);
    assert!(result.is_err());
}
crates/augurs-forecaster/src/transforms.rs (1)

58-71: Enhance documentation with academic references

Consider adding references to the original papers for Box-Cox and Yeo-Johnson transformations in the documentation.

     /// Box-Cox transform.
+    /// 
+    /// Reference: Box, G. E. P. and Cox, D. R. (1964). "An Analysis of Transformations".
+    /// Journal of the Royal Statistical Society, Series B. 26 (2): 211–252.
     BoxCox {
         /// The lambda parameter for the Box-Cox transformation.
         /// If lambda == 0, the transformation is equivalent to the natural logarithm.
         /// Otherwise, the transformation is (x^lambda - 1) / lambda.
         lambda: f64,
     },
     /// Yeo-Johnson transform.
+    ///
+    /// Reference: Yeo, In-Kwon and Johnson, Richard A. (2000). "A New Family of Power 
+    /// Transformations to Improve Normality or Symmetry". Biometrika. 87 (4): 954–959.
     YeoJohnson {
         /// The lambda parameter for the Yeo-Johnson transformation.
         /// If lambda == 0, the transformation is equivalent to the natural logarithm.
         /// Otherwise, the transformation is ((x + 1)^lambda - 1) / lambda.
         lambda: f64,
     },
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4460c58 and f8ae9b0.

📒 Files selected for processing (3)
  • crates/augurs-forecaster/src/forecaster.rs (1 hunks)
  • crates/augurs-forecaster/src/power_transforms.rs (1 hunks)
  • crates/augurs-forecaster/src/transforms.rs (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/augurs-forecaster/src/forecaster.rs
🔇 Additional comments (4)
crates/augurs-forecaster/src/power_transforms.rs (2)

68-81: LGTM! Implementation follows best practices

The BoxCoxProblem implementation correctly uses references and properly implements the CostFunction trait.


83-96: LGTM! Consistent with BoxCoxProblem implementation

The YeoJohnsonProblem implementation follows the same pattern as BoxCoxProblem and correctly implements the CostFunction trait.

crates/augurs-forecaster/src/transforms.rs (2)

442-472: LGTM! Proper error handling implemented

The box_cox and yeo_johnson functions correctly handle errors and validate inputs.


474-614: LGTM! Reasonable error handling for iterators

The iterator adapters use a sensible approach of converting errors to NaN values, which is appropriate for numerical computations.

Comment on lines 116 to 131
pub(crate) fn optimize_yeo_johnson_lambda(data: &[f64]) -> Result<f64, Error> {
// Use Yeo-Johnson transformation
let cost = YeoJohnsonProblem { data };
let init_param = 0.5;
let solver = BrentOpt::new(-2.0, 2.0);

let result = Executor::new(cost, solver)
.configure(|state| state.param(init_param).max_iters(1000))
.run();

result.and_then(|res| {
res.state()
.best_param
.ok_or_else(|| Error::msg("No best parameter found"))
})
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Reduce code duplication with optimize_box_cox_lambda

The function is nearly identical to optimize_box_cox_lambda. Consider extracting the common optimization logic into a shared function.

fn optimize_lambda<T: CostFunction<Param = f64, Output = f64>>(
    cost: T,
    params: OptimizationParams,
) -> Result<f64, Error> {
    let solver = BrentOpt::new(params.lower_bound, params.upper_bound);
    let result = Executor::new(cost, solver)
        .configure(|state| state.param(params.initial_param).max_iters(params.max_iterations))
        .run();

    result.and_then(|res| {
        res.state()
            .best_param
            .ok_or_else(|| Error::msg("No best parameter found"))
    })
}

Comment on lines +748 to +764
#[test]
fn box_cox_test() {
let data = vec![1.0, 2.0, 3.0];
let lambda = 0.5;
let expected = vec![0.0, 0.8284271247461903, 1.4641016151377544];
let actual: Vec<_> = data.into_iter().box_cox(lambda).collect();
assert_all_close(&expected, &actual);
}

#[test]
fn inverse_box_cox_test() {
let data = vec![0.0, 0.5_f64.ln(), 1.0_f64.ln()];
let lambda = 0.5;
let expected = vec![1.0, 0.426966072919605, 1.0];
let actual: Vec<_> = data.into_iter().inverse_box_cox(lambda).collect();
assert_all_close(&expected, &actual);
}
Copy link
Contributor

@coderabbitai coderabbitai bot Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add tests for Yeo-Johnson transformation

Consider adding tests for Yeo-Johnson transformation similar to the Box-Cox tests, including:

  1. Basic transformation test
  2. Inverse transformation test
  3. Tests with negative values (unique to Yeo-Johnson)
#[test]
fn yeo_johnson_test() {
    let data = vec![-1.0, 0.0, 1.0];
    let lambda = 0.5;
    let expected = vec![-0.7071067811865476, 0.0, 0.8284271247461903];
    let actual: Vec<_> = data.into_iter().yeo_johnson(lambda).collect();
    assert_all_close(&expected, &actual);
}

#[test]
fn inverse_yeo_johnson_test() {
    let data = vec![-0.7071067811865476, 0.0, 0.8284271247461903];
    let lambda = 0.5;
    let expected = vec![-1.0, 0.0, 1.0];
    let actual: Vec<_> = data.into_iter().inverse_yeo_johnson(lambda).collect();
    assert_all_close(&expected, &actual);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coderabbit saved me here. After adding tests I realize the inverse yeo_johnson implementation wasn't correct

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edwardcqian, glad to hear that adding the tests helped identify and fix the issue with the inverse Yeo-Johnson implementation!


If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

Comment on lines 135 to 143
pub fn power_transform(data: &[f64]) -> Self {
if data.iter().all(|&x| x > 0.0) {
let lambda = optimize_box_cox_lambda(data).expect("Failed to optimize lambda (box-cox)");
Self::BoxCox { lambda }
} else {
let lambda = optimize_yeo_johnson_lambda(data).expect("Failed to optimize lambda (yeo-johnson)");
Self::YeoJohnson { lambda }
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Improve error handling in power_transform

The function uses expect() which can panic. Consider propagating the errors instead.

-pub fn power_transform(data: &[f64]) -> Self {
+pub fn power_transform(data: &[f64]) -> Result<Self, Error> {
     if data.iter().all(|&x| x > 0.0) {
-        let lambda = optimize_box_cox_lambda(data).expect("Failed to optimize lambda (box-cox)");
-        Self::BoxCox { lambda }
+        let lambda = optimize_box_cox_lambda(data)?;
+        Ok(Self::BoxCox { lambda })
     } else {
-        let lambda = optimize_yeo_johnson_lambda(data).expect("Failed to optimize lambda (yeo-johnson)");
-        Self::YeoJohnson { lambda }
+        let lambda = optimize_yeo_johnson_lambda(data)?;
+        Ok(Self::YeoJohnson { lambda })
     }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pub fn power_transform(data: &[f64]) -> Self {
if data.iter().all(|&x| x > 0.0) {
let lambda = optimize_box_cox_lambda(data).expect("Failed to optimize lambda (box-cox)");
Self::BoxCox { lambda }
} else {
let lambda = optimize_yeo_johnson_lambda(data).expect("Failed to optimize lambda (yeo-johnson)");
Self::YeoJohnson { lambda }
}
}
pub fn power_transform(data: &[f64]) -> Result<Self, Error> {
if data.iter().all(|&x| x > 0.0) {
let lambda = optimize_box_cox_lambda(data)?;
Ok(Self::BoxCox { lambda })
} else {
let lambda = optimize_yeo_johnson_lambda(data)?;
Ok(Self::YeoJohnson { lambda })
}
}

Copy link
Collaborator

@sd2k sd2k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice this looks pretty close to me, there are a few outstanding comments and I think we can merge once they're done!

crates/augurs-forecaster/src/transforms.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (3)
crates/augurs-forecaster/src/power_transforms.rs (2)

6-34: LGTM! Consider using a custom error type.

The error handling is thorough and well-implemented. However, using a string literal for errors could be improved by defining a custom error type for better error handling across the codebase.

Consider creating a custom error type:

#[derive(Debug, Error)]
pub enum TransformError {
    #[error("Data must not be empty")]
    EmptyData,
    #[error("All data must be greater than 0")]
    NonPositiveData,
    #[error("Variance must be positive")]
    NonPositiveVariance,
}

90-103: Consider reducing code duplication with BoxCoxProblem.

The implementation is correct but shares significant code structure with BoxCoxProblem. Consider extracting common functionality into a trait or generic struct.

Example approach:

trait OptimizationProblem {
    fn compute_log_likelihood(&self, lambda: f64) -> Result<f64, Error>;
}

struct TransformProblem<'a, F> {
    data: &'a [f64],
    transform_fn: F,
}

impl<F> CostFunction for TransformProblem<'_, F>
where
    F: Fn(&[f64], f64) -> Result<f64, Error>,
{
    type Param = f64;
    type Output = f64;

    fn cost(&self, lambda: &Self::Param) -> Result<Self::Output, Error> {
        (self.transform_fn)(self.data, *lambda).map(|ll| -ll)
    }
}
crates/augurs-forecaster/src/transforms.rs (1)

756-772: Enhance test coverage with edge cases.

The current tests cover basic functionality but should be expanded to include:

  1. Edge cases (lambda = 0, lambda = 1)
  2. Error cases (invalid inputs)
  3. Property-based tests

Example additional tests:

#[test]
fn box_cox_edge_cases() {
    // Test lambda = 0 (log transform)
    let data = vec![1.0, 2.0, 3.0];
    let lambda = 0.0;
    let expected: Vec<_> = data.iter().map(|x| x.ln()).collect();
    let actual: Vec<_> = data.into_iter().box_cox(lambda).collect();
    assert_all_close(&expected, &actual);
}

#[test]
fn box_cox_invalid_input() {
    let data = vec![-1.0, 0.0, 1.0];
    let lambda = 0.5;
    let result: Vec<_> = data.into_iter().box_cox(lambda).collect();
    assert!(result.iter().any(|&x| x.is_nan()));
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f8ae9b0 and 434ab52.

📒 Files selected for processing (3)
  • crates/augurs-forecaster/src/forecaster.rs (1 hunks)
  • crates/augurs-forecaster/src/power_transforms.rs (1 hunks)
  • crates/augurs-forecaster/src/transforms.rs (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/augurs-forecaster/src/forecaster.rs
🔇 Additional comments (5)
crates/augurs-forecaster/src/power_transforms.rs (3)

36-73: LGTM! Well-structured implementation.

The function handles all edge cases appropriately, including NaN checks and proper error handling. The mathematical implementation is correct for both positive and negative values.


75-88: LGTM! Clean and efficient implementation.

The struct and trait implementation are well-designed, with proper lifetime handling and error propagation.


123-138: Reduce code duplication with optimize_box_cox_lambda.

The function is nearly identical to optimize_box_cox_lambda. Consider extracting the common optimization logic into a shared function.

crates/augurs-forecaster/src/transforms.rs (2)

58-71: LGTM! Well-documented enum variants.

The BoxCox and YeoJohnson variants are well-structured with clear documentation explaining the lambda parameter and transformation behavior.


444-474: LGTM! Robust implementation with proper error handling.

The functions include appropriate input validation and clear error messages. The implementations correctly handle edge cases and return Results instead of panicking.

crates/augurs-forecaster/src/power_transforms.rs Outdated Show resolved Hide resolved
crates/augurs-forecaster/src/transforms.rs Outdated Show resolved Hide resolved
Comment on lines +476 to +507
/// An iterator adapter that applies the Box-Cox transformation to each item.
#[derive(Clone, Debug)]
struct BoxCox<T> {
inner: T,
lambda: f64,
}

impl<T> Iterator for BoxCox<T>
where
T: Iterator<Item = f64>,
{
type Item = f64;
fn next(&mut self) -> Option<Self::Item> {
self.inner
.next()
.map(|x| box_cox(x, self.lambda).unwrap_or(f64::NAN))
}
}

trait BoxCoxExt: Iterator<Item = f64> {
fn box_cox(self, lambda: f64) -> BoxCox<Self>
where
Self: Sized,
{
BoxCox {
inner: self,
lambda,
}
}
}

impl<T> BoxCoxExt for T where T: Iterator<Item = f64> {}
Copy link
Contributor

@coderabbitai coderabbitai bot Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Reconsider error handling strategy in iterator implementation.

Converting errors to NaN using unwrap_or(f64::NAN) could mask underlying issues and make debugging more difficult.

Consider propagating errors through the iterator:

pub struct BoxCoxIter<T> {
    inner: T,
    lambda: f64,
    error: Option<&'static str>,
}

impl<T> Iterator for BoxCoxIter<T>
where
    T: Iterator<Item = f64>,
{
    type Item = Result<f64, &'static str>;
    
    fn next(&mut self) -> Option<Self::Item> {
        if let Some(err) = self.error {
            return Some(Err(err));
        }
        self.inner.next().map(|x| box_cox(x, self.lambda))
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable, but maybe worth doing in another PR? since we'd have to change the output type for T for everything

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edwardcqian, understood. It makes sense to address this in a separate PR to avoid widespread changes right now. Would you like me to open a GitHub issue to track this enhancement?

Comment on lines 509 to 516
/// Returns the inverse Box-Cox transformation of the given value.
fn inverse_box_cox(y: f64, lambda: f64) -> f64 {
if lambda == 0.0 {
y.exp()
} else {
(y * lambda + 1.0).powf(1.0 / lambda)
}
}
Copy link
Contributor

@coderabbitai coderabbitai bot Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add domain validation to prevent invalid computations.

The function should validate that y * lambda + 1.0 is positive when computing the power function.

fn inverse_box_cox(y: f64, lambda: f64) -> Result<f64, &'static str> {
    if lambda == 0.0 {
        Ok(y.exp())
    } else {
        let value = y * lambda + 1.0;
        if value <= 0.0 {
            Err("Invalid domain for inverse Box-Cox transformation")
        } else {
            Ok(value.powf(1.0 / lambda))
        }
    }
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edwardcqian should we be doing this? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think we will reasonably get non-pos value here but doesn't hurt.

@edwardcqian edwardcqian requested a review from sd2k December 10, 2024 19:27
Copy link
Collaborator

@sd2k sd2k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a couple of comments on other comments 😁 there are some other unresolved ones too, e.g. about removing pub for some items

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
crates/augurs-forecaster/src/transforms.rs (1)

58-64: Enhance BoxCox documentation to cover edge cases.

The documentation should clarify that Box-Cox transformation is only defined for positive values, similar to how the Yeo-Johnson documentation explains its behavior with negative values.

     /// Box-Cox transform.
+    /// This transform is only defined for positive values.
+    /// For non-positive values, the transformation will return NaN.
     BoxCox {
         /// The lambda parameter for the Box-Cox transformation.
         /// If lambda == 0, the transformation is equivalent to the natural logarithm.
         /// Otherwise, the transformation is (x^lambda - 1) / lambda.
         lambda: f64,
     },
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 434ab52 and 75bf73a.

📒 Files selected for processing (1)
  • crates/augurs-forecaster/src/transforms.rs (8 hunks)
🔇 Additional comments (4)
crates/augurs-forecaster/src/transforms.rs (4)

756-772: Add more test cases for transformations.

Current tests only cover happy paths. Consider adding:

  1. Edge case where lambda = 0
  2. Error cases with negative or zero inputs
  3. Error cases in inverse transformation
  4. Property-based tests to verify that inverse(transform(x)) ≈ x

Example additional tests:

#[test]
fn box_cox_zero_lambda() {
    let data = vec![1.0, 2.0, 3.0];
    let lambda = 0.0;
    let expected: Vec<_> = data.iter().map(|x| x.ln()).collect();
    let actual: Vec<_> = data.into_iter().box_cox(lambda).collect();
    assert_all_close(&expected, &actual);
}

#[test]
fn box_cox_negative_input() {
    let data = vec![-1.0];
    let lambda = 0.5;
    let result: Result<f64, _> = box_cox(-1.0, lambda);
    assert!(result.is_err());
}

129-145: ⚠️ Potential issue

Replace expect() with proper error handling.

The current implementation can panic at runtime. This is particularly problematic for a library function. Consider returning a Result instead.

-    pub fn power_transform(data: &[f64]) -> Self {
+    pub fn power_transform(data: &[f64]) -> Result<Self, Box<dyn std::error::Error>> {
         if data.iter().all(|&x| x > 0.0) {
-            let lambda =
-                optimize_box_cox_lambda(data).expect("Failed to optimize lambda (box-cox)");
-            Self::BoxCox { lambda }
+            let lambda = optimize_box_cox_lambda(data)?;
+            Ok(Self::BoxCox { lambda })
         } else {
-            let lambda =
-                optimize_yeo_johnson_lambda(data).expect("Failed to optimize lambda (yeo-johnson)");
-            Self::YeoJohnson { lambda }
+            let lambda = optimize_yeo_johnson_lambda(data)?;
+            Ok(Self::YeoJohnson { lambda })
         }
     }

509-516: ⚠️ Potential issue

Add domain validation for inverse Box-Cox transformation.

The function should validate that y * lambda + 1.0 is positive when computing the power function.

-fn inverse_box_cox(y: f64, lambda: f64) -> f64 {
+fn inverse_box_cox(y: f64, lambda: f64) -> Result<f64, &'static str> {
     if lambda == 0.0 {
-        y.exp()
+        Ok(y.exp())
     } else {
         let value = y * lambda + 1.0;
+        if value <= 0.0 {
+            return Err("Invalid domain for inverse Box-Cox transformation");
+        }
-        (y * lambda + 1.0).powf(1.0 / lambda)
+        Ok(value.powf(1.0 / lambda))
     }
 }

476-493: ⚠️ Potential issue

Improve error handling in transformation iterators.

Converting errors to NaN silently masks issues and makes debugging harder. Consider propagating errors through the iterator.

Example for BoxCox (apply similar changes to YeoJohnson):

 struct BoxCox<T> {
     inner: T,
     lambda: f64,
+    error: Option<&'static str>,
 }

 impl<T> Iterator for BoxCox<T>
 where
     T: Iterator<Item = f64>,
 {
-    type Item = f64;
+    type Item = Result<f64, &'static str>;
     
     fn next(&mut self) -> Option<Self::Item> {
+        if let Some(err) = self.error {
+            return Some(Err(err));
+        }
-        self.inner.next().map(|x| box_cox(x, self.lambda).unwrap_or(f64::NAN))
+        self.inner.next().map(|x| box_cox(x, self.lambda))
     }
 }

Also applies to: 549-566

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (6)
crates/augurs-forecaster/src/power_transforms.rs (4)

6-34: Consider using custom error types and extracting variance calculation.

The error handling is good, but using custom error types would make the API more robust and maintainable.

Consider creating a custom error enum:

#[derive(Debug, Error)]
pub enum TransformError {
    #[error("Data must not be empty")]
    EmptyData,
    #[error("All data must be greater than 0")]
    NonPositiveData,
    #[error("Variance must be positive")]
    NonPositiveVariance,
}

Also, consider extracting the variance calculation into a separate function for better readability:

fn calculate_variance(data: &[f64], mean: f64) -> f64 {
    data.iter()
        .map(|&x| (x - mean).powi(2))
        .sum::<f64>()
        / data.len() as f64
}

36-73: Extract shared functionality with box_cox_log_likelihood.

There's code duplication in the variance calculation between this function and box_cox_log_likelihood.

Consider extracting shared functionality:

fn validate_data_length(data: &[f64]) -> Result<(), TransformError> {
    if data.is_empty() {
        Err(TransformError::EmptyData)
    } else {
        Ok(())
    }
}

fn calculate_mean_and_variance(data: &[f64]) -> Result<(f64, f64), TransformError> {
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;
    let variance = calculate_variance(data, mean);
    
    if variance <= 0.0 {
        Err(TransformError::NonPositiveVariance)
    } else {
        Ok((mean, variance))
    }
}

112-130: Enhance documentation for optimization parameters.

The function is well-structured, but the documentation could be more detailed about the optimization parameters and their impact.

Add more detailed documentation:

/// Optimize a parameter using Brent's method.
///
/// # Parameters
/// * `cost` - The cost function to minimize
/// * `params` - Optimization parameters:
///   * `initial_param` - Starting point for optimization
///   * `lower_bound` - Lower bound for parameter search
///   * `upper_bound` - Upper bound for parameter search
///   * `max_iterations` - Maximum number of iterations
///
/// # Returns
/// * `Ok(f64)` - The optimal parameter value
/// * `Err(Error)` - If optimization fails or no best parameter is found

162-220: Enhance test coverage with property-based tests and edge cases.

While the current tests are good, adding property-based tests and more edge cases would improve reliability.

Consider adding:

  1. Property-based tests using proptest:
use proptest::prelude::*;

proptest! {
    #[test]
    fn box_cox_inverse_property(x in 0.1f64..100.0, lambda in -2.0f64..2.0) {
        let transformed = box_cox(x, lambda)?;
        let original = inverse_box_cox(transformed, lambda);
        prop_assert!((x - original).abs() < 1e-10);
    }
}
  1. Edge cases:
#[test]
fn test_box_cox_extreme_values() {
    let data = &[f64::MIN_POSITIVE, f64::MAX];
    let lambda = 0.5;
    let result = optimize_box_cox_lambda(data);
    assert!(result.is_ok());
}
crates/augurs-forecaster/src/transforms.rs (2)

Line range hint 58-145: Improve error handling and documentation in power_transform.

The power_transform method uses expect() which can panic, and the documentation could be more detailed.

  1. Return Result instead of using expect:
-pub fn power_transform(data: &[f64]) -> Self {
+pub fn power_transform(data: &[f64]) -> Result<Self, Error> {
     if data.iter().all(|&x| x > 0.0) {
-        let lambda = optimize_box_cox_lambda(data).expect("Failed to optimize lambda (box-cox)");
-        Self::BoxCox { lambda }
+        optimize_box_cox_lambda(data).map(|lambda| Self::BoxCox { lambda })
     } else {
-        let lambda = optimize_yeo_johnson_lambda(data).expect("Failed to optimize lambda (yeo-johnson)");
-        Self::YeoJohnson { lambda }
+        optimize_yeo_johnson_lambda(data).map(|lambda| Self::YeoJohnson { lambda })
     }
 }
  1. Enhance documentation:
/// Create a power transform that automatically selects and optimizes the appropriate transformation.
///
/// # Algorithm Selection
/// - If all values are positive: Uses Box-Cox transformation
/// - If any values are zero or negative: Uses Yeo-Johnson transformation
///
/// # Errors
/// Returns an error if:
/// - The optimization algorithm fails to converge
/// - The data is empty
/// - The data has invalid values (NaN, Infinity)

766-800: Add tests for error cases and edge values.

The current tests cover the happy path well, but could benefit from more comprehensive testing.

Add tests for:

  1. Error cases:
#[test]
fn test_box_cox_invalid_input() {
    let data = vec![-1.0, 0.0, 1.0];
    let lambda = 0.5;
    let result: Vec<_> = data.into_iter().box_cox(lambda).collect();
    assert!(result.iter().any(|&x| x.is_nan()));
}
  1. Edge cases:
#[test]
fn test_yeo_johnson_edge_cases() {
    let data = vec![f64::MIN_POSITIVE, 0.0, f64::MAX];
    let lambda = 0.5;
    let result: Vec<_> = data.into_iter().yeo_johnson(lambda).collect();
    assert!(!result.iter().any(|&x| x.is_nan()));
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 75bf73a and 6e82356.

📒 Files selected for processing (3)
  • crates/augurs-forecaster/src/forecaster.rs (1 hunks)
  • crates/augurs-forecaster/src/power_transforms.rs (1 hunks)
  • crates/augurs-forecaster/src/transforms.rs (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/augurs-forecaster/src/forecaster.rs
🔇 Additional comments (1)
crates/augurs-forecaster/src/transforms.rs (1)

444-474: Implementation looks good!

The functions have proper error handling, clear input validation, and handle edge cases well.

crates/augurs-forecaster/src/power_transforms.rs Outdated Show resolved Hide resolved
crates/augurs-forecaster/src/transforms.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
crates/augurs-forecaster/src/power_transforms.rs (1)

Line range hint 589-605: Implement Error Handling in inverse_yeo_johnson Function

Currently, the inverse_yeo_johnson function does not handle potential mathematical errors and directly returns an f64. To ensure robustness and consistency with inverse_box_cox, consider modifying the function to return a Result<f64, &'static str> and handle possible domain errors explicitly.

Apply this diff to implement error handling:

-fn inverse_yeo_johnson(y: f64, lambda: f64) -> f64 {
+fn inverse_yeo_johnson(y: f64, lambda: f64) -> Result<f64, &'static str> {
     const EPSILON: f64 = 1e-6;

     if y >= 0.0 && lambda.abs() < EPSILON {
         // For lambda close to 0 (positive values)
-        Ok((y.exp()) - 1.0)
+        Ok((y.exp()) - 1.0)
     } else if y >= 0.0 {
         // For positive values (lambda not close to 0)
         let value = y * lambda + 1.0;
         if value <= 0.0 {
             Err("Invalid domain for inverse Yeo-Johnson transformation")
         } else {
-            Ok(value.powf(1.0 / lambda) - 1.0)
+            Ok(value.powf(1.0 / lambda) - 1.0)
         }
     } else if (lambda - 2.0).abs() < EPSILON {
         // For lambda close to 2 (negative values)
-        Ok(-(-y.exp() - 1.0))
+        Ok(-(-y.exp() - 1.0))
     } else {
         // For negative values (lambda not close to 2)
         let value = -((2.0 - lambda) * y) + 1.0;
         if value <= 0.0 {
             Err("Invalid domain for inverse Yeo-Johnson transformation")
         } else {
-            Ok(-((value.powf(1.0 / (2.0 - lambda))) - 1.0))
+            Ok(-((value.powf(1.0 / (2.0 - lambda))) - 1.0))
         }
     }
 }

This change ensures that any invalid computations are caught and handled appropriately, improving the reliability of the transformation.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6e82356 and 2e431ae.

📒 Files selected for processing (3)
  • crates/augurs-forecaster/src/forecaster.rs (1 hunks)
  • crates/augurs-forecaster/src/power_transforms.rs (1 hunks)
  • crates/augurs-forecaster/src/transforms.rs (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/augurs-forecaster/src/forecaster.rs
🔇 Additional comments (3)
crates/augurs-forecaster/src/transforms.rs (3)

487-491: 🛠️ Refactor suggestion

Propagate Errors in BoxCox Iterator Instead of Using NaN

Using unwrap_or(f64::NAN) in the BoxCox iterator can mask underlying errors and make debugging difficult. Consider modifying the iterator to return Result<f64, &'static str>, allowing errors to be handled explicitly.

Apply this diff to refactor the iterator:

 impl<T> Iterator for BoxCox<T>
 where
     T: Iterator<Item = f64>,
 {
-    type Item = f64;
+    type Item = Result<f64, &'static str>;
     fn next(&mut self) -> Option<Self::Item> {
         self.inner
             .next()
-            .map(|x| box_cox(x, self.lambda).unwrap_or(f64::NAN))
+            .map(|x| box_cox(x, self.lambda))
     }
 }

Similarly, update the trait and usage:

 trait BoxCoxExt: Iterator<Item = f64> {
-    fn box_cox(self, lambda: f64) -> BoxCox<Self>
+    fn box_cox(self, lambda: f64) -> BoxCox<Self>
+    where
+        Self: Sized,
+    {
+        BoxCox {
+            inner: self,
+            lambda,
+        }
+    }
 }

This change ensures that errors are not silently converted to NaN, improving the robustness of the transformation.


568-571: 🛠️ Refactor suggestion

Propagate Errors in YeoJohnson Iterator Instead of Using NaN

Similar to the BoxCox iterator, the YeoJohnson iterator uses unwrap_or(f64::NAN), which can conceal errors. Modify the iterator to return Result<f64, &'static str> to handle errors explicitly.

Apply this diff to refactor the YeoJohnson iterator:

 impl<T> Iterator for YeoJohnson<T>
 where
     T: Iterator<Item = f64>,
 {
-    type Item = f64;
+    type Item = Result<f64, &'static str>;
     fn next(&mut self) -> Option<Self::Item> {
         self.inner
             .next()
-            .map(|x| yeo_johnson(x, self.lambda).unwrap_or(f64::NAN))
+            .map(|x| yeo_johnson(x, self.lambda))
     }
 }

And update the trait accordingly.


154-154: ⚠️ Potential issue

Correct the Self::Log Transformation in the transform Method

In the transform method, the Self::Log variant currently applies input.exp(), which computes the exponential of the input. This seems incorrect because the Log transformation should apply the natural logarithm. Please change it to input.log() to correctly perform the logarithm transformation.

Apply this diff to fix the issue:

-            Self::Log => Box::new(input.exp()),
+            Self::Log => Box::new(input.log()),

Likely invalid or redundant comment.

Copy link
Collaborator

@sd2k sd2k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for making all those changes @edwardcqian 🥳

@sd2k sd2k enabled auto-merge (squash) December 11, 2024 08:16
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
crates/augurs-forecaster/src/transforms.rs (2)

59-65: Add precondition to BoxCox doc comment.

The doc comment should mention that the input values must be positive (x > 0) for the Box-Cox transformation to be valid.

     /// Box-Cox transform.
+    ///
+    /// # Preconditions
+    /// - Input values must be positive (x > 0)
+    ///
     BoxCox {

779-813: Consider adding edge case tests.

While the current test coverage is good, consider adding tests for:

  1. Box-Cox with lambda = 0 (log transform case)
  2. Yeo-Johnson with lambda = 0 and lambda = 2 (special cases)
  3. Error cases (NaN, infinity)

Example test cases:

#[test]
fn box_cox_lambda_zero() {
    let data = vec![1.0, 2.0, 3.0];
    let lambda = 0.0;
    let expected: Vec<_> = data.iter().map(|x| x.ln()).collect();
    let actual: Vec<_> = data.into_iter().box_cox(lambda).collect();
    assert_all_close(&expected, &actual);
}

#[test]
fn yeo_johnson_special_cases() {
    let data = vec![-1.0, 0.0, 1.0];
    // Test lambda = 0
    let expected_zero = vec![
        -(-1.0 + 1.0).ln(),
        0.0,
        (1.0 + 1.0).ln()
    ];
    let actual_zero: Vec<_> = data.clone().into_iter().yeo_johnson(0.0).collect();
    assert_all_close(&expected_zero, &actual_zero);
    
    // Test lambda = 2
    let expected_two = vec![
        -(-(-1.0) + 1.0).ln(),
        0.0,
        ((1.0 + 1.0).powi(2) - 1.0) / 2.0
    ];
    let actual_two: Vec<_> = data.into_iter().yeo_johnson(2.0).collect();
    assert_all_close(&expected_two, &actual_two);
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 2e431ae and c1c9303.

📒 Files selected for processing (1)
  • crates/augurs-forecaster/src/transforms.rs (8 hunks)
🔇 Additional comments (5)
crates/augurs-forecaster/src/transforms.rs (5)

135-151: LGTM! Well-implemented power transform selection.

The implementation correctly handles the selection between Box-Cox and Yeo-Johnson transformations based on data characteristics, with proper error handling and clear documentation.


515-527: LGTM! Good domain validation.

The implementation correctly validates that y * lambda + 1.0 is positive before computing the power function, preventing potential domain errors.


463-480: LGTM! Robust implementation of Yeo-Johnson transform.

The implementation includes:

  • Proper error handling for NaN inputs
  • Correct handling of both positive and negative values
  • Clear implementation of the piecewise function

595-612: LGTM! Well-implemented inverse Yeo-Johnson transform.

The implementation includes:

  • Good use of EPSILON for floating-point comparisons
  • Clear handling of all cases in the piecewise function
  • Well-documented code with clear comments for each case

482-498: 🛠️ Refactor suggestion

Consider propagating errors in BoxCox iterator.

The current implementation silently converts errors to NaN using unwrap_or(f64::NAN), which could mask underlying issues and make debugging more difficult.

Consider propagating errors through the iterator:

 struct BoxCox<T> {
     inner: T,
     lambda: f64,
+    error: Option<&'static str>,
 }

 impl<T> Iterator for BoxCox<T>
 where
     T: Iterator<Item = f64>,
 {
-    type Item = f64;
+    type Item = Result<f64, &'static str>;
     
     fn next(&mut self) -> Option<Self::Item> {
+        if let Some(err) = self.error {
+            return Some(Err(err));
+        }
-        self.inner.next().map(|x| box_cox(x, self.lambda).unwrap_or(f64::NAN))
+        self.inner.next().map(|x| box_cox(x, self.lambda))
     }
 }

Likely invalid or redundant comment.

@sd2k sd2k merged commit bfd842a into main Dec 11, 2024
24 checks passed
@sd2k sd2k deleted the add-transformation-crate-w-boxcox branch December 11, 2024 08:30
@sd2k sd2k mentioned this pull request Dec 11, 2024
sd2k added a commit that referenced this pull request Dec 11, 2024
Happily, cargo-semver-checks caught the addition of some variants
in #185, but it's a shame that had to be a minor bump. This marks
the enum as non-exhaustive so future transform additions won't
necessarily be considered breaking changes. It's unlikely users
will ever have to match on this anyway so this shouldn't cause
any problems (it could probably be a struct with an 'inner'
enum, really).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants