Changed target name/series ID divider and added ability to return series ID column with predictions #4357

christopherbunn · 2023-10-25T20:50:43Z

Resolves #4359

codecov · 2023-10-25T20:58:36Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8ffa04f) 99.7% compared to head (36185ae) 99.7%.

Additional details and impacted files

@@           Coverage Diff           @@
##            main   #4357     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        357     357             
  Lines      39869   39910     +41     
=======================================
+ Hits       39749   39790     +41     
  Misses       120     120

Files	Coverage Δ
...valml/pipelines/multiseries_regression_pipeline.py	`100.0% <100.0%> (ø)`
...valml/pipelines/time_series_regression_pipeline.py	`100.0% <100.0%> (ø)`
evalml/pipelines/utils.py	`99.7% <100.0%> (+0.1%)`	⬆️
...sts/component_tests/test_time_series_featurizer.py	`99.7% <100.0%> (+0.1%)`	⬆️
.../tests/component_tests/test_time_series_imputer.py	`100.0% <100.0%> (ø)`
evalml/tests/conftest.py	`98.4% <100.0%> (+0.1%)`	⬆️
...line_tests/test_multiseries_regression_pipeline.py	`100.0% <100.0%> (ø)`
evalml/tests/pipeline_tests/test_pipeline_utils.py	`99.6% <ø> (ø)`

chukarsten · 2023-10-30T18:43:11Z

evalml/pipelines/multiseries_regression_pipeline.py

+        y_unstacked = y_unstacked[
+            y_train_unstacked.columns.intersection(y_unstacked.columns)
+        ]
+        X_unstacked = X_unstacked[
+            X_train_unstacked.columns.intersection(X_unstacked.columns)
+        ]


Do you not just want the X_train/y_train columns here?
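To make the question concrete, here is a minimal pandas sketch of how the two options differ when the prediction-time frame is missing a column that the training frame has. The column names and data are made up for illustration and are not taken from the PR.

```python
import pandas as pd

# Hypothetical frames: training data has a column the new data lacks.
X_train_unstacked = pd.DataFrame(
    {"feature_a": [1, 2], "feature_b": [3, 4], "lag_c": [5, 6]}
)
X_unstacked = pd.DataFrame({"feature_b": [7], "feature_a": [8]})

# Option 1 (what the diff does): keep only columns present in both frames.
shared = X_train_unstacked.columns.intersection(X_unstacked.columns)
X_filtered = X_unstacked[shared]

# Option 2 (the reviewer's suggestion): select the training columns directly.
# This raises KeyError as soon as one training column is absent.
try:
    X_unstacked[X_train_unstacked.columns]
    missing_column_error = None
except KeyError as err:
    missing_column_error = err
```

Under these assumptions the intersection tolerates the missing `lag_c` column, while direct selection by the training columns does not.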

chukarsten · 2023-10-30T18:50:16Z

evalml/pipelines/multiseries_regression_pipeline.py

@@ -133,7 +149,14 @@ def predict_in_sample(
            objective,
            calculating_residuals,
        )
-        stacked_predictions = stack_data(unstacked_predictions)
+        if include_series_id:
+            stacked_predictions = stack_data(


We're missing testing on this branch?
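For readers unfamiliar with the branch under discussion, the sketch below shows one way stacking with and without a series ID column could look, along with the two cases a test would need to cover. `stack_wide`, `SEPARATOR`, and the column names are hypothetical stand-ins written for this illustration; they are not evalml's `stack_data` or its actual divider.

```python
import pandas as pd

SEPARATOR = "|"  # assumed divider between target name and series ID


def stack_wide(unstacked, include_series_id=False, series_id_name="series_id"):
    """Turn one-column-per-series data into long format (illustrative only)."""
    # Column names are assumed to look like "<target><SEPARATOR><series id>".
    target_name = str(unstacked.columns[0]).split(SEPARATOR, 1)[0]
    series_ids = [str(col).split(SEPARATOR, 1)[1] for col in unstacked.columns]
    # Row-major flattening: all series for row 0, then all series for row 1, ...
    values = pd.Series(unstacked.to_numpy().ravel(), name=target_name)
    if not include_series_id:
        return values
    # Repeat the series IDs once per original row so they line up with values.
    ids = pd.Series(series_ids * len(unstacked), name=series_id_name)
    return pd.concat([ids, values], axis=1)


wide = pd.DataFrame({"target|a": [1, 2], "target|b": [10, 20]})
values_only = stack_wide(wide)
with_ids = stack_wide(wide, include_series_id=True)
```

A test for the new branch would assert on both return shapes: a single series of values by default, and a two-column frame when the series ID is requested.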

evalml/pipelines/time_series_regression_pipeline.py

evalml/pipelines/utils.py

eccabay

Just a few comments, and I think we're missing a test!

eccabay · 2023-11-01T14:56:00Z

evalml/pipelines/multiseries_regression_pipeline.py

+        # Order series columns to be same as expected input feature names
+        input_features = list(self.input_feature_names.values())[0]
+        X_unstacked = X_unstacked[
+            [feature for feature in input_features if feature in X_unstacked.columns]
+        ]
+        X_train_unstacked = X_train_unstacked[
+            [
+                feature
+                for feature in input_features
+                if feature in X_train_unstacked.columns
+            ]
+        ]
+        y_overlapping_features = [
+            feature
+            for feature in y_train_unstacked.columns
+            if feature in y_unstacked.columns
+        ]
+        y_unstacked = y_unstacked[y_overlapping_features]
+        y_train_unstacked = y_train_unstacked[y_overlapping_features]


This is a really long chunk of text, and a lot of it is repeated. A few questions:

1. Is there a test case that covers this (i.e. one that fails without this code)?

2. Are `X_unstacked` and `X_train_unstacked` ever going to have different columns? It seems odd that we handle those separately from each other, and so differently from how `y` is handled here.

3. Is the goal here to filter columns, reorder columns, or both? The comment makes me think it's reordering, but the code makes me think we're filtering.

christopherbunn · 2023-11-01 (edited)

1. ~~Will push an additional test case~~. Actually, our current test case for `predict_in_sample()` errors out if the columns aren't in the right order. I could add something explicit if you think it would be helpful?

2. This covers the forecasting case. When we're forecasting, we only pass in the dates and the series IDs. If we're using lagged features (as we may in the future), we can pull them from `X_train` even if they're not specified in the current `X`. We can generally expect the `y` and `y_train` columns to be consistent, since the column names come from the same series ID values.

3. The goal is to do both, for the reason described above. I can update the comment to clarify.
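As a rough illustration of "both" (filtering and reordering in one pass), the list-comprehension pattern from the diff can be reduced to the sketch below. The helper name `filter_and_reorder` and the feature names are assumptions made for this example only.

```python
import pandas as pd


def filter_and_reorder(frame, expected_order):
    """Keep only expected columns that exist, in the expected order."""
    present = [name for name in expected_order if name in frame.columns]
    return frame[present]


# Hypothetical expected order, including a lagged feature that the
# prediction-time frame does not carry.
input_features = ["date", "feature|a", "feature|b", "lag|a"]
X_unstacked = pd.DataFrame(
    {"feature|b": [1], "date": ["2023-01-01"], "feature|a": [2]}
)

result = filter_and_reorder(X_unstacked, input_features)
```

Iterating over the expected order (rather than over the frame's own columns) is what makes the result follow that order; the membership check is what makes it a filter.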

eccabay · 2023-11-01

When you say the current test case errors out if it isn't in the right order, does that mean you changed it around manually to verify it fails in that case? I'm thinking we'd benefit from an explicit test case that fails if this code isn't in place, no modification required. It'll help stop us from removing or breaking this bit in the future.

christopherbunn · 2023-11-02

Added `test_multiseries_pipeline_predict_in_sample_series_out_of_order()`, which evaluates this case.
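The shape of such a regression test can be sketched without evalml. The example below is not the test added in the PR; it uses a made-up, position-sensitive `positional_predict` function to show why a test with shuffled series columns fails unless the columns are realigned first.

```python
import numpy as np
import pandas as pd


def positional_predict(frame, weights):
    """Stand-in for a model that consumes columns by position, not by name."""
    return frame.to_numpy() @ np.asarray(weights)


def align(frame, expected_order):
    """Realign columns to the order the model was fit with."""
    return frame[[col for col in expected_order if col in frame.columns]]


def test_columns_out_of_order_are_realigned():
    expected_order = ["series_a", "series_b"]
    weights = [1.0, 10.0]
    in_order = pd.DataFrame({"series_a": [1.0], "series_b": [2.0]})
    shuffled = in_order[["series_b", "series_a"]]

    baseline = positional_predict(in_order, weights)[0]
    # Without realignment the shuffled frame gives a different answer...
    assert positional_predict(shuffled, weights)[0] != baseline
    # ...and with realignment it matches the in-order result.
    assert positional_predict(align(shuffled, expected_order), weights)[0] == baseline


test_columns_out_of_order_are_realigned()
```

The first assertion documents the failure mode, so the test breaks on its own if the realignment step is ever removed.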

evalml/pipelines/multiseries_regression_pipeline.py

jeremyliweishih

LGTM after recent revisions

christopherbunn force-pushed the add_seriesid_pred_in_sample branch from 08713ad to 2845145 Compare October 25, 2023 20:50

christopherbunn force-pushed the add_seriesid_pred_in_sample branch from 6de173a to a5049b0 Compare October 31, 2023 04:05

chukarsten reviewed Oct 31, 2023

christopherbunn force-pushed the add_seriesid_pred_in_sample branch from dd0346d to 4a1afa2 Compare October 31, 2023 21:43

machineFL and others added 2 commits October 31, 2023 17:43

Try extra debug

69be047

Updated release notes

08694b0

christopherbunn force-pushed the add_seriesid_pred_in_sample branch from 4a1afa2 to 08694b0 Compare October 31, 2023 21:43

Join with separator symbol

5721615

christopherbunn marked this pull request as ready for review November 1, 2023 13:57

auto-assign bot assigned christopherbunn Nov 1, 2023

christopherbunn requested review from jeremyliweishih, MichaelFu512, eccabay and chukarsten November 1, 2023 13:57

eccabay requested changes Nov 1, 2023

christopherbunn added 2 commits November 1, 2023 12:28

Added infer to the end

fa437a1

updated test to infer type

95e6681

christopherbunn requested a review from eccabay November 1, 2023 18:34

jeremyliweishih approved these changes Nov 2, 2023

Add series ID indexing test

36185ae

eccabay approved these changes Nov 2, 2023

christopherbunn enabled auto-merge (squash) November 2, 2023 21:08

christopherbunn merged commit 735ca67 into main Nov 2, 2023
25 checks passed

christopherbunn deleted the add_seriesid_pred_in_sample branch November 2, 2023 21:18

MichaelFu512 mentioned this pull request Nov 2, 2023

release #4360

Merged
