Changed target name/series ID divider and added ability to return series ID column with predictions #4357
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##            main   #4357   +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%
=======================================
  Files        357     357
  Lines      39869   39910    +41
=======================================
+ Hits       39749   39790    +41
  Misses       120     120
y_unstacked = y_unstacked[
    y_train_unstacked.columns.intersection(y_unstacked.columns)
]
X_unstacked = X_unstacked[
    X_train_unstacked.columns.intersection(X_unstacked.columns)
]
Do you not just want the X_train/y_train columns here?
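To make the trade-off behind this question concrete, here is a minimal sketch on toy data. The frames and column names are invented for illustration and are not taken from the PR. It shows that `Index.intersection` keeps only the shared columns, following the order of the index it is called on, whereas selecting by the training columns alone raises when a series is missing from the new data.

```python
import pandas as pd

# Toy unstacked frames; the column names are made up for illustration.
y_train_unstacked = pd.DataFrame(
    {"target_b": [1, 2], "target_a": [3, 4], "target_c": [5, 6]}
)
y_unstacked = pd.DataFrame({"target_c": [7], "target_b": [8]})

# Keep only the columns present in both frames, following the training order.
shared = y_train_unstacked.columns.intersection(y_unstacked.columns)
y_filtered = y_unstacked[shared]

# Selecting by the training columns alone fails when a series is absent.
try:
    y_unstacked[y_train_unstacked.columns]
    missing_series_raises = False
except KeyError:
    missing_series_raises = True
```

Whether a missing series should raise or be silently dropped is a design decision for the pipeline; the sketch only shows how the two selections behave.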
@@ -133,7 +149,14 @@ def predict_in_sample(
    objective,
    calculating_residuals,
)
stacked_predictions = stack_data(unstacked_predictions)
if include_series_id:
    stacked_predictions = stack_data(
We're missing testing on this branch?
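As a rough illustration of what covering both branches could look like, the sketch below uses a hypothetical helper, `stack_predictions`, written in plain pandas. It is not the project's `stack_data` (whose signature is not shown in this excerpt), and the `series_id` and `prediction` column names are assumptions.

```python
import pandas as pd

def stack_predictions(unstacked, include_series_id=False, series_id_name="series_id"):
    """Hypothetical helper: one column per series in, long format out."""
    long = unstacked.melt(
        var_name=series_id_name, value_name="prediction", ignore_index=False
    )
    # melt lists one whole series after another; a stable sort on the
    # original index interleaves the series by time step instead.
    long = long.sort_index(kind="mergesort")
    if include_series_id:
        return long.reset_index(drop=True)
    return long["prediction"].reset_index(drop=True)

unstacked_predictions = pd.DataFrame({"a": [1.0, 2.0], "b": [10.0, 20.0]})

with_ids = stack_predictions(unstacked_predictions, include_series_id=True)
without_ids = stack_predictions(unstacked_predictions)
```

A test for the new branch would then assert on both return shapes: a frame carrying the series ID column when the flag is set, and a bare series of predictions otherwise.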
Just a few comments, and I think we're missing a test!
# Order series columns to be same as expected input feature names
input_features = list(self.input_feature_names.values())[0]
X_unstacked = X_unstacked[
    [feature for feature in input_features if feature in X_unstacked.columns]
]
X_train_unstacked = X_train_unstacked[
    [
        feature
        for feature in input_features
        if feature in X_train_unstacked.columns
    ]
]
y_overlapping_features = [
    feature
    for feature in y_train_unstacked.columns
    if feature in y_unstacked.columns
]
y_unstacked = y_unstacked[y_overlapping_features]
y_train_unstacked = y_train_unstacked[y_overlapping_features]
This is a really long chunk of text, where a lot of it's repeated. A few questions:
- Is there a test case that covers this? (i.e. one that fails without this code)
- Are X_unstacked and X_train_unstacked ever going to have different columns? It seems odd that we get those separately from each other, so differently from how y is handled here
- Is the goal here to filter columns, reorder columns, or both? The comment makes me think it's re-ordering, but the code makes me think we're filtering
- Will push an additional test case. Actually our current test case for predict_in_sample() errors out if it isn't in the right order. I could add something explicitly if you think it would be helpful?
- This covers the case when we're forecasting. When we're forecasting, we only pass in the dates + the series IDs. If we're using lagged features (like in the future), we can pull them from X_train even if they're not specified in the current X. We can generally expect the y and y_train values to be consistent since the column names come from the same series ID values.
- The goal is to do both for the reason described above. I can update the comment to clarify.
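The "filter and reorder" behaviour described in this reply can be sketched on toy data as follows. The helper name `align_to_features` and every column name are hypothetical; only the list-comprehension pattern mirrors the diff under review.

```python
import pandas as pd

def align_to_features(df, input_features):
    """Keep only columns named in input_features, in input_features order."""
    return df[[feature for feature in input_features if feature in df.columns]]

# Hypothetical feature order recorded at fit time.
input_features = ["date", "holiday_a", "holiday_b", "lag_a", "lag_b"]

# Training data has every feature, but in a scrambled order.
X_train_unstacked = pd.DataFrame(
    {
        "lag_b": [0.2],
        "holiday_b": [1],
        "date": ["2023-01-01"],
        "holiday_a": [0],
        "lag_a": [0.1],
    }
)
# Forecast-time data carries fewer columns (no lagged features yet).
X_unstacked = pd.DataFrame(
    {"holiday_b": [1], "date": ["2023-01-02"], "holiday_a": [0]}
)

X_train_aligned = align_to_features(X_train_unstacked, input_features)
X_aligned = align_to_features(X_unstacked, input_features)
```

Because the comprehension iterates over `input_features`, the result always follows the fit-time order, and columns absent from a frame are skipped rather than raising.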
When you say the current test case errors out if it isn't in the right order, does that mean you changed it around manually to verify it fails in that case? I'm thinking we'd benefit from an explicit test case that fails if this code isn't in place, no modification required. It'll help stop us from removing or breaking this bit in the future
Added test_multiseries_pipeline_predict_in_sample_series_out_of_order() which evaluates this case.
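The body of that test is not shown in this conversation. As a sketch of the general idea only, the example below uses a toy stand-in model that matches series by column position, which is an assumption made purely for illustration. It shows a check that fails when the alignment step is removed and passes when it is in place.

```python
import numpy as np
import pandas as pd

def reorder_like(df, reference_columns):
    """Filter and reorder df's columns to follow reference_columns."""
    return df[[col for col in reference_columns if col in df.columns]]

def positional_predict(y_train_unstacked, y_unstacked):
    """Toy stand-in for a model that matches series by position, not name."""
    last_values = y_train_unstacked.to_numpy()[-1]
    return pd.DataFrame(
        np.tile(last_values, (len(y_unstacked), 1)),
        columns=y_unstacked.columns,
        index=y_unstacked.index,
    )

def test_series_out_of_order():
    y_train = pd.DataFrame({"a": [1.0, 2.0], "b": [10.0, 20.0]})
    # Holdout series arrive in a different column order than in training.
    y_holdout = pd.DataFrame({"b": [30.0], "a": [3.0]})

    # Without alignment, series "b" receives the last value of series "a".
    naive = positional_predict(y_train, y_holdout)
    assert naive["b"].iloc[0] == 2.0

    # With alignment, each series gets its own last training value.
    aligned = positional_predict(y_train, reorder_like(y_holdout, y_train.columns))
    assert aligned["a"].iloc[0] == 2.0
    assert aligned["b"].iloc[0] == 20.0

test_series_out_of_order()
```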
LGTM after recent revisions
Resolves #4359