Metric calculation is bogus #223
Comments
Not actionable as written. Closing; can reopen with more details if needed.
Given the following temporal configuration:

```yaml
temporal_config:
    feature_start_time: '2010-01-04'
    feature_end_time: '2019-01-01'
    label_start_time: '2015-02-01'
    label_end_time: '2019-01-01'
    model_update_frequency: '1y'
    training_label_timespans: ['1month']
    training_as_of_date_frequencies: '1month'
    test_durations: '1y'
    test_label_timespans: ['1month']
    test_as_of_date_frequencies: '1month'
```

This results in the following temporal splits (timechop plot not included here). As you can see, the test period yields 12 different sets of predictions from the single trained model. Should we get 12 different metric calculations? An array? Just the total one?
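For concreteness, here is a minimal sketch (outside triage/timechop) of how 12 test as-of-dates fall out of `test_durations: '1y'` combined with `test_as_of_date_frequencies: '1month'`. The 2018-01-01 test start used below is an assumed split boundary for illustration only, not something computed by timechop:

```python
from datetime import date
from dateutil.relativedelta import relativedelta

# Assumed test-period start for one time split (illustration only).
test_start = date(2018, 1, 1)
test_duration_months = 12   # test_durations: '1y'
frequency_months = 1        # test_as_of_date_frequencies: '1month'

# One as-of-date per month across the test duration.
as_of_dates = [
    test_start + relativedelta(months=i * frequency_months)
    for i in range(test_duration_months // frequency_months)
]

print(len(as_of_dates))                 # 12 as-of-dates -> 12 sets of predictions
print(as_of_dates[0], as_of_dates[-1])  # 2018-01-01 ... 2018-12-01
```

Each of those 12 as-of-dates produces its own prediction set from the same trained model, which is what raises the question of whether the metric should be one pooled number or 12 separate ones.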
My feeling is that this calls for a different set of parameters in your temporal config; I believe there are already issues to this effect somewhere.
Ah, yes, I said the same thing in #378. Doesn't make me right, just consistent. :)
Another thought on this: We are doing evaluations the same way (making one evaluation over all dates) in both test and train. For EWS problems, presumably, this method is equally bogus in both train and test. Should there be a flag to control this behavior?
This commit addresses #663, #378, and #223 by allowing a model to be evaluated multiple times, so users can see whether the performance of a single trained model degrades over the time following training. Users must now set a timechop parameter, `test_evaluation_frequency`, that adds multiple test matrices to a time split. A model will be tested once on each matrix in its list. Matrices are added until they reach the label time limit, so all models are tested on the final test period (assuming that model_update_frequency is evenly divisible by test_evaluation_frequency). This initial commit only makes changes to timechop proper. Remaining work includes:

- Write tests for the new behavior
- Make timechop plotting work with the new behavior

New issues that I do not plan to address in the forthcoming PR:

- Incorporate multiple evaluation times into audition and/or postmodeling
- Maybe users should be able to set a maximum evaluation horizon so that early models are not tested for, say, 100 time periods
- Evaluation time-splitting could (or should) eventually be done not with pre-made matrices but on the fly at evaluation time
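As a rough illustration of the intended behavior, here is a sketch of how a test span could be carved into evaluation windows at a given `test_evaluation_frequency`. This is not the actual timechop implementation; the helper, its signature, and the dates are all hypothetical:

```python
from datetime import date
from dateutil.relativedelta import relativedelta

def evaluation_windows(test_start, label_end, eval_frequency_months):
    """Yield (window_start, window_end) pairs from test_start up to label_end.

    Sketch only: the real change builds full test matrices; this just shows
    how windows keep being added until the label time limit is reached.
    """
    start = test_start
    while start < label_end:
        end = min(start + relativedelta(months=eval_frequency_months), label_end)
        yield start, end
        start = end

# One window per quarter between a hypothetical split boundary and label_end_time.
for window in evaluation_windows(date(2018, 1, 1), date(2019, 1, 1), 3):
    print(window)
```

The point of the design is that each window gets its own test matrix and therefore its own evaluation of the same trained model, rather than one evaluation pooled over the whole test span.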
The precision calculation is currently taking predictions for several as-of dates and calculating precision across all of them together, resulting in bogus results. We need to look at how to do it for each as-of date separately and then aggregate, or something more reasonable.
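To make the distinction concrete, here is a minimal pandas sketch contrasting precision pooled across as-of-dates with precision computed per as-of-date and then aggregated. The column names, the toy data, and the top-k thresholding rule are assumptions for illustration, not triage's actual evaluation code:

```python
import pandas as pd

# Toy predictions table; 'as_of_date', 'score', and 'label' are assumed column names.
preds = pd.DataFrame({
    "as_of_date": ["2018-01-01"] * 4 + ["2018-02-01"] * 4,
    "score":      [0.9, 0.8, 0.2, 0.1, 0.7, 0.6, 0.4, 0.3],
    "label":      [1,   0,   0,   0,   1,   1,   0,   0],
})

def precision_at_top_k(df, k=2):
    # Treat the k highest-scoring rows as predicted positives, then take
    # the fraction of them that are true positives.
    top = df.nlargest(k, "score")
    return top["label"].mean()

# Pooled across all as-of-dates (the behavior this issue calls bogus): 0.5
pooled = precision_at_top_k(preds, k=2)

# Per as-of-date, then aggregated (one reasonable alternative): 0.5 and 1.0, mean 0.75
per_date = preds.groupby("as_of_date").apply(precision_at_top_k, k=2)

print(pooled)
print(per_date.to_dict(), per_date.mean())
```

The two numbers differ because pooling lets high-scoring entities from one as-of-date crowd out the top-k lists of the others, which is exactly the distortion described above.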