Metric calculation is bogus #223

Open
nanounanue opened this issue Sep 19, 2017 · 5 comments

@nanounanue
Contributor

nanounanue commented Sep 19, 2017

Precision calculation is currently taking predictions for several as-of dates and calculating precision across all of them together, resulting in bogus results. We need to look at how to do it for each as-of date separately and then aggregate, or something more reasonable.
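For concreteness, a minimal sketch of the difference, assuming an illustrative predictions frame with as_of_date, score, and label columns and a precision-at-top-k style metric; the column names, data, and helper are hypothetical, not Triage's actual schema or evaluation code:

import pandas as pd

# Illustrative predictions: one row per entity per as-of date (made-up values).
preds = pd.DataFrame({
    "as_of_date": ["2017-01-01"] * 4 + ["2017-02-01"] * 4,
    "score":      [0.9, 0.8, 0.3, 0.1, 0.7, 0.6, 0.4, 0.2],
    "label":      [1,   0,   0,   0,   1,   1,   0,   0],
})

def precision_at_top_k(df, k=2):
    # precision among the k highest-scoring rows
    return df.nlargest(k, "score")["label"].mean()

# Current behaviour: one cutoff over all as-of dates pooled together.
pooled = precision_at_top_k(preds, k=2)                     # 0.5

# Alternative: one precision per as-of date, then aggregate.
per_date = pd.Series({
    date: precision_at_top_k(group, k=2)
    for date, group in preds.groupby("as_of_date")
})                                                          # 0.5 and 1.0
print(pooled, per_date.mean())                              # 0.5 vs 0.75

The pooled number is dominated by whichever as-of dates happen to produce the highest scores, which is the effect described above.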

@nanounanue nanounanue added the bug label Sep 19, 2017
@thcrock
Contributor

thcrock commented Jan 19, 2018

Not actionable as written. Closing, can reopen with more details if needed.

@thcrock thcrock closed this as completed Jan 19, 2018
@thcrock thcrock reopened this Jan 19, 2018
@thcrock thcrock removed the bug label Feb 1, 2018
@nanounanue nanounanue changed the title from "Precision calculation is bogus" to "Metric calculation is bogus" Feb 6, 2019
@nanounanue
Contributor Author

Given the following temporal configuration:

temporal_config:
    feature_start_time: '2010-01-04'
    feature_end_time: '2019-01-01'
    label_start_time: '2015-02-01'
    label_end_time: '2019-01-01'

    model_update_frequency: '1y'
    training_label_timespans: ['1month']
    training_as_of_date_frequencies: '1month'

    test_durations: '1y'
    test_label_timespans: ['1month']
    test_as_of_date_frequencies: '1month'

Resulting in the following time splits:

[Timechop plot: inspections_baseline.png]

As you can see, we will generate predictions at 12 different as-of dates in the test period using the trained model.

Should we get 12 different metric calculations? An array? Just the total one?
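To make the options concrete, a hedged sketch with placeholder numbers (not Triage's evaluations schema): twelve per-as-of-date values versus a single aggregate computed from them.

import pandas as pd

# Placeholder per-as-of-date metric values for the 12 monthly test as-of dates
# implied by test_durations: '1y' and test_as_of_date_frequencies: '1month'.
per_date = pd.Series(
    [0.40, 0.38, 0.41, 0.39, 0.37, 0.36, 0.35, 0.36, 0.34, 0.33, 0.31, 0.30],
    index=pd.date_range("2018-01-01", periods=12, freq="MS"),
    name="precision",
)

print(per_date)          # "an array": one value per as-of date
print(per_date.mean())   # "just the total one": a single aggregate for the split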

@ecsalomon
Contributor

ecsalomon commented Feb 7, 2019

My feeling on this is that there should be a different set of parameters in your temporal config, test_frequency and test_interval or some such, that determines how many and which test matrices your model is evaluated on, while test_duration and test_example_frequency determine how many and which dates go into a single evaluation (whether combining all of the dates in the way currently done makes sense is, I think, debatable). When we initially wrote the test_duration and test_example_frequency keys, we were thinking of cases where test predictions are also event-based, so each date may be sparsely labeled and combining multiple dates is necessary.
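Written out as a sketch, with the caveat that these key names are suggestions in this thread, not existing Triage/timechop parameters:

# Hypothetical split of responsibilities in the temporal config
# (key names and values are illustrative only).
proposed_temporal_config = {
    # how many and which test matrices a model is evaluated on
    "test_frequency": "3month",
    "test_interval": "1y",
    # how many and which dates feed a *single* evaluation
    # (the role of the existing test_duration / test_example_frequency keys)
    "test_duration": "1y",
    "test_example_frequency": "1month",
}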

I feel like there are already issues to this effect somewhere.

@ecsalomon
Contributor

Ah, yes, I said the same thing in #378. Doesn't make me right, just consistent. :)

@ecsalomon
Contributor

Another thought on this: We are doing evaluations the same way (making one evaluation over all dates) in both test and train. For EWS problems, presumably, this method is equally bogus in both train and test. Should there be a flag to control this behavior?

ecsalomon added a commit that referenced this issue Apr 24, 2019
This commit addresses #663, #378, #223 by allowing a model to be
evaluated multiple times, and thereby allowing users to see whether
the performance of a single trained model degrades over the time following
training.

Users must now set a timechop parameter, `test_evaluation_frequency`, that
will add multiple test matrices to a time split. A model will be tested
once on each matrix in its list. Matrices are added until they reach the
label time limit, testing all models on the final test period (assuming
that model_update_frequency is evenly divisible by
test_evaluation_frequency).

This initial commit only makes changes to timechop proper. Remaining
work includes:

- Write tests for the new behavior
- Make timechop plotting work with new behavior

New issues that I do not plan to address in the forthcoming PR:

- Incorporate multiple evaluation times into audition and/or
  postmodeling
- Maybe users should be able to set a maximum evaluation horizon so that
  early models are not tested for, say, 100 time periods
- Evaluation time-splitting could (or should) eventually not be done with
  pre-made matrices but on the fly at evaluation time
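A rough sketch of the layout the `test_evaluation_frequency` parameter described above implies: evaluation start times spaced by that frequency from the end of training until the label time limit. This is an illustration under those assumptions, not Timechop's actual implementation.

from datetime import datetime
from dateutil.relativedelta import relativedelta

def evaluation_start_times(train_end, label_end_time, test_evaluation_frequency):
    # lay out test evaluations every `test_evaluation_frequency`
    # from the end of training until the label time limit
    times = []
    current = train_end
    while current + test_evaluation_frequency <= label_end_time:
        times.append(current)
        current += test_evaluation_frequency
    return times

# A model trained through 2018-01-01, evaluated quarterly until labels run out
# (2019-01-01), yields four evaluation start times:
# 2018-01-01, 2018-04-01, 2018-07-01, 2018-10-01.
print(evaluation_start_times(
    datetime(2018, 1, 1),
    datetime(2019, 1, 1),
    relativedelta(months=3),
))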