Unexpected behaviour with order of columns in IntercomparisonMetrics #220
I introduced a new keyword argument. During testing I stumbled on a related problem: if I use a metrics calculator with
Then the results dictionary has these keys:
that is, the tuples are sorted alphabetically. When using the new
In one case the reference is first, in the other case the reference is second, which is pretty unintuitive for a user. In this case I would expect that it is always (other, ref), or (ref, other), and does not depend on the names of the datasets. The temporal matcher returns the matched pairs with keys (other, ref), which is what I was expecting to get. I know that pytesmo by default does all combinations, and in this case a general rule is harder. But I think pytesmo should respect the order that was given by the temporal matcher. Furthermore, since dictionary order is now reliable (since CPython 3.6, and guaranteed by the language since Python 3.7), the default matcher should probably also respect the order of the datasets in the DataManager/datasets dictionary, if it doesn't do this already.
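A minimal plain-Python sketch of the mismatch, using hypothetical dataset names "c1" and "smap" against a reference called "ref" (these names are not taken from the example below):

```python
# The temporal matcher keys pairs as (other, ref), while the results dictionary
# ends up keyed by the alphabetically sorted tuple.
matcher_keys = [("c1", "ref"), ("smap", "ref")]          # (other, ref), as returned by the matcher
results_keys = [tuple(sorted(k)) for k in matcher_keys]  # key order used in the results dictionary
print(results_keys)  # [('c1', 'ref'), ('ref', 'smap')] -> the reference position differs per pair
```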
Addendum: this reordering is especially a problem when calculating bias, because the sign is then different in these two cases, and simply using So I think this needs to be fixed when we want to incorporate the new PairwiseIntercomparisonMetrics. @wpreimes do you have an idea how far-reaching those changes would be? Is this order being relied upon a lot in pytesmo?
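For illustration, a small numpy stand-in for bias (mirroring the usual definition mean(x) - mean(y); not the actual pytesmo implementation) shows the sign flip:

```python
import numpy as np

def bias(x, y):
    # stand-in for a pairwise bias metric: mean difference between x and y
    return np.mean(x) - np.mean(y)

ref = np.array([0.1, 0.2, 0.3])
other = ref + 0.2

print(bias(other, ref))  # 0.2
print(bias(ref, other))  # -0.2, i.e. the sign flips when the pair is reordered
```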
* temporal matching for multiple frames
* don't convert timezone naive indices
* combined matcher for dictionary
* added option to use all with combined_dropna
* comparison of new and old matching implementation: This adds a test to check whether the new matching in BasicTemporalMatching works as the old one in case there are no duplicates in the reference frame. It replaced the matcher in the other tests by a dummy that still uses the old matching implementation. The following commit will remove the dummy and set the new expected values.
* changed tests for ascat_ismn to use new matching
* fixup for previous commit
* added PairwiseIntercomparisonMetrics
* fixed test for PairwiseIntercomparisonMetrics
* statistics based metrics calculator
* Added tests for PairwiseIntercomparisonMetrics: One test still fails due to difference in scaling before calculating ubRMSD.
* Added CIs to PairwiseIntercomparisonMetrics
* made bootstrapped CI tests less sensitive
* made bootstrapped CI tests less sensitive
* added missing import
* make_combined_temporal_matcher now also works with k>2
* Fixed some issues: bug in validation.get_data_for_result_tuple; no segfault when running tests anymore
* better confidence intervals for tcol
* added tests for tripletmetrics
* added imports
* run tests all at the same time
* removed imports from validation_framework init
* triple metrics only
* removed xarray test
* fixed test
* fixed test
* TripleIntercomparisonMetrics -> TripleCollocationMetrics
* use percentile method by default; analytical CIs for mse_bias
* better test for #220
When setting up a test for the new PairwiseIntercomparisonMetrics (#219) I stumbled on some (for me) unexpected behaviour. I used a test case with 4 datasets, and apparently the order of the datasets changed, leading to wrong results (see example below).
The issue is caused by different orders of the dataset names passed to IntercomparisonMetrics. It works when dataset_names passed to IntercomparisonMetrics comes from get_dataset_names using a DataManager. If one just passes the names of the datasets in a different order, the results are wrong. However, in the latter case the wrong column (c1) is used as reference. I believe the problem is that pytesmo automatically converts a dataset dictionary to a DataManager (pytesmo/src/pytesmo/validation_framework/validation.py, line 119 in e8cbda4).
Based on this order, perform_validation then renames the columns of the DataFrame to ref, k1, k2, and k3 (pytesmo/src/pytesmo/validation_framework/validation.py, line 300 in e8cbda4). If the order of dataset_names was different, the columns are not in the order they should be. I think this should probably be fixed in Validation, since only Validation knows which ones are reference and which ones are non-reference datasets.
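To illustrate the renaming step, here is a rough pandas sketch (the helper function and column names are made up for illustration; this is not the code in perform_validation):

```python
import pandas as pd

# Hypothetical temporally matched data for four datasets.
df = pd.DataFrame({"ref": [0.1, 0.2], "c1": [0.3, 0.4],
                   "c2": [0.5, 0.6], "c3": [0.7, 0.8]})

def rename_by_position(df, dataset_names):
    # Columns are relabelled to ref, k1, k2, ... purely by the position of the
    # names in dataset_names, not by checking which column really is the reference.
    mapping = {name: ("ref" if i == 0 else f"k{i}") for i, name in enumerate(dataset_names)}
    return df.rename(columns=mapping)

print(rename_by_position(df, ["ref", "c1", "c2", "c3"]).columns.tolist())
# ['ref', 'k1', 'k2', 'k3']  -> the real reference is labelled "ref"
print(rename_by_position(df, ["c1", "c2", "c3", "ref"]).columns.tolist())
# ['k3', 'ref', 'k1', 'k2']  -> "c1" is now wrongly labelled "ref"
```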
@wpreimes, any thoughts on this?
(example DataFrame not shown: 366 rows × 4 columns)
Bias between c1 and ref is 0.2. The bias should be 0.2 here.
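The exact test data is not reproduced here, but a minimal reconstruction of the kind of frame described (c1 offset from ref by a constant 0.2) would be:

```python
import numpy as np
import pandas as pd

# Illustrative data only: 366 daily values, with c1 shifted by a constant 0.2.
index = pd.date_range("2007-01-01", periods=366, freq="D")
df = pd.DataFrame(index=index)
df["ref"] = np.random.randn(366)
df["c1"] = df["ref"] + 0.2

print((df["c1"] - df["ref"]).mean())  # 0.2 by construction
```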
When using get_dataset_names instead of list(datasets.keys()) it works correctly:
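The correct output itself is not shown here, but as a rough sketch of the two ways of building dataset_names discussed above (the dataset dictionary entries are placeholders):

```python
# Placeholder dataset configuration; the reader classes and column names are illustrative.
datasets = {
    "c1": {"class": None, "columns": ["sm"]},
    "c2": {"class": None, "columns": ["sm"]},
    "c3": {"class": None, "columns": ["sm"]},
    "ref": {"class": None, "columns": ["sm"]},
}

# Passing the raw key order puts "c1" first, so it gets treated as the reference
# after the positional renaming step:
dataset_names_wrong = list(datasets.keys())   # ['c1', 'c2', 'c3', 'ref']

# get_dataset_names built from a DataManager with "ref" as reference returns the
# reference first, which is (roughly) the order IntercomparisonMetrics expects:
dataset_names_ok = ["ref", "c1", "c2", "c3"]
```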