
Experiment Validator [Resolves #26] #239

Merged: 9 commits merged into master from experiment_validation on Nov 8, 2017

Conversation

thcrock (Contributor) commented Oct 26, 2017

  • Add ExperimentValidator class to hold all pre-run experiment validations, and populate with initial list of validations
  • Add database_reflection and validation_primitives modules (moved from architect) to support validations
  • Move experiment test utils to utils file so they are now usable in other tests
  • Update README with validation information

codecov-io commented Oct 26, 2017

Codecov Report

Merging #239 into master will decrease coverage by 2.54%.
The diff coverage is 75.17%.

```
@@            Coverage Diff             @@
##           master     #239      +/-   ##
==========================================
- Coverage   80.67%   78.12%   -2.55%
==========================================
  Files           5        8       +3
  Lines         326      608     +282
==========================================
+ Hits          263      475     +212
- Misses         63      133      +70
```
| Impacted Files | Coverage Δ |
|---|---|
| triage/experiments/base.py | 89.94% <25%> (-4.97%) ⬇️ |
| triage/experiments/validate.py | 74.29% <74.29%> (ø) |
| triage/validation_primitives.py | 87.09% <87.09%> (ø) |
| triage/database_reflection.py | 92% <92%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6cc0e10...ddbdfd3.

README.md Outdated
[SQL: 'explain select * from cat_complaints']
```

If nothing is output, it means that we can't find anything wrong with the Experiment so far, and it should be ready to run.
Contributor:

Just from a user-friendliness perspective, might actually be nice to output a message saying that no validation errors were found.
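A minimal sketch of what that could look like, assuming a hypothetical helper that is called only after every section validator has returned without raising (the function name and wording are illustrative, not the merged code):

```
import logging

def report_validation_success():
    """Hypothetical helper: called only if every validator ran without raising."""
    logging.info(
        'Experiment validation ran to completion with no errors; '
        'the Experiment should be ready to run.'
    )
```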

self.db_engine = db_engine


class FeatureAggregationsValidator(Validator):
Contributor:

Is most of this just moving from architect.FeatureGenerator? If so, we should probably include the imputation validation here, too (and probably a reasonable time to go ahead and address this issue while we're at it). Also, is there an architect PR to remove from there as well?

Contributor (author):

Yep, lots of things in here are just moving from Architect. It's easy enough to move the imputation stuff from Architect.

The collate issue seems like it will be less needed post-merge.

Removing things from architect I figured we would do after this PR. Though now, with the code merge, I'm not sure if it's worth it. Maybe architect could keep its validations as it rides off into the sunset.

Contributor:

Makes sense on holding off on removing things from architect, though I think the collate issue will still be important even post-monolith in terms of making things a little bit more clear and easier to edit so you only have to do so in one place in the code (even if the monolith will at least mean you only have to edit one repo regardless).

def _validate_categoricals(self, categoricals):
conn = self.db_engine.connect()
for categorical in categoricals:
if 'choice_query' in categorical:
Contributor:

If choice_query isn't specified should we validate that choices is (and, likewise, that one and only one of those is given)?
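A sketch of the exactly-one-of check being suggested, written as a standalone function for illustration; in the PR this logic would live inside FeatureAggregationsValidator._validate_categoricals, and the function name here is hypothetical:

```
def validate_categorical_choices(categoricals):
    """Require exactly one of 'choices' or 'choice_query' per categorical."""
    for categorical in categoricals:
        has_choices = 'choices' in categorical
        has_choice_query = 'choice_query' in categorical
        # True == True (both given) and False == False (neither given) both fail.
        if has_choices == has_choice_query:
            raise ValueError(
                "Each categorical must specify exactly one of 'choices' or "
                "'choice_query'; got: {}".format(categorical)
            )
```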


class StateConfigValidator(Validator):
def run(self, state_config):
if 'table_name' in state_config:
Contributor:

maybe warn that will be falling back to default behavior if table_name isn't specified?
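A sketch of the suggested warning, assuming the validator simply logs and returns when no table_name is given (the wording about default behavior is an assumption, not the merged code):

```
import logging

def validate_state_config(state_config):
    """Hypothetical standalone version of StateConfigValidator.run."""
    if 'table_name' not in state_config:
        logging.warning(
            'No table_name given in state_config; the experiment will '
            'fall back to its default state behavior.'
        )
        return
    # ... existing checks on state_config['table_name'] would go here ...
```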

raise ValueError('''Unknown feature_group_definition key {} received.
Available keys are {}'''.format(subsetter_name, available_subsetters))
if not hasattr(value, '__iter__') or isinstance(value, (str, bytes)):
raise ValueError('Each value in FeatureGroupCreator must be iterable and not a string')
Contributor:

not sure someone just using a triage config is going to know what to do with this error message.

if not hasattr(value, '__iter__') or isinstance(value, (str, bytes)):
raise ValueError('Each value in FeatureGroupCreator must be iterable and not a string')

if 'prefix' in feature_group_definition:
Contributor:

looks like we validate specified prefixes but not if you specify tables?



class ModelGroupKeysValidator(Validator):
def run(self, model_group_keys, user_metadata):
Contributor:

I think there might be more we can do here around potential logical fallacies or places where the user might get unexpected results, at least with warnings. For instance, I can't think of a good reason not to have train_duration as part of the model group keys, while at the same time, I can't think of a case where having matrix_end_time in the keys would make sense.

Contributor (author):

This should be good, but this seems it might be better to actually bar them from doing certain things, or auto-adding certain things. If there isn't a good reason not to have certain keys, should we just add them to the defaults?

Contributor:

yeah, adding to defaults and disallowing certain things might make more sense than simply checking for those things on validation.
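A sketch of what "add to defaults and disallow certain things" could look like; both key sets below are illustrative examples drawn from this discussion, not the merged behavior:

```
# Illustrative only: keys we always want, and keys that never make sense,
# per the discussion above.
DEFAULT_MODEL_GROUP_KEYS = {'train_duration'}
DISALLOWED_MODEL_GROUP_KEYS = {'matrix_end_time'}

def normalize_model_group_keys(model_group_keys):
    """Reject disallowed keys and auto-add the defaults instead of only warning."""
    keys = set(model_group_keys)
    bad_keys = keys & DISALLOWED_MODEL_GROUP_KEYS
    if bad_keys:
        raise ValueError(
            'model_group_keys {} are not allowed; they vary per matrix '
            'rather than per model group'.format(sorted(bad_keys))
        )
    return sorted(keys | DEFAULT_MODEL_GROUP_KEYS)
```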

Available keys are {}'''.format(model_group_key, available_keys))


class GridConfigValidator(Validator):
Contributor:

maybe at very least make sure that all of the model classes specified actually exist and run get_params() on them to ensure that the specified parameters exist?
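A sketch of that check, assuming grid_config maps a model class path to a dict of parameter lists and that each class can be instantiated with no arguments, as scikit-learn estimators can (both assumptions, not taken from the diff):

```
import importlib

def validate_grid_config(grid_config):
    """Check each class path imports and each parameter name is known to get_params()."""
    for classpath, param_grid in grid_config.items():
        module_name, class_name = classpath.rsplit('.', 1)
        try:
            cls = getattr(importlib.import_module(module_name), class_name)
        except (ImportError, AttributeError) as exc:
            raise ValueError('Could not import model class {}: {}'.format(classpath, exc))
        # get_params() on a default instance tells us which keyword arguments exist.
        known_params = set(cls().get_params())
        unknown = set(param_grid) - known_params
        if unknown:
            raise ValueError('Parameters {} are not accepted by {}'.format(
                sorted(unknown), classpath))
```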

pass


class ScoringConfigValidator(Validator):
Contributor:

At least ensure that the specified metrics exist and are properly defined with a metric decorator to specify whether greater is better?
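A sketch of that check, written against a hypothetical available_metrics registry (a mapping of metric name to callable) and assuming the decorator marks callables with a greater_is_better attribute; neither name is confirmed by this PR:

```
def validate_metrics(metric_names, available_metrics):
    """Hypothetical check: every requested metric exists and declares greater_is_better."""
    for name in metric_names:
        if name not in available_metrics:
            raise ValueError('Unknown metric {}. Available metrics are {}'.format(
                name, sorted(available_metrics)))
        if not hasattr(available_metrics[name], 'greater_is_better'):
            raise ValueError(
                'Metric {} is not wrapped in a metric decorator declaring '
                'greater_is_better'.format(name))
```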



class ExperimentValidator(Validator):
def run(self, experiment_config):
Contributor:

feels like we should probably check the config version here, too, right? though at least that will get caught pretty immediately when trying to run the experiment.

Contributor (author):

That actually happens at instantiation time, so you could never even get to attempting to run it, or even to this validation code, under the recommended code path (experiment.validate()). We could just remove it from there and put it here instead.
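A sketch of pulling that check into the validator; CONFIG_VERSION here stands in for whatever constant the experiment currently compares against at instantiation time:

```
# Stand-in for the real expected-version constant checked at instantiation time.
CONFIG_VERSION = 'v2'

def validate_config_version(experiment_config):
    """Fail early if the config declares a version the code doesn't support."""
    version = experiment_config.get('config_version')
    if version != CONFIG_VERSION:
        raise ValueError(
            'Experiment config_version {} does not match the supported '
            'version {}'.format(version, CONFIG_VERSION))
```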

shaycrk (Contributor) commented Oct 27, 2017

Went through here and added some inline comments. Generally looks good and I think this will definitely be a helpful addition. One thing that seems to be missing, though, is validation of the temporal config, which often feels like a big source of logical errors, potential leakage problems, etc. And, given that timechop is generally fast to run, it might make sense to actually generate the time cuts and do some simple validation of the output (matrices with no as-of dates? obvious leakage problems?)

db_engine.execute('select * from {} limit 1'.format(table_name))
]

return len(results) > 0
Member:

Not a hugely important difference, but I believe you can safely load even less of the actual data here, if you just want to check that there are any rows in the table –

result = db_engine.execute(
    'select 1 from {table_name} limit 1'
    .format(table_name=table_name)
)
return any(result)

logging.info('Validating time intervals')
for interval in intervals:
if interval != 'all':
_ = convert_str_to_relativedelta(interval)
Member:

just curious, why the assignment? for pdb?

_ = convert_str_to_relativedelta(interval)

def _validate_groups(self, groups):
if not any(group == 'entity_id' for group in groups):
Member:

is this different from 'entity_id' in groups?

raise ValueError('''Aggregation prefix of '{}' was given as a
feature group, but no such prefix exists in the available
feature aggregations. The available feature aggregations are {}
'''.format(prefix, available_prefixes))
Member:

another trivial suggestion, but if you're doing a bunch of [not] in checks, a set would usually be more performant.

and, I wonder if it might be nice to tell them up front all of the bad prefixes, along the lines of –

available_prefixes = {aggregation['prefix'] for aggregation in feature_aggregation_config}
bad_prefixes = set(feature_group_definition['prefix']) - available_prefixes
if bad_prefixes:
    raise ValueError(….format(bad_prefixes, available_prefixes))

for strategy in feature_group_strategies:
if strategy not in available_strategies:
raise ValueError('''Unknown feature_group_strategies key {} received.
Available keys are {}'''.format(strategy, available_strategies))
Member:

(ditto)

thcrock force-pushed the experiment_validation branch from 011f047 to 44eb390 on November 3, 2017 20:14
thcrock (Contributor, author) commented Nov 3, 2017

I added temporal config output validation, but it's very rudimentary. I'm not really sure what makes sense to put in there (what are obvious leakage problems)?

@jesteria (Member) left a comment:

👍

result = [
row for row in
db_engine.execute('select 1 from {} limit 1'.format(table_name))
]
Member:

This still seems like a superfluous list comprehension – it neither maps nor filters, and I gather that the SQLAlchemy result object is iterable and sufficient on its own (even if we had to cast to a list, we could do so explicitly, i.e. list(results)). But quite far from a big deal!


if 'tables' in feature_group_definition:
available_tables = {
aggregation['prefix'] + '_aggregation'
Contributor:

oh, actually with the imputation branch merged, this should be aggregation['prefix'] + '_aggregation_imputed'

that said, looks like we actually need to fix the example of this in the example experiment config here as well if you don't mind doing that along with this PR.

shaycrk (Contributor) commented Nov 6, 2017

Thanks, @thcrock!

Three thoughts here:

  1. Did you end up pulling in the imputation validation stuff from architect as well? If so, I think I missed it in the diff (e.g., from here and the methods it calls)
  2. One simple leakage check would be to take the last as_of_date from each training matrix, add the train_label_window, and make sure the result isn't later than the first as_of_date in any associated test matrix (see the sketch after this list). That would at least catch the most obvious case (Timechop should prevent this from happening, but it's a big enough problem that a little redundancy in checking for it here as well is probably a good idea). We could come up with other checks as well, but this would be a reasonable starting point.
  3. Without making the output too messy, I wonder if there's a nice way to print at least some info about the time splits we've found, since that would help the user confirm that the logic they think they're expressing in the time config is actually doing what they want/expect. There's no simple assertion we can make there, of course, but it would be helpful, and this does seem like the appropriate place for it. Maybe print something like the first and last training as_of_times, the first and last test as_of_times, and some counts, and write the full object to a file or pickle that the user can load and inspect further if they'd like?
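A sketch of the leakage check described in point 2, assuming split definitions shaped roughly like timechop output, with a 'train_matrix' dict and a list of 'test_matrices' that each carry an 'as_of_times' list; that shape, and the function name, are assumptions rather than code from this diff:

```
from datetime import datetime

from dateutil.relativedelta import relativedelta


def check_no_train_test_leakage(splits, train_label_window):
    """For each split, the last training as-of date plus the label window
    must not run past the first test as-of date."""
    for split in splits:
        train_end = max(split['train_matrix']['as_of_times']) + train_label_window
        for test_matrix in split['test_matrices']:
            if not test_matrix['as_of_times']:
                raise ValueError('Found a test matrix with no as_of_times')
            if train_end > min(test_matrix['as_of_times']):
                raise ValueError(
                    'Train label window extends past the first test as-of '
                    'date; labels would leak into the test period'
                )


# Example usage with one hand-built split:
splits = [{
    'train_matrix': {'as_of_times': [datetime(2016, 1, 1), datetime(2016, 6, 1)]},
    'test_matrices': [{'as_of_times': [datetime(2017, 1, 1)]}],
}]
check_no_train_test_leakage(splits, relativedelta(months=6))
```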

thcrock force-pushed the experiment_validation branch from baec991 to 2d0bef1 on November 7, 2017 19:41
shaycrk (Contributor) commented Nov 7, 2017

Thanks, @thcrock -- I think these generally look good to me and I'd be good with merging (of course, we can continue to add other validations as we go). Seems like pointing people to the split_definitions property is reasonable to surface the full set of time splits.

thcrock merged commit 0c49dac into master on Nov 8, 2017
thcrock deleted the experiment_validation branch on November 8, 2017 18:38