
Experiment Validator [Resolves #26] #239

Merged: 9 commits merged into master from experiment_validation on Nov 8, 2017

Conversation

thcrock (Contributor) commented Oct 26, 2017

  • Add ExperimentValidator class to hold all pre-run experiment validations, and populate with initial list of validations
  • Add database_reflection and validation_primitives modules (moved from architect) to support validations
  • Move experiment test utils to utils file so they are now usable in other tests
  • Update README with validation information

codecov-io commented Oct 26, 2017

Codecov Report

Merging #239 into master will decrease coverage by 2.54%.
The diff coverage is 75.17%.

```
@@            Coverage Diff             @@
##           master     #239      +/-   ##
==========================================
- Coverage   80.67%   78.12%   -2.55%
==========================================
  Files           5        8       +3
  Lines         326      608     +282
==========================================
+ Hits          263      475     +212
- Misses         63      133      +70
```
| Impacted Files | Coverage Δ |
|---|---|
| triage/experiments/base.py | 89.94% <25%> (-4.97%) ⬇️ |
| triage/experiments/validate.py | 74.29% <74.29%> (ø) |
| triage/validation_primitives.py | 87.09% <87.09%> (ø) |
| triage/database_reflection.py | 92% <92%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6cc0e10...ddbdfd3.

README.md Outdated
[SQL: 'explain select * from cat_complaints']
```

If nothing is output, it means that we can't find anything wrong with the Experiment so far, and it should be ready to run.
Contributor:

Just from a user-friendliness perspective, might actually be nice to output a message saying that no validation errors were found.
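A minimal sketch of what that could look like, assuming a hypothetical helper that is called only after every section validator has returned without raising (the function name and wording are illustrative, not the merged code):

```
import logging

def report_validation_success():
    """Hypothetical helper: called only if every validator ran without raising."""
    logging.info(
        'Experiment validation ran to completion with no errors; '
        'the Experiment should be ready to run.'
    )
```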

self.db_engine = db_engine


class FeatureAggregationsValidator(Validator):
Contributor:

Is most of this just moving from architect.FeatureGenerator? If so, we should probably include the imputation validation here, too (and probably a reasonable time to go ahead and address this issue while we're at it). Also, is there an architect PR to remove from there as well?

Contributor (author):

Yep, lots of things in here are just moving from Architect. It's easy enough to move the imputation stuff from Architect.

The collate issue seems like it will be less needed post-merge.

Removing things from architect I figured we would do after this PR. Though now, with the code merge, I'm not sure if it's worth it. Maybe architect could keep its validations as it rides off into the sunset.

Contributor:

Makes sense on holding off on removing things from architect, though I think the collate issue will still be important even post-monolith in terms of making things a little bit more clear and easier to edit so you only have to do so in one place in the code (even if the monolith will at least mean you only have to edit one repo regardless).

def _validate_categoricals(self, categoricals):
conn = self.db_engine.connect()
for categorical in categoricals:
if 'choice_query' in categorical:
Contributor:

If choice_query isn't specified should we validate that choices is (and, likewise, that one and only one of those is given)?
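A sketch of the exactly-one-of check being suggested, written as a standalone function for illustration; in the PR this logic would live inside FeatureAggregationsValidator._validate_categoricals, and the function name here is hypothetical:

```
def validate_categorical_choices(categoricals):
    """Require exactly one of 'choices' or 'choice_query' per categorical."""
    for categorical in categoricals:
        has_choices = 'choices' in categorical
        has_choice_query = 'choice_query' in categorical
        # True == True (both given) and False == False (neither given) both fail.
        if has_choices == has_choice_query:
            raise ValueError(
                "Each categorical must specify exactly one of 'choices' or "
                "'choice_query'; got: {}".format(categorical)
            )
```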


class StateConfigValidator(Validator):
def run(self, state_config):
if 'table_name' in state_config:
Contributor:

maybe warn that will be falling back to default behavior if table_name isn't specified?
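A sketch of the suggested warning, assuming the validator simply logs and returns when no table_name is given (the wording about default behavior is an assumption, not the merged code):

```
import logging

def validate_state_config(state_config):
    """Hypothetical standalone version of StateConfigValidator.run."""
    if 'table_name' not in state_config:
        logging.warning(
            'No table_name given in state_config; the experiment will '
            'fall back to its default state behavior.'
        )
        return
    # ... existing checks on state_config['table_name'] would go here ...
```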

raise ValueError('''Unknown feature_group_definition key {} received.
Available keys are {}'''.format(subsetter_name, available_subsetters))
if not hasattr(value, '__iter__') or isinstance(value, (str, bytes)):
raise ValueError('Each value in FeatureGroupCreator must be iterable and not a string')
Contributor:

not sure someone just using a triage config is going to know what to do with this error message.

if not hasattr(value, '__iter__') or isinstance(value, (str, bytes)):
raise ValueError('Each value in FeatureGroupCreator must be iterable and not a string')

if 'prefix' in feature_group_definition:
Contributor:

looks like we validate specified prefixes but not if you specify tables?



class ModelGroupKeysValidator(Validator):
def run(self, model_group_keys, user_metadata):
Contributor:

I think there might be more we can do here around potential logical fallacies or places where the user might get unexpected results, at least with warnings. For instance, I can't think of a good reason not to have train_duration as part of the model group keys, while at the same time, I can't think of a case where having matrix_end_time in the keys would make sense.

Contributor (author):

This should be good, but this seems it might be better to actually bar them from doing certain things, or auto-adding certain things. If there isn't a good reason not to have certain keys, should we just add them to the defaults?

Contributor:

yeah, adding to defaults and disallowing certain things might make more sense than simply checking for those things on validation.
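A sketch of what "add to defaults and disallow certain things" could look like; both key sets below are illustrative examples drawn from this discussion, not the merged behavior:

```
# Illustrative only: keys we always want, and keys that never make sense,
# per the discussion above.
DEFAULT_MODEL_GROUP_KEYS = {'train_duration'}
DISALLOWED_MODEL_GROUP_KEYS = {'matrix_end_time'}

def normalize_model_group_keys(model_group_keys):
    """Reject disallowed keys and auto-add the defaults instead of only warning."""
    keys = set(model_group_keys)
    bad_keys = keys & DISALLOWED_MODEL_GROUP_KEYS
    if bad_keys:
        raise ValueError(
            'model_group_keys {} are not allowed; they vary per matrix '
            'rather than per model group'.format(sorted(bad_keys))
        )
    return sorted(keys | DEFAULT_MODEL_GROUP_KEYS)
```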

Available keys are {}'''.format(model_group_key, available_keys))


class GridConfigValidator(Validator):
Contributor:

maybe at very least make sure that all of the model classes specified actually exist and run get_params() on them to ensure that the specified parameters exist?
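A sketch of that check, assuming grid_config maps a model class path to a dict of parameter lists and that each class can be instantiated with no arguments, as scikit-learn estimators can (both assumptions, not taken from the diff):

```
import importlib

def validate_grid_config(grid_config):
    """Check each class path imports and each parameter name is known to get_params()."""
    for classpath, param_grid in grid_config.items():
        module_name, class_name = classpath.rsplit('.', 1)
        try:
            cls = getattr(importlib.import_module(module_name), class_name)
        except (ImportError, AttributeError) as exc:
            raise ValueError('Could not import model class {}: {}'.format(classpath, exc))
        # get_params() on a default instance tells us which keyword arguments exist.
        known_params = set(cls().get_params())
        unknown = set(param_grid) - known_params
        if unknown:
            raise ValueError('Parameters {} are not accepted by {}'.format(
                sorted(unknown), classpath))
```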

pass


class ScoringConfigValidator(Validator):
Contributor:

At least ensure that the specified metrics exist and are properly defined with a metric decorator to specify whether greater is better?
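A sketch of that check, written against a hypothetical available_metrics registry (a mapping of metric name to callable) and assuming the decorator marks callables with a greater_is_better attribute; neither name is confirmed by this PR:

```
def validate_metrics(metric_names, available_metrics):
    """Hypothetical check: every requested metric exists and declares greater_is_better."""
    for name in metric_names:
        if name not in available_metrics:
            raise ValueError('Unknown metric {}. Available metrics are {}'.format(
                name, sorted(available_metrics)))
        if not hasattr(available_metrics[name], 'greater_is_better'):
            raise ValueError(
                'Metric {} is not wrapped in a metric decorator declaring '
                'greater_is_better'.format(name))
```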



class ExperimentValidator(Validator):
def run(self, experiment_config):
Contributor:

feels like we should probably check the config version here, too, right? though at least that will get caught pretty immediately when trying to run the experiment.

Contributor (author):

That actually happens at instantiation time, so you could never even get to attempting to run it, or even to this validation code, under the recommended code path (experiment.validate()). We could just remove it from there and put it here instead.
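A sketch of pulling that check into the validator; CONFIG_VERSION here stands in for whatever constant the experiment currently compares against at instantiation time:

```
# Stand-in for the real expected-version constant checked at instantiation time.
CONFIG_VERSION = 'v2'

def validate_config_version(experiment_config):
    """Fail early if the config declares a version the code doesn't support."""
    version = experiment_config.get('config_version')
    if version != CONFIG_VERSION:
        raise ValueError(
            'Experiment config_version {} does not match the supported '
            'version {}'.format(version, CONFIG_VERSION))
```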

shaycrk (Contributor) commented Oct 27, 2017

Went through here and added some inline comments. Generally looks good and I think this will definitely be a helpful addition. One thing that seems to be missing, though, is validation of the temporal config, which often feels like a big source of logical errors, potential leakage problems, etc. And, given that timechop is generally fast to run, it might make sense to actually generate the time cuts and do some simple validation of the output (matrices with no as-of dates? obvious leakage problems?)

db_engine.execute('select * from {} limit 1'.format(table_name))
]

return len(results) > 0
Member:

Not a hugely important difference, but I believe you can safely load even less of the actual data here, if you just want to check that there are any rows in the table –

result = db_engine.execute(
    'select 1 from {table_name} limit 1'
    .format(table_name=table_name)
)
return any(result)

logging.info('Validating time intervals')
for interval in intervals:
if interval != 'all':
_ = convert_str_to_relativedelta(interval)
Member:

just curious, why the assignment? for pdb?

_ = convert_str_to_relativedelta(interval)

def _validate_groups(self, groups):
if not any(group == 'entity_id' for group in groups):
Member:

is this different from 'entity_id' in groups?

raise ValueError('''Aggregation prefix of '{}' was given as a
feature group, but no such prefix exists in the available
feature aggregations. The available feature aggregations are {}
'''.format(prefix, available_prefixes))
Member:

another trivial suggestion, but if you're doing a bunch of [not] in checks, a set would usually be more performant.

and, I wonder if it might be nice to tell them up front all of the bad prefixes, along the lines of –

available_prefixes = {aggregation['prefix'] for aggregation in feature_aggregation_config}
bad_prefixes = set(feature_group_definition['prefix']) - available_prefixes
if bad_prefixes:
    raise ValueError(….format(bad_prefixes, available_prefixes))

for strategy in feature_group_strategies:
if strategy not in available_strategies:
raise ValueError('''Unknown feature_group_strategies key {} received.
Available keys are {}'''.format(strategy, available_strategies))
Member:

(ditto)

thcrock force-pushed the experiment_validation branch from 011f047 to 44eb390 on November 3, 2017 20:14
thcrock (Contributor, author) commented Nov 3, 2017

I added temporal config output validation, but it's very rudimentary. I'm not really sure what makes sense to put in there (what are obvious leakage problems)?

@jesteria (Member) left a comment:

👍

result = [
row for row in
db_engine.execute('select 1 from {} limit 1'.format(table_name))
]
Member:

This still seems like a superfluous list comprehension – it neither maps nor filters, and I gather that the SQLAlchemy result object is iterable and sufficient on its own (even if we had to cast to a list, we could do so explicitly, i.e. list(results)). But quite far from a big deal!


if 'tables' in feature_group_definition:
available_tables = {
aggregation['prefix'] + '_aggregation'
Contributor:

oh, actually with the imputation branch merged, this should be aggregation['prefix'] + '_aggregation_imputed'

that said, looks like we actually need to fix the example of this in the example experiment config here as well if you don't mind doing that along with this PR.

shaycrk (Contributor) commented Nov 6, 2017

Thanks, @thcrock!

Three thoughts here:

  1. Did you end up pulling in the imputation validation stuff from architect as well? If so, I think I missed it in the diff (e.g., from here and the methods it calls)
  2. One simple leakage check would be to take the last as_of_date from each training matrix, add the train_label_window, and make sure the result isn't later than the first as_of_date in any associated test matrix (see the sketch after this list). That would at least catch the most obvious case (Timechop should prevent this from happening, but it's a big enough problem that a little redundancy in checking for it here as well is probably a good idea). We could come up with other checks as well, but this would be a reasonable starting point.
  3. Without making the output too messy, I wonder if there's a nice way to print at least some info about the time splits we've found, since that would help the user confirm that the logic they think they're expressing in the time config is actually doing what they want/expect. There's no simple assertion we can make there, of course, but it would be helpful, and this does seem like the appropriate place for it. Maybe print something like the first and last training as_of_times, the first and last test as_of_times, and some counts, and write the full object to a file or pickle that the user can load and inspect further if they'd like?
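A sketch of the leakage check described in point 2, assuming split definitions shaped roughly like timechop output, with a 'train_matrix' dict and a list of 'test_matrices' that each carry an 'as_of_times' list; that shape, and the function name, are assumptions rather than code from this diff:

```
from datetime import datetime

from dateutil.relativedelta import relativedelta


def check_no_train_test_leakage(splits, train_label_window):
    """For each split, the last training as-of date plus the label window
    must not run past the first test as-of date."""
    for split in splits:
        train_end = max(split['train_matrix']['as_of_times']) + train_label_window
        for test_matrix in split['test_matrices']:
            if not test_matrix['as_of_times']:
                raise ValueError('Found a test matrix with no as_of_times')
            if train_end > min(test_matrix['as_of_times']):
                raise ValueError(
                    'Train label window extends past the first test as-of '
                    'date; labels would leak into the test period'
                )


# Example usage with one hand-built split:
splits = [{
    'train_matrix': {'as_of_times': [datetime(2016, 1, 1), datetime(2016, 6, 1)]},
    'test_matrices': [{'as_of_times': [datetime(2017, 1, 1)]}],
}]
check_no_train_test_leakage(splits, relativedelta(months=6))
```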

thcrock force-pushed the experiment_validation branch from baec991 to 2d0bef1 on November 7, 2017 19:41
shaycrk (Contributor) commented Nov 7, 2017

Thanks, @thcrock -- I think these generally look good to me and I'd be good with merging (of course, we can continue to add other validations as we go). Seems like pointing people to the split_definitions property is reasonable to surface the full set of time splits.

thcrock merged commit 0c49dac into master on Nov 8, 2017
thcrock deleted the experiment_validation branch on November 8, 2017 18:38