Releases: dssg/triage
Testing 3.9 compatibility
v5.0.1-alpha1 Bump version: 4.4.0 → 5.0.0
Dried Apricot
WARNING: BREAKING CHANGES!
Note that several changes in triage 5 break backwards compatibility with triage 4. If you are upgrading a project from an earlier version of triage, it is highly recommended that you first create a backup of your current database!
These breaking changes include:
- A revision in the way the model_hash is calculated means that if you re-run an experiment from an earlier version of triage, it will re-train your models and give them new model_ids even if the configuration hasn't changed.
- The built_by_experiment column has been removed from triage_metadata.models in preference of tracking the specific run that built the model. The experiment_hash can still be obtained by joining to triage_metadata.triage_runs (née triage_metadata.experiment_runs). Should you need the data that was in this column at the time of migration, it can be found in triage_metadata.deprecated_models_built_by_experiment, but it will not be restored to the table upon database downgrade.
- Changes in the structure of matrix metadata mean the matrix_hash will no longer be backwards-compatible with older versions of triage (as with models, re-running an old config would result in matrices being re-created).
- The random_seed column has been removed from triage_metadata.experiments in preference of tracking it at the run level as well. A database upgrade followed by a downgrade would lose this data (but it could be recovered from the runs table).
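For projects that relied on built_by_experiment, the join described above might look roughly like the following. This is a minimal sketch that runs against an in-memory SQLite stand-in rather than a real triage PostgreSQL database. The column names built_in_triage_run, id, run_hash, and run_type are assumptions made for illustration and should be checked against your migrated schema.

```python
import sqlite3

# In-memory stand-in for a triage database; a real project would use PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS triage_metadata")

# Hypothetical, heavily reduced versions of the two tables (assumed columns).
conn.execute(
    "CREATE TABLE triage_metadata.triage_runs "
    "(id INTEGER PRIMARY KEY, run_hash TEXT, run_type TEXT)"
)
conn.execute(
    "CREATE TABLE triage_metadata.models "
    "(model_id INTEGER PRIMARY KEY, built_in_triage_run INTEGER)"
)
conn.execute(
    "INSERT INTO triage_metadata.triage_runs VALUES (1, 'abc123', 'experiment')"
)
conn.executemany(
    "INSERT INTO triage_metadata.models VALUES (?, ?)", [(10, 1), (11, 1)]
)

# Recover the experiment hash for each model via the run that built it.
query = """
    SELECT m.model_id, r.run_hash AS experiment_hash
    FROM triage_metadata.models AS m
    JOIN triage_metadata.triage_runs AS r ON r.id = m.built_in_triage_run
    WHERE r.run_type = 'experiment'
    ORDER BY m.model_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The same SELECT should carry over to PostgreSQL largely unchanged, provided the assumed column names match the actual tables.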
New Functionality
- Functionality for predicting forward, either with an existing model object or by retraining a new model with the most current data given a model_group_id (#631)
- Utility for adding predictions to models previously trained/tested with save_predictions=False (#836)
- Provisioner for easily setting up a postgresql database (via docker) that can be used with triage (#840)
- More flexibility in parallelization for more resource-intensive model types, like random forests (#853)
Bug Fixes
- Ensure model-level random seeds are re-used when the config and experiment-level random seed are unchanged (#848)
- Remove the project path from the model_hash definition: the model_id shouldn't depend on where triage is being run (#830)
- Ensure that feature groups are sorted in matrix metadata for consistency in downstream calculations (#833)
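The two hashing-related fixes above share one idea: a hash should depend only on the content that defines the model or matrix, not on ordering or location. The sketch below illustrates that idea with the standard library. It is not triage's actual hashing code; the function names config_hash and model_hash and the example configuration are hypothetical.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Hash a configuration deterministically (illustrative only)."""
    # sort_keys makes the serialization independent of dict insertion order.
    serialized = json.dumps(config, sort_keys=True)
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


def model_hash(model_config: dict, feature_groups: list) -> str:
    """Combine model settings and feature groups into one stable hash."""
    # Sort feature groups so their order never changes the hash. Nothing
    # location-specific (such as a project path) is included in the payload.
    payload = dict(model_config, feature_groups=sorted(feature_groups))
    return config_hash(payload)


a = model_hash(
    {"class_path": "sklearn.tree.DecisionTreeClassifier", "max_depth": 3},
    ["demographics", "bookings"],
)
b = model_hash(
    {"max_depth": 3, "class_path": "sklearn.tree.DecisionTreeClassifier"},
    ["bookings", "demographics"],
)
print(a == b)  # True: key order and feature group order do not matter
```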
Thanks To
AROY-D: The Second Box
Primarily a bugfix release for anyone working on triage 4. New functionality will be introduced with the 5.0 release.
Bug Fixes
- Fix functionality of bias analysis using aequitas during experiment runs. Previously the attributes for bias analysis were getting scrambled relative to the scores and labels when the latter were sorted for "best case" and "worst case" analyses, invalidating any results produced by these analyses. This release fixes this bug, ensures the same set of entities is provided for attributes and labels/scores, and adds a unit test to cover this issue. (#858)
- Close database sessions during unit tests to avoid intermittent exceptions during test cleanup. (#851)
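The class of bug described above can be pictured with a small example: if scores and labels are re-sorted but attributes are still matched by position, each attribute ends up attached to the wrong entity. A minimal sketch of the safer pattern, looking attributes up by entity after sorting, follows. It uses neither aequitas nor triage code; the function name best_case_rows and the data shapes are hypothetical.

```python
def best_case_rows(entity_ids, scores, labels, attributes_by_entity):
    """Rank entities by score (descending), breaking ties with positives first,
    and attach each entity's attribute by id rather than by position."""
    ranked = sorted(
        zip(entity_ids, scores, labels),
        key=lambda row: (-row[1], -row[2]),
    )
    # Look attributes up by entity id so they stay aligned after sorting.
    return [
        (eid, score, label, attributes_by_entity[eid])
        for eid, score, label in ranked
    ]


rows = best_case_rows(
    entity_ids=[1, 2, 3],
    scores=[0.2, 0.9, 0.9],
    labels=[1, 0, 1],
    attributes_by_entity={1: "A", 2: "B", 3: "A"},
)
print(rows)
```

A "worst case" ordering would flip the tie-break so that negatives come first among equal scores; the lookup by entity id stays the same.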
AROY-D
New Functionality
- Added connector for aequitas visualizations (#837)
- Allow user-specified model grids to extend presets (#843)
- Audition improvements, including baseline models and stable color schemes (#844)
Bug Fixes
- Fixed building triage in docker container for dirty duck tutorial (#818, #820)
- Improve audition's handling of multiple models with different random seeds (#823)
Refactoring/Documentation
El "Patched" Paisano
Patched due to some inconsistencies between catwalk and the newest version of sklearn
El Paisano
What is in this release?
- The schema is now called triage_metadata instead of model_metadata (issue #700)
- The replace flag is now passed to ModelTrainerTester (issue #784)
- New folder structure for dirtyduck
- New folder structure for triage in a docker
- Fix an inconsistency in the command line option of the tutorial
- Python version as columns in experiment run (issue #742)
- Incorrect columns in individual_importances (issue #744)
- Updated deprecated method calls (issues #734 and #754)
- Long standing issue with parsedatetime resolved (issue #721)
- Several issues with dirtyduck solved (issues #750 #735 #736 #781)
Thanks to
- @thcrock, @shaycrk, @adunmore, @nanounanue
Chengdu
New Functionality:
- Evaluate on subsets [Resolves #535, #138] (#552)
- Implement train/testing priority [Resolves #542] (#581)
- Introduce experiment_runs table, beef up experiments table (#637)
- Dirty duck (the whole enchilada) (#670)
- Add compute best/worst/stochastic for each evaluation [Resolves #292] (#674)
- Insert Ranks for Predictions [Resolves #357] (#671)
- Support Python 3.7 [Resolves #683] (#684)
- Bias Part 1: Protected groups generator (#680)
- Bias part 2 (#688)
- Added DummyClassifier to the SimpleClassifiers batch (#702)
Bug Fixes:
- config is a str, not a fd (#610)
- Keep PyYAML pinned as v5 breaks our usage (#615)
- Fix cohort in unit tests, remove old code, squash some warnings (#621)
- Fix logging of which matrix was saved (#623)
- Harden postmodeling against lack of predictions [Resolves #638] (#645)
- Validate distinct feature group prefixes (#634)
- fix imports in example postmodeling notebook (#646)
- Fixed Audition's docs (#665)
- MS Triage (#666)
- Fixed broken links (#675)
- Fix Travis deploy [Resolves #493] (#677)
- Fix logging typos that only show up when splits are empty (#685)
- Fixes Postmodeling Weird Error [Resolves #691] (#693)
- Don't auto-upgrade db for new Experiments [Resolves #695] (#698)
- Check for capital letters with validator [Closes #632] (#701)
- check for empty protected_df (#709)
- Fixing dirtyduck (#720)
- Update MANIFEST.in (#723)
General Improvements:
- Read database connections from process environment (#605)
- Scheduled monthly dependency update for March (#619)
- Use compressed CSVs [Resolves #498] (#626)
- Faster train/test task generation (#628)
- Remove support for entity-only matrix indices [Resolves #477] (#622)
- Enable dburl env var in results_schema CLI [Resolves #636] (#639)
- Run validation by default [Closes #635] (#642)
- Add feature_importance metric to SLR (#587)
- Scheduled monthly dependency update for April (#664)
- Remove redundant imputation flag columns [Resolves #544] (#676)
- write 5+ GiB (matrices) to S3Store (#687)
- Add more user database management options to CLI [Resolves #697] (#699)
- Scheduled monthly dependency update for May (#679)
- Kit and adolfos amazing adventure (aka experiment config defaults) Closes #717 (#719)
Refactoring/Documentation:
Arepa
New functionality:
- Postmodel Analysis (#482)
- Stores Timechop image to disk (#590)
- Add matrix uuid to evaluations tables [Resolves #591] (#593)
- Experiment Profiling [Resolves #557] (#558)
Bug fixes:
- Postmodel fixes (#604)
- Fixes #598 (#600)
- Series equality operator [Resolves #563] (#564)
- Fix MatrixStore memory leak [Resolves #594]
- Fix empty/columns check on HDFStore [Resolves #589] (#592)
- Fix upgrade_db to use filehandle [Resolves #572]
- Fix FromObj.maybe_materialize [Resolves #565] (#566)
- support 5 GB multipart upload threshold via S3Fs (#546)
General Improvements:
- Scheduled monthly dependency update for February (#588)
- Namespace cohort and labels tables by their config [Resolves #574] (#576)
- Only Build Features for Cohort [Resolves #513] (#567)
- Colocate Testing with Training [Resolves #560] (#569)
- Upgrade PyYAML to current security-patched release
- Skip Prediction Saving [Resolves #559]
- Scheduled monthly dependency update for January (#562)
- Materialize Subquery From Objects [Resolves #554] (#555)
- Skip already-evaluated models [Resolves #540] (#541)
- Throw warning if unscaled logit is used [Resolves #508] (#548)
- support in develop script for detection of pyenv installed via Homebrew
- upgrade install-cli to better support non-GNU (MacOS)
- Cohort Generation respects replace flag [Resolves #503]
Refactoring/Documentation:
- Add Audition, Postmodeling, Dirty Duck references to docs (#599)
- audition_config file
- Audition config correct (#601)
- Experiment Architecture Doc [Resolves #579] (#580)
- docs: make proper list of experiment upgrading links
- Cohort and Label Deep Dive [Resolves #492] (#577)
- Disable individual importance in example experiment config (#568)
- Tweak language in running document
Flaming Hot Cheeto
New functionality:
- Add additional feature group CV strategy (all-combinations) (#518)
- Downcasting feature tables (#510)
- Label Generation Replace Flag [Resolves #499]
- Audition model group filter [Resolves #494] (#495)
- development environment wizard (#511)
Bug Fixes:
- Fix db engine check in Experiment [Resolves #538] (#539)
- Allow >5GB matrices with S3 [Resolves #530]
- refined test query to avoid unwarranted failure
- Prevent experiment hanging when worker is killed by OS [Resolves #501] (#506)
- develop script should install triage with the rq extra (#521)
General Improvements:
- Scheduled monthly dependency update for December (#526)
- Shorten log lines [Resolves #528]
- Verbose config check (#483)
- added pytest fixtures to simplify and clean up (architect) tests (#522)
- Add HDF5 to CLI and doc [Resolves #496] (#497)
Refactoring/Documentation:
Tim Tam
- Flip featuretest CLI arguments to match the doc [Resolves #486] (#487)
- Scheduled monthly dependency update for November (#485)
- Associate Experiment with all models and matrices [Resolves #411] (#476)
- Clean up Session Closing in Predictor [Resolves #478]
- Downcast matrices [Resolves #372]
- Update Contribution Guide [Resolves #425]
- Initial run of Black for code formatting