Commit
refactored datasets code to create CSV/tfrecords concurrently (relates to #79)

Big change.

Previously, creating a dataset was a two-step process:
1. generate the full set of instances, with params for each, and save them to CSV file(s)
2. make the corresponding tfrecord(s) from the CSVs, using the params to create the model LC upon which the mags feature is based. This was also where the train/val/test split happened.

With this change, both the CSVs and the tfrecords are written concurrently. The benefit is that we now assemble all of the data for each instance (including the mags feature) before saving to both CSV and tfrecord, meaning we can add logic which may discard or modify an instance based on the outcome of generating its mags feature. Previously, membership and labels were decided at the CSV stage and could not be updated when generating the mags data.

Because of changes (fixes) to the random generators/seeds, datasets generated from here on will not have the same members as those generated previously. However, consistency is now improved going forward.

Fringe benefit: the new code uses less RAM and is slightly faster than the previous approach.
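The single-pass flow described in this message can be sketched roughly as below. This is a minimal illustration, not the code from this commit: `make_instance`, `make_mags`, `write_dataset`, the column names and the discard rule are all hypothetical, and a plain Python list stands in for the tfrecord writer so the sketch runs with the standard library alone. It only shows the ordering that matters here: each instance is fully assembled, mags included, before anything is written, so a discarded instance reaches neither output.

```python
import csv
import random

FIELDS = ["id", "period", "depth"]  # hypothetical param columns


def make_instance(rng):
    """Hypothetical: draw the params for one candidate instance."""
    return {
        "id": rng.randrange(10**6),
        "period": rng.uniform(0.5, 10.0),
        "depth": rng.uniform(0.0, 1.0),
    }


def make_mags(params, bins=8):
    """Hypothetical stand-in for building the model LC and mags feature.

    Returns None when the instance should be discarded.
    """
    if params["depth"] < 0.1:  # assumed discard rule, for illustration only
        return None
    return [round(params["depth"] * (i % 2), 4) for i in range(bins)]


def write_dataset(count, csv_file, record_sink, seed=42):
    """Write `count` instances to the CSV and the record sink in one pass."""
    rng = random.Random(seed)  # one seeded generator drives all draws
    writer = csv.DictWriter(csv_file, fieldnames=FIELDS)
    writer.writeheader()
    kept = 0
    while kept < count:
        params = make_instance(rng)
        mags = make_mags(params)
        if mags is None:
            continue  # discarded before either output is touched
        writer.writerow(params)
        record_sink.append((params["id"], mags))
        kept += 1
    return kept
```

Because membership is settled only after the mags step, the CSV rows and the records stay in step by construction. In a real pipeline the list would presumably be replaced by a tfrecord writer, with the train/val/test split choosing which writer receives each instance.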
Showing 3 changed files with 244 additions and 307 deletions.