Rework trainset/dataset code so that it can flip the components/roll mags if secondary is deeper #79
First job is to refactor the code for generating the trainsets & datasets. Currently this is split into two distinct steps: generating the trainset CSV files, then using the contents of the CSVs to generate the mags/LCs and write the corresponding dataset files. We'll need to rework the code (mainly what's in datasets) to do both concurrently. This will make it possible to amend an instance's params based on the results of generating its LC model.
…#79) Big change. Previously, creating a dataset was a two-step process:
1. generate the full set of instances, with params for each, and save to CSV file(s)
2. make the corresponding tfrecord(s) from the CSVs, using the params to create the model LC upon which the mags feature is based; this is also where the train/val/test split happens

With this change, both the CSVs and tfrecords are written concurrently. The benefit is that we now assemble all of the data for each instance (including the mags feature) before saving to both CSV and tfrecord, meaning we can add logic which may discard or modify an instance based on the outcome of generating its mags feature. Previously, membership and labels were decided at the CSV stage and could not be updated when generating the mags data. Because of changes (fixes) to the random generators/seeds, datasets generated from here on will not have the same members as before; however, consistency is now improved going forward. Fringe benefit: the new code uses less RAM and is slightly faster than the previous approach.
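A rough sketch of the new single-pass flow (the names `make_mags_feature`, `write_tfrecord_row` etc. are illustrative, not the repo's actual API): assembly, the CSV write and the tfrecord write now all happen per instance, so an instance can still be discarded or amended after its mags feature is generated.

```python
import csv

def make_dataset_file(csv_path, instance_generator, make_mags_feature,
                      write_tfrecord_row, count):
    """Assemble each instance in full, then write its CSV row and tfrecord together."""
    written = 0
    with open(csv_path, "w", newline="") as csv_file:
        writer = None
        for params in instance_generator:
            try:
                mags = make_mags_feature(params)   # may amend params, or raise
            except ValueError:
                continue                           # discard the unusable instance
            if writer is None:
                writer = csv.DictWriter(csv_file, fieldnames=list(params))
                writer.writeheader()
            writer.writerow(params)                # the CSV row and...
            write_tfrecord_row(params, mags)       # ...the tfrecord stay in step
            written += 1
            if written == count:
                instance_generator.close()         # stop the open-ended generator
                break
```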
The instance generator functions no longer yield from a for loop based on an instance_count argument. Instead they yield indefinitely, with datasets.make_dataset_file() closing the generator once it has the required number of usable instances. This allows make_dataset_file() to discard instances which cause an error or fail to meet some future criterion.
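A minimal sketch of the open-ended generator pattern, with made-up param names:

```python
import random

def generate_instances(seed=42):
    """Yield candidate instances indefinitely; the consumer decides when to stop."""
    rng = random.Random(seed)
    while True:
        yield {"k": rng.uniform(0.2, 1.0), "inc": rng.uniform(60.0, 90.0)}

# The consumer pulls instances until it has enough usable ones, then calls
# .close(), which raises GeneratorExit inside the loop and ends it cleanly.
gen = generate_instances()
usable = [next(gen) for _ in range(3)]
gen.close()
```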
Since the refactoring, these are no longer true.
Previously the plot_trainset_histogram() function was hard-coded to look for trainset*.csv files. Its replacement now takes an iterable of files, so it's up to the caller to glob the appropriate files (which can now be named whatever is wanted). Took the opportunity to further revise this to return a fig, as the other plot functions in this module do.
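Something along these lines (the use of pandas, the function name and the "k" column are assumptions for illustration):

```python
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt

def plot_histogram(csv_files, column="k", bins=50):
    """Histogram one param column across the given CSV files; return the fig."""
    df = pd.concat((pd.read_csv(f) for f in csv_files), ignore_index=True)
    fig, ax = plt.subplots()
    ax.hist(df[column], bins=bins)
    ax.set(xlabel=column, ylabel="count")
    return fig                        # caller decides whether to show or save it

# The caller now does the globbing, under whatever naming convention is in use:
fig = plot_histogram(sorted(Path("datasets").glob("*.csv")))
```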
The check for usable instances is now handled as a callback from make_dataset_file() to a function in the make_* modules. Previously this was effectively the other way round: a function in datasets called by the modules. This change allows the criteria to be specific to the dataset, and the check to be used within the logic of the controlling make_dataset_file() function.
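Roughly like this, with assumed names for the callback and its criterion:

```python
# In a make_* module: the dataset-specific criterion.
def is_usable_instance(params, mags):
    """Accept only instances whose model LC shows a measurable eclipse."""
    return (max(mags) - min(mags)) >= 0.001

# In datasets: the check arrives as a callback, so the accept/discard
# decision sits inside the controlling loop.
def filter_usable(instances, check_func):
    return [(params, mags) for params, mags in instances
            if check_func(params, mags)]
```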
…s to #79) Added logic which updates the params dict to swap the primary/secondary stars over, and rolls the secondary eclipse to phase zero, if the original secondary eclipse is found to be deeper than the primary. The controlling flag defaults to False so that existing behaviour is unchanged.
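A hedged sketch of that logic, assuming the mags are uniformly sampled in phase with the primary eclipse at phase zero; the param keys and flag name are illustrative, not the actual implementation:

```python
import numpy as np

def maybe_flip_components(params, mags, phase_sec, flip_if_deeper=False):
    """If enabled and the secondary eclipse is deeper, swap the per-star
    params and roll the mags so the deeper eclipse lands at phase zero."""
    if not flip_if_deeper:
        return params, mags                    # default: behaviour unchanged
    n = len(mags)
    i_sec = int(round(phase_sec * n)) % n      # phase bin of the secondary eclipse
    if mags[i_sec] > mags[0]:                  # larger mag == fainter == deeper
        for key in ("r", "teff"):              # illustrative per-star param keys
            params[f"{key}A"], params[f"{key}B"] = params[f"{key}B"], params[f"{key}A"]
        mags = np.roll(mags, -i_sec)           # secondary eclipse -> phase zero
    return params, mags
```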
Closing this issue; raised #80 to cover the investigation of the above.
Following on from investigation in #68
I suspect this will involve generating the csv and tfrecord files together, so that the csv can be kept in sync if we have to flip the components to match a revised mags feature.
Will also need to revise the training set distributions, as initial investigations show that this change will lead to a significantly wider range on k.
Some noddy, sample code from the initial investigation: [snippet-roll-LC-to-switch.txt](https://github.com/user-attachments/files/16446430/snippet-roll-LC-to-switch.txt)