Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes and updates in features (SPM, TDE, ulens) #492

Merged
merged 24 commits into from
Nov 6, 2024

Conversation

ale-munozarancibia
Copy link
Contributor

Summary of the changes

This PR:

  • Solves problems in the fits computed by ulens and TDE extractors using scipy.optimize.curve_fit, that arose from using float32 precision in jax.numpy instead of float64. This new precision is also used in SPM now, for consistency. IMPORTANT: need to check that the fix (i.e. setting jax_enable_x64 to True in these extractors) works in the pipeline and in the machine(s) used for the training set, as jax documentation states that "XLA doesn’t support 64-bit convolutions on all backends" (see https://jax.readthedocs.io/en/latest/notebooks/Common_Gotchas_in_JAX.html#double-64bit-precision).

  • Adds writing features from some parametric models that were missing, in order to plot such fits along with the data (by users or when debugging code). These features are: for TDETailExtractor, TDE_mag_0; for FleetExtractor, fleet_m_0 and fleet_t0; for MicroLensExtractor, ulens_t0 and ulens_mag_0. IMPORTANT: a list of these features must be added to the classifiers that use features, in order to avoid their inclusion.

  • Fixes the artificially low chi values in SPM, that arose from a factor that was too big compared to model uncertainties. Such factor was added to a denominator, preventing extremely big values.

  • Fixes the color_variation feature, so that it now uses difference fluxes instead of corrected ones. Since current models that use this feature are in beta mode, I do not think we need to change the name of the feature, as previous (i.e. wrong) features versions should be erased from the database once the models become stable.

  • Changes the time window size used for the color_variation feature, to meet the value used by Manuel Pavez' paper.

  • Fixes the equation for A in the ulens model (an incorrect value of 3 is now 4).

  • Expands the allowed intervals for the ulens model parameters.

About this PR:

  • You added the necessary documentation for the team to understand your work.
  • You linked this PR to the corresponding issue or user story.
  • Your changes fulfill the Definition of Done of the issue or user story.
  • Your changes were tested with manual testing before being submitted.
  • Your changes were tested with automatic unit tests.
  • Your changes were tested with automatic integration tests.

@ignacioreyes ignacioreyes self-requested a review October 15, 2024 15:19
@ale-munozarancibia
Copy link
Contributor Author

Updates:

  • Added new features:
    -- For PanStarrsFeatureExtractor: ps_r-i and ps_i-z (ps_r_i and ps_i_z in pipeline)
    -- For MHPSExtractor: MHPS_high_30, MHPS_low_365, MHPS_ratio_365_30. Note that we do not save MHPS_non_zero_365_30 nor MHPS_PN_flag_365_30, and that MHPS features that use 10 and 100 day timescales keep the same names as before for backward compatibility (MHPS_high, MHPS_low, MHPS_non_zero and MHPS_PN_flag)
    -- Added new extractor ReferenceFeatureExtractor, which computes features mean_distnr, sigma_distnr, mean_sharpnr and mean_chinr

  • Changed some feature names:
    -- For PanStarrsFeatureExtractor: from dist_nr to distpsnr1
    -- For TDETailExtractor: from TDE_mag_0 to TDE_mag0
    -- For FleetExtractor: from fleet_m_0 to fleet_m0
    -- For MicroLensExtractor: from ulens_mag_0 to ulens_mag0

  • Changed the light curve time range and logarithm argument used for the TDE tail features, to meet the expressions used by Manuel Pavez' paper.

  • Changed the AstroObject class, so that it now expects columns distnr and rfid in the detections dataframe, and can receive as input a dataframe with reference table data. These changes are needed to compute features mean_distnr, sigma_distnr, mean_sharpnr and mean_chinr. The functions create_astro_object in pipeline/lc_classifier/lc_classifier/utils.py and pipeline/training/lc_classifier_ztf/feature_computation/dataset.py are now updated accordingly.

IMPORTANT: more changes are needed before running in pipeline

@ignacioreyes
Copy link
Member

@ale-munozarancibia Go to the lc_classifier directory and run python -m pytest tests. Then fix the broken code or broken tests.

@ale-munozarancibia
Copy link
Contributor Author

ale-munozarancibia commented Oct 18, 2024

Updates:

  • Added missing PS1 metadata to create_astro_object function in pipeline/lc_classifier/lc_classifier/utils.py.
  • Removed modification to astro_object.reference in output of feature computation.
  • Removed distnr and rfid from mandatory columns in light curve epochs.
  • Updated the example file used in get_ztf_forced_phot_cepheid function (used by some tests) to include columns distnr and rfid in detections.
  • Added a column "mjd_nohelio" to the astro_object light curve updated by the ZTFLightcurvePreprocessor class, as column "mjd" is updated to include barycentric corrections (see _helio_time_correction function and https://docs.astropy.org/en/stable/api/astropy.coordinates.get_body_barycentric.html). Strictly speaking, columns "mjd" and "mjd_nohelio" in the astro_object light curve after applying the preprocessing correspond to "hmjd" and the original "mjd" respectively. Same applies for the astro_object updated by the ZTFFeatureExtractor class (as this preprocessing is applied before computing features).

Features that should be ignored in training: TDE_mag0, fleet_t0, fleet_m0, ulens_t0, ulens_mag0.

Features in version 1.0.0 that should be ignored when using the production db (maybe removed from the db at some point?):

  • From PanStarrsFeatureExtractor: dist_nr (reason: name changed to distpsnr1)
  • From MicroLensExtractor: ulens_u0, ulens_tE, ulens_fs, ulens_chi (reason: bugs)
  • From TDETailExtractor: TDE_decay and TDE_decay_chi (reason: expressions/limits used are different from Manuel's paper)
  • From ColorVariationExtractor: color_variation (reason: expressions/limits used are different from Manuel's paper)
  • From FleetExtractor: fleet_a, fleet_w, fleet_chi (reason: bug)
  • From SPMExtractor: SPM_chi (reason: bug)

Note about new feature mean_distnr: implementation considers only light curve epochs with distnr <= 5 arcsec, even when the public stream can include epochs with distnr > 5 arcsec. The cut is implemented because the training set comes from the ZTF forced photometry service, that sets distnr > 5 arcsec values to nan.

IMPORTANT: more changes are needed before running in pipeline

Copy link

codecov bot commented Oct 28, 2024

Codecov Report

Attention: Patch coverage is 76.72131% with 71 lines in your changes missing coverage. Please review.

Project coverage is 81.83%. Comparing base (01a1e23) to head (db8b73e).
Report is 109 commits behind head on main.

Files with missing lines Patch % Lines
lc_classifier/lc_classifier/utils.py 43.39% 30 Missing ⚠️
...features/extractors/reference_feature_extractor.py 71.92% 16 Missing ⚠️
feature_step/features/database.py 60.00% 12 Missing ⚠️
lc_classifier/lc_classifier/examples/data.py 77.77% 4 Missing ⚠️
..._classifier/features/extractors/ulens_extractor.py 63.63% 4 Missing ⚠️
...features/extractors/panstarrs_feature_extractor.py 84.21% 3 Missing ⚠️
...lc_classifier/features/extractors/tde_extractor.py 92.30% 2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (01a1e23) and HEAD (db8b73e). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (01a1e23) HEAD (db8b73e)
unittest 1 0
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #492       +/-   ##
===========================================
- Coverage   95.12%   81.83%   -13.29%     
===========================================
  Files           4       84       +80     
  Lines          41     4597     +4556     
===========================================
+ Hits           39     3762     +3723     
- Misses          2      835      +833     
Flag Coverage Δ
feature_step_integration 91.05% <85.10%> (?)
feature_step_unittest 86.17% <87.23%> (?)
lc_classifier 75.30% <72.03%> (?)
libs_apf_unittest 88.86% <ø> (?)
magstats_step_integration 98.38% <ø> (?)
magstats_step_unittest 97.98% <ø> (?)
prv_candidates_step_integration 85.41% <ø> (?)
reflector_step_integration 95.12% <ø> (?)
reflector_step_unittest 95.12% <ø> (?)
sorting_hat_step_integration 93.00% <ø> (?)
sorting_hat_step_unittest 84.36% <ø> (?)
unittest ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ale-munozarancibia
Copy link
Contributor Author

Updates:

  • Added new/modified features to features_record file.
  • In the reference feature extractor, added missing restrictions on distnr and rfid: rows that have distnr = -99999 or rfid = nan are discarded before computing these features. In the training set, these restrictions were applied before creating the astro_objects.
  • Fixed a bug that produces artificially low Harmonics_chi values when difference fluxes are used. This does not imply re-running the training set harmonics features, because we use magnitudes for them.
  • Included the PS1 i and z magnitudes in the parser for the feature step, and in the message factory.
  • Included the reference info (from both messages and database) in the feature step, involving a new query to the database "reference" table. The parser now also handles reference deduplications and cases where there are missing reference data.
  • Included the reference info (from messages) in the message factory.
  • Updated the unit tests to include a mock database connection and check the reference usage.
  • Added a unit test for the reference feature extractor.

@ale-munozarancibia ale-munozarancibia merged commit f8c3e32 into main Nov 6, 2024
49 of 53 checks passed
@ale-munozarancibia ale-munozarancibia deleted the features_experiments branch November 6, 2024 15:14
@ale-munozarancibia ale-munozarancibia restored the features_experiments branch November 7, 2024 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants