Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JP-3755: Remove unused options and add unit tests to SOSS extraction algorithm #9000

Draft
wants to merge 59 commits into
base: main
Choose a base branch
from

Conversation

emolter
Copy link
Collaborator

@emolter emolter commented Dec 10, 2024

Resolves JP-3755

Closes #8804

This PR addresses reducing the technical debt of the SOSS ATOCA algorithm. There are a lot of changed lines, so I'm annotating this PR with comments during a self-review. The high-level changes are:

  • Removed many unused functions, i.e., ones that were never called during a pipeline run
  • Removed many optional (and occasionally required) parameters from functions when they were either never changed from their default value, or always set to the same value (AND were not set-able by the user or by parameter reference files)
  • Added a unit testing suite, including a detector model (wave grid, trace profiles, kernels, throughput functions, mock data, masks, etc) scaled down by a factor of 10 in each spatial dimension. Added unit tests for most helper functions.
  • Fixed various bugs, usually found when unit-testing function edge cases. I'm attempting to annotate all of these, so it's clear where actual changes to output have occurred. (These obviously require the most scrutiny).
  • Many small refactors for clarity

There are several outstanding items to complete before this can be merged:

  • Validate that changes in behavior (e.g., differences in retrieved Tikhonov regularization factor) are ok, i.e., not adversely affecting science results.
  • Get several questions about intended / current behavior answered by instrument team members.
  • Check whether some removed functions should be retained in the pipeline version of ATOCA, e.g., because they are particularly helpful for INS work on SOSS data (even though not used during pipeline runs).
  • Add documentation of the algorithm to the pipeline readthedocs page.

Tasks

  • request a review from someone specific, to avoid making the maintainers review every PR
  • add a build milestone, i.e. Build 11.3 (use the latest build if not sure)
  • Does this PR change user-facing code / API? (if not, label with no-changelog-entry-needed)
    • write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see below for change types)
    • update or add relevant tests
    • update relevant docstrings and / or docs/ page
    • start a regression test and include a link to the running job (click here for instructions)
      • Do truth files need to be updated ("okified")?
        • after the reviewer has approved these changes, run okify_regtests to update the truth files
  • if a JIRA ticket exists, make sure it is resolved properly
news fragment change types...
  • changes/<PR#>.general.rst: infrastructure or miscellaneous change
  • changes/<PR#>.docs.rst
  • changes/<PR#>.stpipe.rst
  • changes/<PR#>.datamodels.rst
  • changes/<PR#>.scripts.rst
  • changes/<PR#>.fits_generator.rst
  • changes/<PR#>.set_telescope_pointing.rst
  • changes/<PR#>.pipeline.rst

stage 1

  • changes/<PR#>.group_scale.rst
  • changes/<PR#>.dq_init.rst
  • changes/<PR#>.emicorr.rst
  • changes/<PR#>.saturation.rst
  • changes/<PR#>.ipc.rst
  • changes/<PR#>.firstframe.rst
  • changes/<PR#>.lastframe.rst
  • changes/<PR#>.reset.rst
  • changes/<PR#>.superbias.rst
  • changes/<PR#>.refpix.rst
  • changes/<PR#>.linearity.rst
  • changes/<PR#>.rscd.rst
  • changes/<PR#>.persistence.rst
  • changes/<PR#>.dark_current.rst
  • changes/<PR#>.charge_migration.rst
  • changes/<PR#>.jump.rst
  • changes/<PR#>.clean_flicker_noise.rst
  • changes/<PR#>.ramp_fitting.rst
  • changes/<PR#>.gain_scale.rst

stage 2

  • changes/<PR#>.assign_wcs.rst
  • changes/<PR#>.badpix_selfcal.rst
  • changes/<PR#>.msaflagopen.rst
  • changes/<PR#>.nsclean.rst
  • changes/<PR#>.imprint.rst
  • changes/<PR#>.background.rst
  • changes/<PR#>.extract_2d.rst
  • changes/<PR#>.master_background.rst
  • changes/<PR#>.wavecorr.rst
  • changes/<PR#>.srctype.rst
  • changes/<PR#>.straylight.rst
  • changes/<PR#>.wfss_contam.rst
  • changes/<PR#>.flatfield.rst
  • changes/<PR#>.fringe.rst
  • changes/<PR#>.pathloss.rst
  • changes/<PR#>.barshadow.rst
  • changes/<PR#>.photom.rst
  • changes/<PR#>.pixel_replace.rst
  • changes/<PR#>.resample_spec.rst
  • changes/<PR#>.residual_fringe.rst
  • changes/<PR#>.cube_build.rst
  • changes/<PR#>.extract_1d.rst
  • changes/<PR#>.resample.rst

stage 3

  • changes/<PR#>.assign_mtwcs.rst
  • changes/<PR#>.mrs_imatch.rst
  • changes/<PR#>.tweakreg.rst
  • changes/<PR#>.skymatch.rst
  • changes/<PR#>.exp_to_source.rst
  • changes/<PR#>.outlier_detection.rst
  • changes/<PR#>.tso_photometry.rst
  • changes/<PR#>.stack_refs.rst
  • changes/<PR#>.align_refs.rst
  • changes/<PR#>.klip.rst
  • changes/<PR#>.spectral_leak.rst
  • changes/<PR#>.source_catalog.rst
  • changes/<PR#>.combine_1d.rst
  • changes/<PR#>.ami.rst

other

  • changes/<PR#>.wfs_combine.rst
  • changes/<PR#>.white_light.rst
  • changes/<PR#>.cube_skymatch.rst
  • changes/<PR#>.engdb_tools.rst
  • changes/<PR#>.guider_cds.rst

Copy link

codecov bot commented Dec 10, 2024

Codecov Report

Attention: Patch coverage is 96.08856% with 53 lines in your changes missing coverage. Please review.

Project coverage is 78.28%. Comparing base (e3d263f) to head (e0c8883).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
jwst/extract_1d/soss_extract/soss_extract.py 73.86% 23 Missing ⚠️
jwst/extract_1d/soss_extract/atoca_utils.py 90.95% 20 Missing ⚠️
jwst/extract_1d/soss_extract/pastasoss.py 75.67% 9 Missing ⚠️
jwst/extract_1d/soss_extract/soss_syscor.py 90.90% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9000      +/-   ##
==========================================
+ Coverage   76.81%   78.28%   +1.47%     
==========================================
  Files         496      504       +8     
  Lines       45610    46039     +429     
==========================================
+ Hits        35034    36042    +1008     
+ Misses      10576     9997     -579     
Flag Coverage Δ *Carryforward flag
nightly 77.81% <ø> (+0.38%) ⬆️ Carriedforward from e3d263f

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@melanieclarke melanieclarke added this to the Build 11.3 milestone Dec 12, 2024
Comment on lines +87 to +91
wave_grid : (N_k) array_like, required.
The grid on which f(lambda) will be projected.
mask_trace_profile : (N_ord, N, M) list or array of 2-D arrays[bool], required.
A list or array of the pixel that need to be used for extraction,
for each order on the detector. It has to have the same (N_ord, N, M) as `trace_profile`.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made these two parameters required because they were always specified explicitly when the ExtractionEngine was instantiated.

raise ValueError(msg.format(self.n_orders, len(wave_map)))

# Detector image.
self.data = np.full(self.data_shape, fill_value=np.nan)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having attributes self.data and self.error is unnecessary. The ExtractionEngine does not itself need any data to set up a model of the detector and use most of the functionality. The only functions where these were accessed had optional data and error parameters, and in reality these were always set non-trivially when calling the pipeline. These were made required for those specific functions, specifically get_detector_model(), get_pixel_mapping(), and compute_likelihood(). estimate_noise() was unused in the pipeline and removed

Comment on lines -141 to -149
# Set all reference file quantities to None.
self.wave_map = None
self.trace_profile = None
self.throughput = None
self.kernels = None

# Set the wavelength map and trace_profile for each order.
self.update_wave_map(wave_map)
self.update_trace_profile(trace_profile)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all set to something other than None later in the __init__, so there was no point setting them to None. The update_wave_map() function was removed because this was the only place it was called, and it could be replaced with a one-liner. The same is true for update_trace_profile(): its use in get_detector_model() turned out to also be unnecessary because the option to pass trace_profile into get_detector_model() was never used in the pipeline.

self.update_trace_profile(trace_profile)

# Set the mask based on trace profiles and save
if mask_trace_profile is None:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mask_trace_profile was never set to None in real calls to this class, so I made it required and removed the option to set it to None

self.mask_trace_profile = mask_trace_profile

# Generate a wavelength grid if none was provided.
if wave_grid is None:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wave_grid was never set to None in real calls to this class, so I made it required and removed the option to set it to None

return bkg_mask


def soss_oneoverf_correction(scidata, scimask, deepstack, bkg_mask=None,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused

log.setLevel(logging.DEBUG)


def zero_roll(a, shift):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused

return result


def robust_polyfit(x, y, order, maxiter=5, nstd=3.):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only used as a helper to unused code in soss_centroids

return param


def get_image_dim(image, header=None):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only used as a helper to unused code in soss_centroids

Comment on lines +110 to +117
# In both orders, the errors on the replaced pixels are roughly
# half of the errors on the original good pixels
# There are enough replaced pixels here (~30) that small-number statistics cannot account for this
# The reason is because the code chooses the lower error between the two nearest-flux
# data points, and since the errors in our tests are uncorrelated with the flux values,
# this leads to a factor-of-2 decrease
# It's not clear to me that picking the smaller of two error values is the right thing to do
# but that behavior is documented
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if this is actually the intended behavior or not, because it seems like a questionable choice to me (although admittedly I don't know very much).

@emolter emolter requested review from hover2pi and tapastro December 17, 2024 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reduce technical debt of ATOCA/SOSS extraction code
2 participants