JP-3768: Move outlier detection median computers to stcal #292

emolter · 2024-09-27T17:51:57Z

This PR ports the median calculation machinery added by spacetelescope/jwst#8782 into stcal for use by other missions. See spacetelescope/jwst#8840 for the related changes in jwst.

Tasks

- update or add relevant tests
- update relevant docstrings and / or docs/ page
- Does this PR change any API used downstream? (if not, label with no-changelog-entry-needed)
  - write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see below for change types)
  - run regression tests with this branch installed ("git+https://github.com/<fork>/stcal@<branch>")
    - jwst regression test
    - romancal regression test

news fragment change types...

changes/<PR#>.apichange.rst: change to public API
changes/<PR#>.bugfix.rst: fixes an issue
changes/<PR#>.general.rst: infrastructure or miscellaneous change

codecov · 2024-09-27T17:56:42Z

Codecov Report

Attention: Patch coverage is 99.23664% with 2 lines in your changes missing coverage. Please review.

Project coverage is 85.19%. Comparing base (8988d2c) to head (c2d10b7).
Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
src/stcal/outlier_detection/median.py	98.60%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #292      +/-   ##
==========================================
+ Coverage   84.76%   85.19%   +0.43%     
==========================================
  Files          44       46       +2     
  Lines        8542     8804     +262     
==========================================
+ Hits         7241     7501     +260     
- Misses       1301     1303       +2

emolter · 2024-09-27T18:49:26Z

jwst regtests started here
edit: no failures

kmacdonald-stsci

There are some incomplete docstrings missing a brief description of functions/methods. Other than that, looks fine.

kmacdonald-stsci · 2024-09-30T10:52:27Z

src/stcal/outlier_detection/median.py

+    memory efficiency optimizations. np.nanmedian always uses at least 64-bit
+    precision internally, and this is too memory-intensive. Instead, loop over
+    the median calculation to avoid the memory usage of the internal upcasting
+    and temporary array allocation. The additional runtime of this loop is


What's the largest value of shape[0] tested? When testing ramp fitting, the file jw02589006001_04101_00001-seg001_nrs1_uncal.fits has 6103 integrations. This seriously slowed down computation time in ramp fitting when running in Python, compared to data with the same dimensions but a much lower number of integrations.

emolter · 2024-09-30

If I understand the question, you are asking about the runtime performance of the nanmedian3D function, right? This conversation on the JWST PR may be relevant, although no test was done there with the zeroth (time/n_groups) axis as large as that.

When you say that it seriously slowed down runtime in ramp fitting, do you mean that a median calculation was done, or just that the entire step scaled poorly with the number of integrations?

For imaging data, where we've seen the slowest processing, the largest n_groups we have seen is only ~100, since the step takes _cal files as input instead of _calints. However, I can look into whether any coronagraphic data has a huge number of integrations in its _calints files, since jwst also uses this step for coronagraphic data.

emolter · 2024-09-30

I tried this on a relatively large coronagraphy dataset I had lying around from JWSTDMS-921. The shape of the array going into this function was (1250, 224, 288), so still not 6000, but this is on the large side of what is processed here in practice. The median calculation does not blow up the runtime in this case: the entire outlier detection step took only 5 seconds to run, compared with ~1 hour for the coronagraphy-specific align_refs and klip steps. So I don't think this is a concern for runtime, but let me know if you would like to see additional tests run.
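The docstring excerpt at the top of this thread describes looping over the median calculation so that NumPy's internal upcasting and temporary arrays only ever cover a slice of the stack. A minimal sketch of that idea follows; it is not the stcal implementation. The function name `nanmedian_by_rows`, the choice to loop over axis 1, and the float32 output dtype are assumptions made for illustration.

```python
import numpy as np


def nanmedian_by_rows(cube: np.ndarray) -> np.ndarray:
    """NaN-aware median along axis 0 of a 3-D stack, one row at a time.

    Hypothetical helper: the name and the float32 output are assumptions.
    """
    if cube.ndim != 3:
        raise ValueError("expected a 3-D array of shape (n_images, ny, nx)")

    n_rows = cube.shape[1]
    out = np.empty(cube.shape[1:], dtype=np.float32)
    for row in range(n_rows):
        # Only a (n_images, nx) slice is passed to np.nanmedian, so any
        # temporary copies it makes are sized to one row, not the whole cube.
        out[row] = np.nanmedian(cube[:, row, :], axis=0)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stack = rng.normal(size=(7, 5, 6)).astype(np.float32)
    stack[0, 1, 2] = np.nan  # a single flagged pixel
    print(nanmedian_by_rows(stack).shape)
```

The Python-level loop runs once per row rather than once per pixel, which is consistent with the small runtime cost reported in the thread, although the sketch itself makes no timing claim.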

kmacdonald-stsci · 2024-09-30T10:58:05Z

src/stcal/outlier_detection/median.py

+               data: np.ndarray,
+               idx: int | None = None
+               ) -> None:
+        """


Missing docstring describing method.

emolter · 2024-09-30

Let me know if the updated docstring looks ok to you.

kmacdonald-stsci · 2024-09-30T10:58:32Z

src/stcal/outlier_detection/median.py

+            self._median_computer.add_image(data)
+
+    def evaluate(self: MedianComputer) -> np.ndarray:
+        """


Missing docstring describing method.

emolter · 2024-09-30

Let me know if the updated docstring looks ok to you.
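The two diff excerpts above show an `add_image(data, idx)` method and an `evaluate()` method that returns an array. A minimal in-memory sketch of that accumulate-then-evaluate pattern is given below. The class name `InMemoryMedianSketch`, its constructor arguments, and the handling of `idx=None` are assumptions for illustration; they are not taken from stcal.

```python
from typing import Optional, Tuple

import numpy as np


class InMemoryMedianSketch:
    """Collect 2-D frames into a preallocated stack, then take the median.

    Hypothetical class: only the method names mirror the diff excerpts.
    """

    def __init__(self, full_shape: Tuple[int, int, int]) -> None:
        # Unfilled slots stay NaN so they are ignored by np.nanmedian.
        self._data = np.full(full_shape, np.nan, dtype=np.float32)
        self._next_idx = 0

    def add_image(self, data: np.ndarray, idx: Optional[int] = None) -> None:
        """Store one frame at position idx (assumed: next free slot if None)."""
        if data.shape != self._data.shape[1:]:
            raise ValueError("frame shape does not match the stack")
        if idx is None:
            idx = self._next_idx
        self._data[idx] = data
        self._next_idx = idx + 1

    def evaluate(self) -> np.ndarray:
        """Return the NaN-aware median over all stored frames."""
        return np.nanmedian(self._data, axis=0)


if __name__ == "__main__":
    computer = InMemoryMedianSketch((3, 2, 2))
    for value in (3.0, 1.0, 2.0):
        computer.add_image(np.full((2, 2), value, dtype=np.float32))
    print(computer.evaluate())
```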

braingram

Most comments are docs/docstring related.

Would you add median.py to the docs? It will likely be easier if the submodule has an __all__ (I added a comment about this) which will also help to define the "public" API.

emolter · 2024-10-01T14:36:37Z

@braingram I think I incorporated all your comments in the most recent push. Thanks for the docstring updates; I think they improved things a lot. I also agree with your idea to make OnDiskMedian and DiskAppendableArray private.

braingram · 2024-10-01T16:18:25Z

> Would you add median.py to the docs? It will likely be easier if the submodule has an __all__ (I added a comment about this) which will also help to define the "public" API.

Was the addition of this new API to the docs not pushed?

emolter · 2024-10-01T16:22:29Z

> Would you add median.py to the docs? It will likely be easier if the submodule has an __all__ (I added a comment about this) which will also help to define the "public" API.
>
> Was the addition of this new API to the docs not pushed?

I didn't even see the comment, my fault. I will look at the docs from this branch and see what changes are needed. I did already add the __all__.
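The exchange above relies on `__all__` to mark which names in a submodule are public. A small self-contained sketch of that mechanism is shown below. The module name `median_sketch` and the names defined inside it are hypothetical, and the contents of the real `__all__` in stcal are not shown in this thread.

```python
import sys
import types

# Source for a throwaway module; every name in it is illustrative only.
MODULE_SOURCE = '''
__all__ = ["MedianComputer"]

class MedianComputer:
    """Public: listed in __all__."""

class _OnDiskMedian:
    """Private by convention: leading underscore and not in __all__."""

def helper():
    """No underscore, but still left out of star-imports by __all__."""
'''

# Build the module in memory and register it so it can be imported.
module = types.ModuleType("median_sketch")
exec(MODULE_SOURCE, module.__dict__)
sys.modules["median_sketch"] = module

# A star-import copies only the names listed in __all__.
namespace: dict = {}
exec("from median_sketch import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Documentation tools that list a module's public members can use the same list, which is why defining `__all__` first tends to simplify the docs change requested in the review.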

braingram · 2024-10-01T17:03:43Z

Downstream romancal failures are what I'd consider unrelated. It appears the CI is using the pytest configuration from stcal when running those tests (instead of using the one in romancal). The jwst tests haven't finished but I wouldn't be surprised if it does the same thing (and is now turning some unclosed file warnings into errors).

Thanks for updating the docs, they look good to me:
https://stcal--292.org.readthedocs.build/en/292/stcal/outlier_detection/index.html

I think #292 (comment) is the only unresolved comment I made.
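The first paragraph of the comment above mentions a test configuration turning unclosed-file warnings into errors. The sketch below shows the underlying mechanism using only the standard-library `warnings` module. It does not reproduce the pytest settings in stcal's pyproject.toml, and the function name `emit_resource_warning` is hypothetical.

```python
import warnings


def emit_resource_warning(escalate: bool) -> str:
    """Emit a simulated ResourceWarning and report how it surfaced."""
    with warnings.catch_warnings():
        # "error" converts matching warnings into exceptions;
        # "ignore" silently drops them.
        action = "error" if escalate else "ignore"
        warnings.simplefilter(action, ResourceWarning)
        try:
            warnings.warn("unclosed file (simulated)", ResourceWarning)
        except ResourceWarning:
            return "raised"
    return "no error"


if __name__ == "__main__":
    print(emit_resource_warning(escalate=True))
    print(emit_resource_warning(escalate=False))
```

Because the filter is part of the test configuration rather than the code under test, a downstream suite that picks up a different configuration can start failing without any change to its own code, which matches the behaviour described in the comment.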

braingram

Thanks! This looks great to me.

emolter added 6 commits September 27, 2024 11:40

fix ruff style checks for median

66c5c61

added type hints to median.py

8c8ef11

Merge branch 'main' of https://github.com/spacetelescope/stcal into JP-3768

ddf9d3b

ported over unit tests

cf557b9

added test for MedianComputer

afe4b08

missed some ruff checks

83a8f1f

github-actions bot added the testing label Sep 27, 2024

changelog, fix oldestdeps, fix mypy problem

d41546a

emolter mentioned this pull request Sep 27, 2024

JP-3768: Move outlier detection median computers to stcal spacetelescope/jwst#8840

Merged

10 tasks

emolter marked this pull request as ready for review September 27, 2024 19:19

emolter requested a review from a team as a code owner September 27, 2024 19:19

emolter requested review from braingram, melanieclarke and mairanteodoro September 27, 2024 19:20

emolter added the outlier-detection label Sep 27, 2024

kmacdonald-stsci approved these changes Sep 30, 2024

braingram reviewed Sep 30, 2024

src/stcal/outlier_detection/median.py (outdated, resolved)

fixes after comments from @kmacdonald-stsci and @braingram

31adf16

github-actions bot added the installation label Sep 30, 2024

braingram mentioned this pull request Sep 30, 2024

Update Outlier Detection to use stcal spacetelescope/romancal#1357

Merged

6 tasks

braingram reviewed Oct 1, 2024

changes per review from @braingram

509a6ae

forgot to change private methods in test

19ac974

emolter requested a review from braingram October 1, 2024 14:41

braingram reviewed Oct 1, 2024

tests/outlier_detection/test_median.py (outdated, resolved)

emolter added 2 commits October 1, 2024 11:54

replace tmpdir with tmp_path

fafb5e7

upgrade cleanup warnings to errors in pyproject.toml

c38478a

emolter requested a review from braingram October 1, 2024 16:04

braingram reviewed Oct 1, 2024

pyproject.toml (outdated, resolved)

put automodapi in docs, small doc style updates

10378bd

emolter force-pushed the JP-3768 branch from 3c93257 to 10378bd Compare October 1, 2024 16:37

github-actions bot added the documentation Improvements or additions to documentation label Oct 1, 2024

undo removal of comment about unraisableexception

4a350c4

missed one docstring improvement

2608a37

braingram reviewed Oct 1, 2024

src/stcal/outlier_detection/median.py (outdated, resolved)

Update src/stcal/outlier_detection/median.py

c2d10b7

Co-authored-by: Brett Graham <[email protected]>

braingram approved these changes Oct 1, 2024

emolter merged commit e4dbc44 into spacetelescope:main Oct 1, 2024
24 of 26 checks passed

emolter deleted the JP-3768 branch October 1, 2024 18:15
