-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JP-3768: Move outlier detection median computers to stcal #292
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #292 +/- ##
==========================================
+ Coverage 84.76% 85.19% +0.43%
==========================================
Files 44 46 +2
Lines 8542 8804 +262
==========================================
+ Hits 7241 7501 +260
- Misses 1301 1303 +2 ☔ View full report in Codecov by Sentry. |
jwst regtests started here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some incomplete docstrings missing a brief description of functions/methods. Other than that, looks fine.
memory efficiency optimizations. np.nanmedian always uses at least 64-bit | ||
precision internally, and this is too memory-intensive. Instead, loop over | ||
the median calculation to avoid the memory usage of the internal upcasting | ||
and temporary array allocation. The additional runtime of this loop is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the largest index of shape[0]
tested? When testing ramp fitting the file jw02589006001_04101_00001-seg001_nrs1_uncal.fits
has 6103 integrations. This seriously slowed down computation time in ramp fitting when running python compared to data with the same dimensions, but a much lower number of integrations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand the question you are asking about the runtime performance of the nanmedian3D function, right? This conversation on the JWST PR may be relevant, although no test was done with the zeroth (time/n_groups) axis being as large as that.
When you say that it seriously slowed down runtime in ramp fitting, do you mean that a median calculation was done, or just that the entire step scaled poorly with the number of integrations?
For imaging data, where we've seen the slowest processing, the largest n_groups we have seen is only ~100, since the step takes _cal files as input instead of _calints. However, I can look into whether there would be any coronagraphic data that had a huge number of integrations in their _calints files, as this is also used by jwst for coronagraphic data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this on a relatively large coronagraphy dataset I had lying around from JWSTDMS-921. The shape of the array going into this function was (1250, 224, 288), so still not 6000 but this is on the large side of what is processed here in practice. The median calculation does not blow up the runtime in this case: the entire outlier detection step took only 5 seconds to run, as compared with ~1 hour for the coronagraphy-specific align_refs
and klip
steps. So I don't think this is a concern for runtime but let me know if you would like to see additional tests run.
data: np.ndarray, | ||
idx: int | None = None | ||
) -> None: | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing docstring describing method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if the updated docstring looks ok to you
self._median_computer.add_image(data) | ||
|
||
def evaluate(self: MedianComputer) -> np.ndarray: | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing docstring describing method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if the updated docstring looks ok to you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most comments are docs/docstring related.
Would you add median.py
to the docs? It will likely be easier if the submodule has an __all__
(I added a comment about this) which will also help to define the "public" API.
@braingram I think I incorporated all your comments with the most recent push. Thanks for the docstring updates, I think they improved things a lot, and I agree with your idea to make the OnDiskMedian and DiskAppendableArray private |
Was the addition of this new API to the docs not pushed? |
I didn't even see the comment, my fault. I will look at the docs from this branch and see what changes are needed. I did already add the |
Downstream romancal failures are what I'd consider unrelated. It appears the CI is using the pytest configuration from stcal when running those tests (instead of using the one in romancal). The jwst tests haven't finished but I wouldn't be surprised if it does the same thing (and is now turning some unclosed file warnings into errors). Thanks for updating the docs, they look good to me: I think #292 (comment) is the only unresolved comment I made. |
Co-authored-by: Brett Graham <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This looks great to me.
Resolves JP-3768
This PR ports the median calculation machinery added by spacetelescope/jwst#8782 into stcal for use by other missions. See spacetelescope/jwst#8840 for the related changes in jwst.
Tasks
docs/
pageno-changelog-entry-needed
)changes/
:echo "changed something" > changes/<PR#>.<changetype>.rst
(see below for change types)"git+https://github.com/<fork>/stcal@<branch>"
)jwst
regression testromancal
regression testnews fragment change types...
changes/<PR#>.apichange.rst
: change to public APIchanges/<PR#>.bugfix.rst
: fixes an issuechanges/<PR#>.general.rst
: infrastructure or miscellaneous change