Update Outlier Detection to use stcal #1357

Merged
merged 17 commits into from
Oct 2, 2024
1 change: 1 addition & 0 deletions changes/1357.outlier_detection.0.rst
@@ -0,0 +1 @@
Remove unused arguments to outlier detection.
1 change: 1 addition & 0 deletions changes/1357.outlier_detection.1.rst
@@ -0,0 +1 @@
Update input handling to raise an exception on an invalid input instead of issuing a warning and skipping the step.
1 change: 1 addition & 0 deletions changes/1357.outlier_detection.2.rst
@@ -0,0 +1 @@
Use stcal common code in outlier detection.
69 changes: 24 additions & 45 deletions docs/roman/outlier_detection/arguments.rst
@@ -1,16 +1,19 @@
.. _outlier_detection_step_args:

For more details about step arguments (including datatypes, possible values
and defaults) see :py:obj:`romancal.outlier_detection.OutlierDetectionStep.spec`.

Step Arguments
==============
The ``outlier_detection`` step has the following optional arguments that control the
behavior of the processing:

``--weight_type`` (string, default='exptime')
``--weight_type``
The type of data weighting to use during resampling the images for creating the
median image used for detecting outliers; options are `'ivm'`, `'exptime'`,
and `None` (see :ref:`weight_type_options_details_section` for details).

``--pixfrac`` (float, default=1.0)
``--pixfrac``
Fraction by which input pixels are “shrunk” before being drizzled onto the output
image grid, given as a real number between 0 and 1. This specifies the size of the
footprint, or “dropsize”, of a pixel in units of the input pixel size. If `pixfrac`
@@ -20,7 +23,7 @@ behavior of the processing:
output drizzled image is fully populated with pixels from the input image.
Valid values range from 0.0 to 1.0.

``--kernel`` (string, default='square')
``--kernel``
This parameter specifies the form of the kernel function used to distribute
flux onto the separate output images, for the initial separate drizzling
operation only. The value options for this parameter include:
@@ -43,7 +46,7 @@ behavior of the processing:
should never be used for ``pixfrac != 1.0``, and is not recommended
for ``scale!=1.0``.

``--fillval`` (string, default='INDEF')
``--fillval``
The value for this parameter is to be assigned to the output pixels that
have zero weight or which do not receive flux from any input pixels during
drizzling. This parameter corresponds to the ``fillval`` parameter of the
@@ -55,77 +58,53 @@ behavior of the processing:
Any floating-point value, given as a string, is valid.
A value of 'INDEF' will use the last zero weight flux.

``--nlow`` (integer, default=0)
The number of low values in each pixel stack to ignore when computing the median
value.

``--nhigh`` (integer, default=0)
The number of high values in each pixel stack to ignore when computing the median
value.

``--maskpt`` (float, default=0.7)
``--maskpt``
Percentage of weight image values below which they are flagged as bad and rejected
from the median image. Valid values range from 0.0 to 1.0.

``--grow`` (integer, default=1)
The distance, in pixels, beyond the limit set by the rejection algorithm being
used, for additional pixels to be rejected in an image.

``--snr`` (string, default='4.0 3.0')
``--snr``
The signal-to-noise values to use for bad pixel identification. Since cosmic rays
often extend across several pixels the user must specify two cut-off values for
determining whether a pixel should be masked: the first for detecting the primary
cosmic ray, and the second (typically lower threshold) for masking lower-level bad
pixels adjacent to those found in the first pass. Valid values are a pair of
floating-point values in a single string.
floating-point values in a single string (for example "5.0 4.0").

``--scale`` (string, default='0.5 0.4')
``--scale``
The scaling factor applied to the derivative used to identify bad pixels. Since cosmic
rays often extend across several pixels the user must specify two cut-off values for
determining whether a pixel should be masked: the first for detecting the primary
cosmic ray, and the second (typically lower threshold) for masking lower-level bad
pixels adjacent to those found in the first pass. Valid values are a pair of
floating-point values in a single string.
floating-point values in a single string (for example "1.2 0.7").

``--backg`` (float, default=0.0)
``--backg``
User-specified background value (scalar) to subtract during final identification
step of outliers in `driz_cr` computation.

``--kernel_size`` (string, default='7 7')
Size of kernel to be used during resampling of the data
(i.e. when `resample_data=True`).

``--save_intermediate_results`` (boolean, default=False)
Specifies whether or not to write out intermediate products such as median image or
``--save_intermediate_results``
Boolean specifying whether or not to write out intermediate products such as median image or
resampled individual input exposures to disk. Typically, only used to track down
problems with final results when too many or too few pixels are flagged as outliers.

``--resample_data`` (boolean, default=True)
Specifies whether or not to resample the input images when performing outlier
``--resample_data``
Boolean specifying whether or not to resample the input images when performing outlier
detection.

``--good_bits`` (string, default=0)
``--good_bits``
The DQ bit values from the input image DQ arrays that should be considered 'good'
when creating masks of bad pixels during outlier detection when resampling the data.
See `Roman's Data Quality Flags
<https://github.com/spacetelescope/romancal/blob/main/romancal/lib/dqflags.py>`_
for details.

``--allowed_memory`` (float, default=None)
Specifies the fractional amount of free memory to allow when creating the resampled
image. If ``None``, the environment variable ``DMODEL_ALLOWED_MEMORY`` is used. If
not defined, no check is made. If the resampled image would be larger than specified,
an ``OutputTooLargeError`` exception will be generated. For example, if set to
``0.5``, only resampled images that use less than half the available memory can be
created.

``--in_memory`` (boolean, default=False)
Specifies whether or not to keep all intermediate products and datamodels in
``--in_memory``
Boolean specifying whether or not to keep all intermediate products and datamodels in
memory at the same time during the processing of this step. If set to `False`,
all input and output data will be written to disk at the start of the step
(as much as `roman_datamodels` will allow, anyway), then read in to memory only when
accessed. This results in a much lower memory profile at the expense of file I/O,
which can allow large mosaics to process in more limited amounts of memory.
any `ModelLibrary` opened by this step will use ``on_disk=True`` and use temporary
files to store model modifications. Additionally any resampled images will
be kept in memory (as long as needed). This can result in much lower memory
usage (at the expense of file I/O) to process large associations.
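
The ``--snr`` and ``--scale`` arguments above are each given as a pair of floating-point values in a single string. A minimal sketch of how such a string could be split into its two thresholds is shown below. ``parse_pair`` is a hypothetical helper written for illustration only; it is not taken from romancal or stcal, and the step's own parsing may differ.

```python
def parse_pair(value):
    """Split a string such as "5.0 4.0" into a tuple of two floats.

    Hypothetical helper for illustration; not part of romancal.
    """
    parts = value.split()
    if len(parts) != 2:
        raise ValueError(f"expected two space-separated floats, got {value!r}")
    first, second = (float(p) for p in parts)
    return first, second


# First value: threshold for the primary detection pass.
# Second value: threshold for pixels adjacent to those found in the first pass.
snr1, snr2 = parse_pair("5.0 4.0")
scale1, scale2 = parse_pair("1.2 0.7")
```

Rejecting anything other than exactly two values keeps a malformed argument from being silently truncated.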

.. _weight_type_options_details_section:

33 changes: 8 additions & 25 deletions docs/roman/outlier_detection/outlier_detection.rst
@@ -55,20 +55,13 @@ Specifically, this routine performs the following operations:

* The median image is created by combining all grouped mosaic images or
non-resampled input data pixel-by-pixel.
* The ``nlow`` and ``nhigh`` parameters specify how many low and high values
to ignore when computing the median for any given pixel.
* The ``maskpt`` parameter sets the percentage of the weight image values to
use, and any pixel with a weight below this value gets flagged as "bad" and
ignored when resampled.
* The ``grow`` parameter sets the width, in pixels, beyond the limit set by
the rejection algorithm being used, for additional pixels to be rejected in
an image.
* The median image is written out to disk as `_<asn_id>_median` by default.

#. By default, the median image is blotted back (inverse of resampling) to
match each original input image.

* Blotted images are written out to disk as `_<asn_id>_blot` by default.
* **If resampling is turned off**, the median image is compared directly to
each input image.

@@ -136,26 +129,16 @@ memory usage at the expense of file I/O. The control over this memory model hap
with the use of the ``in_memory`` parameter. The full impact of this parameter
during processing includes:

#. The ``save_open`` parameter gets set to `False`
#. The ``on_disk`` parameter gets set to `True`
when opening the input :py:class:`~romancal.datamodels.library.ModelLibrary`
object. This forces all input models in the input
:py:class:`~romancal.datamodels.library.ModelLibrary` to get written out to disk.
It then uses the filename of the input model during subsequent processing.
object. This causes modified models to be written to temporary files.

#. The ``in_memory`` parameter gets passed to the :py:class:`~romancal.resample.ResampleStep`
to set whether or not to keep the resampled images in memory or not. By default,
the outlier detection processing sets this parameter to `False` so that each resampled
image gets written out to disk.

#. Computing the median image works section-by-section by only keeping 1Mb of each input
in memory at a time. As a result, only the final output product array for the final
median image along with a stack of 1Mb image sections are kept in memory.

#. The final resampling step also avoids keeping all inputs in memory by only reading
each input into memory 1 at a time as it gets resampled onto the final output product.
#. Computing the median image uses temporary files. Each resampled group
is split into sections (1 per "row") and each section is appended to a different
temporary file. After resampling all groups, each temporary file is read and a
median is computed for all sections in that file (yielding a median for that
section across all resampled groups). Finally, these median sections are
combined into a final median image.

These changes result in a minimum amount of memory usage during processing at the obvious
expense of reading and writing the products from disk.
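
The temporary-file median described above can be sketched roughly as follows. This is an illustrative approximation, not the stcal implementation: the function name ``median_via_section_files``, the ``n_sections`` parameter, the raw ``float32`` file layout, and the use of ``np.nanmedian`` are all assumptions made for the example. It also assumes every group has the same 2-D shape and that ``n_sections`` does not exceed the number of rows.

```python
import tempfile
from pathlib import Path

import numpy as np


def median_via_section_files(groups, n_sections):
    """Pixel-wise median of equally shaped 2-D arrays using temporary files.

    Hypothetical sketch: each group is split into row sections, and each
    section is appended to its own temporary file. Only one stack of
    sections is held in memory at a time.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        paths = [Path(tmpdir) / f"section_{i}.bin" for i in range(n_sections)]
        n_groups = 0
        shape = None
        for data in groups:
            data = np.asarray(data, dtype=np.float32)
            shape = data.shape
            # Append each row section of this group to its own file.
            sections = np.array_split(data, n_sections, axis=0)
            for path, section in zip(paths, sections):
                with open(path, "ab") as f:
                    f.write(section.tobytes())
            n_groups += 1
        if n_groups == 0:
            raise ValueError("at least one group is required")

        # Number of rows that ended up in each section file.
        row_counts = [
            len(rows) for rows in np.array_split(np.arange(shape[0]), n_sections)
        ]
        median_sections = []
        for path, n_rows in zip(paths, row_counts):
            # Read back one section across all groups and take its median.
            stack = np.fromfile(path, dtype=np.float32)
            stack = stack.reshape(n_groups, n_rows, shape[1])
            median_sections.append(np.nanmedian(stack, axis=0))
        # Stitch the per-section medians into the final median image.
        return np.concatenate(median_sections, axis=0)


rng = np.random.default_rng(0)
example_groups = [rng.normal(size=(6, 4)).astype(np.float32) for _ in range(5)]
median_image = median_via_section_files(example_groups, n_sections=3)
```

Because each file holds one section from every group, the peak memory use scales with the section size times the number of groups rather than with the full image stack.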


.. automodapi:: romancal.outlier_detection.outlier_detection
5 changes: 2 additions & 3 deletions docs/roman/outlier_detection/outlier_detection_step.rst
@@ -4,9 +4,8 @@ OutlierDetectionStep
--------------------

This module provides the sole interface to all methods of performing outlier detection
on Roman observations. The outlier detection algorithm used for WFI data is implemented
in :py:class:`~romancal.outlier_detection.outlier_detection.OutlierDetection`
and described in :ref:`outlier-detection-imaging`.
on Roman observations. The outlier detection algorithm used for WFI data is
described in :ref:`outlier-detection-imaging`.

.. note::
Whether the data are being provided in an `association file`_ or as a list of ASDF filenames,
45 changes: 45 additions & 0 deletions romancal/outlier_detection/_fileio.py
@@ -0,0 +1,45 @@
import logging

from astropy.units import Quantity

log = logging.getLogger(__name__)
log.setLevel(logging.DEBUG)


def save_median(example_model, median_data, median_wcs, make_output_path):
    _save_intermediate_output(
        _make_median_model(example_model, median_data, median_wcs),
        "median",
        make_output_path,
    )


def save_drizzled(drizzled_model, make_output_path):
    _save_intermediate_output(drizzled_model, "outlier_i2d", make_output_path)


def _make_median_model(example_model, data, wcs):
    model = example_model.copy()
    model.data = Quantity(data, unit=model.data.unit)
    model.meta.filename = "drizzled_median.asdf"
    model.meta.wcs = wcs
    return model


def _save_intermediate_output(model, suffix, make_output_path):
    """
    Ensure all intermediate outputs from OutlierDetectionStep have consistent file naming conventions

    Notes
    -----
    self.make_output_path() is updated globally for the step in the main pipeline
    to include the asn_id in the output path, so no need to handle it here.
    """

    # outlier_?2d is not a known suffix, and make_output_path cannot handle an
    # underscore in an unknown suffix, so do a manual string replacement
    input_path = model.meta.filename.replace("_outlier_", "_")

    output_path = make_output_path(input_path, suffix=suffix)
    model.save(output_path)
    log.info(f"Saved {suffix} model in {output_path}")