
Commit

Adding initial C code to ramp fitting.
Adding the setup.py necessary to install C extensions for ramp fitting.

Adding first attempt at C code.

Adding a setup.cfg file to be used for setup.

Updating calling location and C code.

Updated include ordering, as well as returning NoneType.  Can compile calling setup.py directly and running a basic script.  Still cannot run 'pip install -e .'.  This generates a failure saying it's unable to find the numpy module, raising ModuleNotFoundError.

Updating setup.

Fixing install files to properly work with C extension framework in ramp fitting.

Changing names.

Updating C code to parse input parameters.

Adding ramp handling for each pixel.

Updating ramp data structure and pixel ramp data structure.

Getting a pixel ramp and printing it to the screen.

Updating code style and adding a simple linked list to handle segment computations.

Completed median rate computation without flags and without special cases.

Cleaning up the code and comments.  The non-special case median rate computation now works.

Commenting out calls to C extension function.

Putting functions in alphabetical order to make them easier to navigate.

Alphabetizing the functions to make them easier to search.

Finding a bug in the median rate computation.

Updating setup and NumPy macro for C based on code review.

Fixed the local copy of the DQ for an integration for median rate computation.

Completed the median rate calculations that account for flags in the ramp.

Beginning to update type checking for ndarrays passed to C.

Figuring out endianness solution.  Still need to figure out how to detect endianness.

Checking whether computing byteswapping makes things slower than python.
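Several of these commits concern detecting and handling ndarray endianness before the data reaches C. As a hedged sketch (names and approach illustrative, not the code from this commit), one way to normalize an array to native byte order with NumPy:

```python
import sys

import numpy as np


def to_native_endian(arr):
    """Return `arr` in native byte order so C code can read it directly.

    Illustrative helper, not part of this commit.
    """
    native = {"little": "<", "big": ">"}[sys.byteorder]
    order = arr.dtype.byteorder  # one of '=', '<', '>', '|'
    if order in ("=", "|") or order == native:
        return arr  # already native (or byte order not applicable); no copy
    # Swap the bytes in memory, then flip the dtype's byte-order flag so
    # the two changes cancel out and the values are unchanged.
    return arr.byteswap().view(arr.dtype.newbyteorder())


big = np.arange(4, dtype=">f4")  # explicitly big-endian
fixed = to_native_endian(big)
```

Per the later commit note that endian handling was moved out of C, a helper like this would run on the Python side before the C extension is called.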

Testing and comparing python to C code.

Working on the weighting of a segment for ramp fitting.

Continuing with weighted fitting.

Finished the segment computations.

Removed 'real_t' typedef to make code cleaner.  Finished pixel ramp computations but the read noise computation is different from python code.

JIC commit.

JIC commit.

Debugging the segment computation of the read noise variance.

Updated the read noise computations for normal segments.  Updated the documentation for ramp fitting to make the documentation more clear on the details of the computations.  Removed extra blank lines from CI test file.

Creating output base arrays and fix use of pixeldq in pixel_ramp.

Packaged up output results from the C computations to be passed back to the python code.

Adding a square root to the final computation of the combined error for the rate and rateints products.

Successful CI tests for basic ramps run through the C extension.

JIC commit.

Fixing segment pruner.

Adding test cases for C extension.

Adding cases.  Updated the final DQ array.  Started implementing the optional results product.

Moving debugging print statements.

Adding optional results memory management during ramp fitting.

Outputting the optional results product.  Some of the values still need to be computed.

Updating computing the optional results product.

Updating dimensions of the pedestal array.

Adding pedestal calculation.

Updating where the group time divide happens.  Adding special case computation and testing.  Tracing bug in special case.

Working out slope computation bug.

Forcing the use of C extension to ensure CI tests use the C extensions.

Updating tests and DQ flag computations.

Adding special case for one group segment.

Working on special cases.

Updating one group ramp testing.  The variance computation is correct now, but need to investigate the slope computation, which may also be wrong in python.

Working on a new test.

Rearranging code to make it easier to read.

Refactoring module API.  Splitting ramp data getter into multiple, smaller functions.

Updating tests, as well as refactoring the module interface.

Updating the flags for suppressed ramps.

Changing the C interface to make it simpler and easier to read.

Cleaning up old code and adding one group suppression flagging.

Modifying setup.py to get it to properly install C and cython stuff.

Modifying setup to get it to work, since I am not sure how to resolve the conflicts resulting from the use of both C and cython.

Updating invalid integrations test.

ZEROFRAME test works.

Suppressed one group tests work.

Updating return code from C extension.

Updating test_2_group_cases testing.  Bugs were found in the python code, so those should be corrected first before finishing the C code.

Updating code and tests for python to account for invalid integrations and invalid groups for median rate calculations.

Updating error in median rate computation.

Investigating differences on branch with main branch.

Properly updating ols_fit.py from main branch updates.

Finishing up C translation.  Will need to further investigate two group ramp special case for rateints (see test_invalid_integrations).

Updating the setup.py file to properly install the cython and c extension modules.

Updating tests and setup.py format.

Removing unneeded comments.

Removing debugging imports.

Fixing ZEROFRAME logic bug, as well as removing debugging code.

All STCAL CI tests are passing with the C code.  Need to check the JWST tests.

Updating segment slope calculation for NaN values.

Updating computation of read noise variance for bad gain value.

Updating the debugging comments, as well as finishing the first group orphan test in JWST CI testing.

Updating how the pedestal gets computed.

Updating median rate computation for case 1 in JWST.

Updating slope fitter to pass case 4 in the JWST CI test.

Updating debugging functions.

JIC.

Changing variable name for failing test.

Cleaning up the code.

Skipping case 5 which fails for C due to architectural differences.

Updating the handling of bad gain values.  Base ramp fit and case testing pass.

Updating the computation of SNR to handle non-positive values.

Updating the ramp fit OLS code for use of C code.

Changing declaration statements causing build failures.  Added debugging macros.

Removing importation of debugging functions used for debugging.

Removed endian handlers for C, since it should be handled in python.
kmacdonald-stsci committed Dec 27, 2023
1 parent 30c41af commit cc2259f
Showing 8 changed files with 3,538 additions and 51 deletions.
38 changes: 23 additions & 15 deletions docs/stcal/ramp_fitting/description.rst
@@ -158,21 +158,27 @@
the least-squares fit is calculated with the first and last samples. In most practical
cases, the data will fall somewhere in between, where the weighting is scaled between the
two extremes.

The signal-to-noise ratio :math:`S` used for weighting selection is calculated from the
last sample as:

For segment :math:`k` of length :math:`n`, which includes groups :math:`[g_{k}, ...,
g_{k+n-1}]`, the signal-to-noise ratio :math:`S` used for weighting selection is
calculated from the last sample as:

.. math::
S = \frac{data \times gain} { \sqrt{(read\_noise)^2 + (data \times gain) } } \,,
where :math:`data = g_{k+n-1} - g_{k}`.

The weighting for a sample :math:`i` is given as:

.. math::
w_i = (i - i_{midpoint})^P \,,
w_i = \frac{ [(i - i_{midpoint}) / i_{midpoint}]^P }{ (read\_noise)^2 } \,,
where :math:`i_{midpoint} = \frac{n-1}{2}` and :math:`i = 0, 1, ..., n-1`.

where :math:`i_{midpoint}` is the sample number of the midpoint of the sequence, and
:math:`P` is the exponent applied to weights, determined by the value of :math:`S`. Fixsen
et al. 2000 found that defining a small number of P values to apply to values of S was
sufficient; they are given as:

is the sample number of the midpoint of the sequence, and :math:`P` is the exponent
applied to weights, determined by the value of :math:`S`. Fixsen et al. 2000 found that
defining a small number of P values to apply to values of S was sufficient; they are given as:

+-------------------+------------------------+----------+
| Minimum S | Maximum S | P |
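Read directly off the two formulas above, the SNR and per-sample weights can be sketched in Python (a sketch of the documented math only, not code from this commit; the S-to-P lookup table is truncated here, so `P` is passed in explicitly):

```python
import numpy as np


def segment_snr(g_first, g_last, gain, read_noise):
    """S = data*gain / sqrt(read_noise^2 + data*gain), data = g_{k+n-1} - g_k."""
    signal = (g_last - g_first) * gain
    return signal / np.sqrt(read_noise**2 + signal)


def segment_weights(n, P, read_noise):
    """w_i = [(i - i_mid) / i_mid]^P / read_noise^2, with i_mid = (n - 1) / 2."""
    i = np.arange(n, dtype=float)
    i_mid = (n - 1) / 2.0
    return ((i - i_mid) / i_mid) ** P / read_noise**2


snr = segment_snr(0.0, 100.0, gain=1.0, read_noise=0.0)  # -> 10.0
w = segment_weights(5, P=2, read_noise=10.0)
```

In the documented scheme, `snr` would be used to select `P` from the table, and the weights would then feed the weighted least-squares fit.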
@@ -194,12 +200,14 @@ Segment-specific Computations:
------------------------------
The variance of the slope of a segment due to read noise is:

.. math::
var^R_{s} = \frac{12 \ R^2 }{ (ngroups_{s}^3 - ngroups_{s})(tgroup^2) } \,,
.. math::
var^R_{s} = \frac{12 \ R^2 }{ (ngroups_{s}^3 - ngroups_{s})(tgroup^2)(gain^2) } \,,
where :math:`R` is the noise in the difference between 2 frames,
:math:`ngroups_{s}` is the number of groups in the segment, and :math:`tgroup` is the group
time in seconds (from the keyword TGROUP).
where :math:`R` is the noise in the difference between 2 frames,
:math:`ngroups_{s}` is the number of groups in the segment, and :math:`tgroup` is the group
time in seconds (from the keyword TGROUP). The divide by gain converts to
:math:`DN`. For the special case where a segment has length 1, the
:math:`ngroups_{s}` is set to :math:`2`.
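A small sketch of the updated read-noise variance formula, including the gain-squared term and the one-group special case described above (illustrative Python, not the C implementation):

```python
def readnoise_variance(read_noise, ngroups, tgroup, gain):
    """var^R_s = 12*R^2 / ((n^3 - n) * tgroup^2 * gain^2).

    A one-group segment is treated as having n = 2 groups, per the
    special case noted in the documentation.
    """
    n = 2 if ngroups == 1 else ngroups
    return 12.0 * read_noise**2 / ((n**3 - n) * tgroup**2 * gain**2)


var_r = readnoise_variance(read_noise=1.0, ngroups=2, tgroup=1.0, gain=1.0)  # -> 2.0
```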

The variance of the slope in a segment due to Poisson noise is:

@@ -265,10 +273,10 @@ The combined variance of the slope is the sum of the variances:
The square root of the combined variance is stored in the ERR array of the primary output.

The overall slope depends on the slope and the combined variance of the slope of each integration's
segments, so is a sum over integrations and segments:
segments, so is a sum over integration values computed from the segments:

.. math::
slope_{o} = \frac{ \sum_{i,s}{ \frac{slope_{i,s}} {var^C_{i,s}}}} { \sum_{i,s}{ \frac{1} {var^C_{i,s}}}}
.. math::
slope_{o} = \frac{ \sum_{i}{ \frac{slope_{i}} {var^C_{i}}}} { \sum_{i}{ \frac{1} {var^C_{i}}}}
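The overall slope above is an inverse-variance-weighted mean of the per-integration slopes; as a sketch (illustrative names, not code from this commit):

```python
import numpy as np


def overall_slope(slopes, variances):
    """slope_o = sum_i(slope_i / var_i) / sum_i(1 / var_i)."""
    slopes = np.asarray(slopes, dtype=float)
    variances = np.asarray(variances, dtype=float)
    return (slopes / variances).sum() / (1.0 / variances).sum()


slope_o = overall_slope([1.0, 3.0], [0.5, 1.5])  # the lower-variance slope dominates
```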
Upon successful completion of this step, the status keyword S_RAMP will be set
17 changes: 17 additions & 0 deletions setup.py
@@ -6,6 +6,17 @@
Options.docstrings = True
Options.annotate = False

# package_data values are glob patterns relative to each specific subpackage.
package_data = {

"stcal.ramp_fitting.src": ["*.c"],
}

# Setup C module include directories
include_dirs = [np.get_include()]


# Setup C module macros
define_macros = [("NUMPY", "1")]


extensions = [
Extension(
"stcal.ramp_fitting.ols_cas22._ramp",
@@ -25,6 +36,12 @@
include_dirs=[np.get_include()],
language="c++",
),
Extension(
"stcal.ramp_fitting.slope_fitter",
["src/stcal/ramp_fitting/src/slope_fitter.c"],
include_dirs=include_dirs,
define_macros=define_macros,
),
]

setup(ext_modules=cythonize(extensions))
119 changes: 114 additions & 5 deletions src/stcal/ramp_fitting/ols_fit.py
@@ -8,6 +8,7 @@

import numpy as np

from .slope_fitter import ols_slope_fitter # c extension
from . import ramp_fit_class, utils

log = logging.getLogger(__name__)
@@ -656,6 +657,84 @@ def ols_ramp_fit_single(ramp_data, buffsize, save_opt, readnoise_2d, gain_2d, we
opt_info : tuple
The tuple of computed optional results arrays for fitting.
"""

# use_c = False
use_c = True
if use_c:
print(" ")
print("=" * 80)
c_start = time.time()

if ramp_data.drop_frames1 is None:
ramp_data.drop_frames1 = 0
print(" ---------------- Entering C Code ----------------")
image_info, integ_info, opt_info = ols_slope_fitter(
ramp_data, gain_2d, readnoise_2d, weighting, save_opt)
print(" ---------------- Return C Code ----------------")

c_end = time.time()
print("=" * 80)

return image_info, integ_info, opt_info

# XXX start python time
p_start = time.time()


print("\n");
print(" ---------------- Entering Python Code ----------------")
image_info, integ_info, opt_info = ols_ramp_fit_single_python(

ramp_data, buffsize, save_opt, readnoise_2d, gain_2d, weighting)
print(" ---------------- Return Python Code ----------------")


# XXX end python time
p_end = time.time()
if use_c:
c_python_time_comparison(c_start, c_end, p_start, p_end)


return image_info, integ_info, opt_info




def ols_ramp_fit_single_python(
ramp_data, buffsize, save_opt, readnoise_2d, gain_2d, weighting):
"""
Fit a ramp using ordinary least squares. Calculate the count rate for each
pixel in all data cube sections and all integrations, equal to the weighted
slope for all sections (intervals between cosmic rays) of the pixel's ramp
divided by the effective integration time.
Parameters
----------
ramp_data : RampData
Input data necessary for computing ramp fitting.
buffsize : int
The working buffer size
save_opt : bool
Whether to return the optional output model
readnoise_2d : ndarray
The read noise of each pixel
gain_2d : ndarray
The gain of each pixel
weighting : str
'optimal' is the only valid value
Return
------
image_info : tuple
The tuple of computed ramp fitting arrays.
integ_info : tuple
The tuple of computed integration fitting arrays.
opt_info : tuple
The tuple of computed optional results arrays for fitting.
"""
# MAIN
tstart = time.time()

if not ramp_data.suppress_one_group_ramps:
@@ -681,6 +760,8 @@ def ols_ramp_fit_single(ramp_data, buffsize, save_opt, readnoise_2d, gain_2d, we
log.warning("will be calculated as the value of that 1 group divided by ")
log.warning("the group exposure time.")

# import ipdb; ipdb.set_trace()

# In this 'First Pass' over the data, loop over integrations and data
# sections to calculate the estimated median slopes, which will be used
# to calculate the variances. This is the same method to estimate slopes
@@ -691,6 +772,8 @@ def ols_ramp_fit_single(ramp_data, buffsize, save_opt, readnoise_2d, gain_2d, we
if fit_slopes_ans[0] == "saturated":
return fit_slopes_ans[1:]

# import ipdb; ipdb.set_trace()

# In this 'Second Pass' over the data, loop over integrations and data
# sections to calculate the variances of the slope using the estimated
# median slopes from the 'First Pass'. These variances are due to Poisson
@@ -715,6 +798,15 @@ def ols_ramp_fit_single(ramp_data, buffsize, save_opt, readnoise_2d, gain_2d, we
return image_info, integ_info, opt_info


def c_python_time_comparison(c_start, c_end, p_start, p_end):
c_diff = c_end - c_start
p_diff = p_end - p_start
c_div_p = c_diff / p_diff * 100.
print(f"{c_diff = }")
print(f"{p_diff = }")
print(f"{c_div_p = :.4f}%")



def discard_miri_groups(ramp_data):
"""
For MIRI datasets having >1 group, if all pixels in the final group are
@@ -889,6 +981,8 @@ def ramp_fit_slopes(ramp_data, gain_2d, readnoise_2d, save_opt, weighting):

opt_res = utils.OptRes(n_int, imshape, max_seg, ngroups, save_opt)

# import ipdb; ipdb.set_trace()

# Get Pixel DQ array from input file. The incoming RampModel has uint32
# PIXELDQ, but ramp fitting will update this array here by flagging
# the 2D PIXELDQ locations where the ramp data has been previously
@@ -904,7 +998,6 @@ def ramp_fit_slopes(ramp_data, gain_2d, readnoise_2d, save_opt, weighting):
# as is done in the jump detection step, except here CR-affected and
# saturated groups have already been flagged. The actual, fit, slopes for
# each segment are also calculated here.

med_rates = utils.compute_median_rates(ramp_data)

# Loop over data integrations:
@@ -1113,6 +1206,8 @@ def ramp_fit_compute_variances(ramp_data, gain_2d, readnoise_2d, fit_slopes_ans)
segs_4,
) = utils.alloc_arrays_2(n_int, imshape, max_seg)

# import ipdb; ipdb.set_trace()

# Loop over data integrations
for num_int in range(n_int):
ramp_data.current_integ = num_int
@@ -1141,7 +1236,9 @@ def ramp_fit_compute_variances(ramp_data, gain_2d, readnoise_2d, fit_slopes_ans)
var_p4[num_int, :, rlo:rhi, :] = den_p3 * med_rates[rlo:rhi, :]

# Find the segment variance due to read noise and convert back to DN
var_r4[num_int, :, rlo:rhi, :] = num_r3 * den_r3 / gain_sect**2
tmpgain = gain_sect**2
var_r4_tmp = num_r3 * den_r3 / tmpgain
var_r4[num_int, :, rlo:rhi, :] = var_r4_tmp


# Reset the warnings filter to its original state
warnings.resetwarnings()
@@ -1169,6 +1266,7 @@ def ramp_fit_compute_variances(ramp_data, gain_2d, readnoise_2d, fit_slopes_ans)
# variance calculations.
s_inv_var_p3[num_int, :, :] = (1.0 / var_p4[num_int, :, :, :]).sum(axis=0)
var_p3[num_int, :, :] = 1.0 / s_inv_var_p3[num_int, :, :]

s_inv_var_r3[num_int, :, :] = (1.0 / var_r4[num_int, :, :, :]).sum(axis=0)
var_r3[num_int, :, :] = 1.0 / s_inv_var_r3[num_int, :, :]

@@ -1180,6 +1278,7 @@ def ramp_fit_compute_variances(ramp_data, gain_2d, readnoise_2d, fit_slopes_ans)

var_p4[num_int, :, med_rates <= 0.0] = 0.0
var_both4[num_int, :, :, :] = var_r4[num_int, :, :, :] + var_p4[num_int, :, :, :]

inv_var_both4[num_int, :, :, :] = 1.0 / var_both4[num_int, :, :, :]

# Want to retain values in the 4D arrays only for the segments that each
@@ -1323,9 +1422,9 @@ def ramp_fit_overall(
var_p3, var_r3, var_p4, var_r4, var_both4, var_both3 = variances_ans[:6]
inv_var_both4, s_inv_var_p3, s_inv_var_r3, s_inv_var_both3 = variances_ans[6:]

slope_by_var4 = opt_res.slope_seg.copy() / var_both4
# import ipdb; ipdb.set_trace()

del var_both4
slope_by_var4 = opt_res.slope_seg.copy() / var_both4


s_slope_by_var3 = slope_by_var4.sum(axis=1) # sum over segments (not integs)
s_slope_by_var2 = s_slope_by_var3.sum(axis=0) # sum over integrations
@@ -1368,6 +1467,8 @@

slope_int = the_num / the_den

del var_both4


# Adjust DQ flags for NaNs.
wh_nans = np.isnan(slope_int)
dq_int[wh_nans] = np.bitwise_or(dq_int[wh_nans], ramp_data.flags_do_not_use)
@@ -1953,6 +2054,7 @@ def fit_next_segment(
)

# CASE: Long enough (semiramp has >2 groups), at end of ramp
# XXX The comment says semiramp >2, but checks for length >1.
wh_check = np.where((l_interval > 1) & (end_locs == ngroups - 1) & (~pixel_done))
if len(wh_check[0]) > 0:
f_max_seg = fit_next_segment_long_end_of_ramp(
@@ -2917,6 +3019,9 @@ def fit_short_ngroups(
ramp_mask_sum : ndarray
number of channels to fit for each pixel, 1-D int
ramp_data : RampData
The ramp data needed for processing, specifically flag values.
Returns
-------
f_max_seg : int
@@ -3129,6 +3234,8 @@ def fit_lines(data, mask_2d, rn_sect, gain_sect, ngroups, weighting, gdq_sect_r,

slope, intercept, sig_slope, sig_intercept = calc_opt_fit(nreads_wtd, sumxx, sumx, sumxy, sumy)

# import ipdb; ipdb.set_trace()

slope = slope / ramp_data.group_time

variance = sig_slope**2.0 # variance due to fit values
@@ -3894,6 +4001,7 @@ def calc_opt_sums(ramp_data, rn_sect, gain_sect, data_masked, mask_2d, xvalues,
# get final valid group for each pixel for this semiramp
ind_lastnz = fnz + mask_2d_sum - 1


# get SCI value of initial good group for semiramp
data_zero = data_masked[fnz, range(data_masked.shape[1])]
fnz = 0
@@ -3941,7 +4049,6 @@ def calc_opt_sums(ramp_data, rn_sect, gain_sect, data_masked, mask_2d, xvalues,
num_nz = c_mask_2d.sum(0) # number of groups in segment
nrd_prime = (num_nz - 1) / 2.0
num_nz = 0

# Calculate inverse read noise^2 for use in weights
# Suppress, then re-enable, harmless arithmetic warning
warnings.filterwarnings("ignore", ".*divide by zero.*", RuntimeWarning)
@@ -3967,6 +4074,8 @@ def calc_opt_sums(ramp_data, rn_sect, gain_sect, data_masked, mask_2d, xvalues,
xvalues[:, wh_m2d_f] = np.roll(xvalues[:, wh_m2d_f], -1, axis=0)
wh_m2d_f = np.logical_not(c_mask_2d[0, :])

# import ipdb; ipdb.set_trace()

# Create weighted sums for Poisson noise and read noise
nreads_wtd = (wt_h * c_mask_2d).sum(axis=0) # using optimal weights

18 changes: 12 additions & 6 deletions src/stcal/ramp_fitting/ramp_fit_class.py
@@ -41,6 +41,8 @@ def __init__(self):

self.current_integ = -1

self.dbg_run_c_code = False

def set_arrays(self, data, err, groupdq, pixeldq):
"""
Set the arrays needed for ramp fitting.
@@ -173,14 +175,18 @@ def dbg_print_basic_info(self):

print(f"Shape : {self.data.shape}")
print(f"data : \n{self.data}")
print(f"err : \n{self.err}")
print(f"groupdq : \n{self.groupdq}")
print(f"pixeldq : \n{self.pixeldq}")
# print(f"err : \n{self.err}")
# print(f"pixeldq : \n{self.pixeldq}")
print("-" * 80)

def dbg_print_pixel_info(self, row, col):
print("-" * 80)
print(f" data :\n{self.data[:, :, row, col]}")
print(f" err :\n{self.err[:, :, row, col]}")
print(f" groupdq :\n{self.groupdq[:, :, row, col]}")
print(f" pixeldq :\n{self.pixeldq[row, col]}")
print(f" data")
for integ in range(self.data.shape[0]):
print(f"[{integ}] {self.data[integ, :, row, col]}")
print(f" groupdq")
for integ in range(self.data.shape[0]):
print(f"[{integ}] {self.groupdq[integ, :, row, col]}")

# print(f" err :\n{self.err[:, :, row, col]}")
# print(f" pixeldq :\n{self.pixeldq[row, col]}")
