diff --git a/docs/ci-tutorials/initializedirtyimage.md b/docs/ci-tutorials/initializedirtyimage.md
index 71cc651e..c62b5a37 100644
--- a/docs/ci-tutorials/initializedirtyimage.md
+++ b/docs/ci-tutorials/initializedirtyimage.md
@@ -212,7 +212,7 @@ rml.state_dict() # the now uninitialized parameters of the model (the ones we s
 Here you can clearly see the ``state_dict`` is in its original state, before the training loop changed the parameters through the optimization function. Loading our saved dirty image state into the model is as simple as
 
 ```{code-cell}
-rml.load_state_dict(torch.load("dirty_image_model.pt"))
+rml.load_state_dict(torch.load("dirty_image_model.pt"), strict=False)
 rml.state_dict() # the reloaded parameters of the model
 ```
 
diff --git a/docs/ci-tutorials/loose-visibilities.md b/docs/ci-tutorials/loose-visibilities.md
index 1b9c4189..31e2dd58 100644
--- a/docs/ci-tutorials/loose-visibilities.md
+++ b/docs/ci-tutorials/loose-visibilities.md
@@ -154,8 +154,6 @@ img_packed = utils.sky_gaussian_arcsec(
 img_packed_cube = np.broadcast_to(img_packed, (nchan, coords.npix, coords.npix)).copy()
 # convert img_packed to pytorch tensor
 img_packed_tensor = torch.from_numpy(img_packed_cube)
-# insert into ImageCube layer
-icube = images.ImageCube(coords=coords, nchan=nchan, cube=img_packed_tensor)
 ```
 
 ## Producing model visibilities
@@ -163,7 +161,7 @@ icube = images.ImageCube(coords=coords, nchan=nchan, cube=img_packed_tensor)
 The interesting part of the NuFFT is that it will carry an image plane model all the way to the Fourier plane in loose visibilities, resulting in a model visibility array the same shape as the original visibility data.
 
 ```{code-cell}
-vis_model_loose = nufft(icube(), uu_chan, vv_chan)
+vis_model_loose = nufft(img_packed_tensor, uu_chan, vv_chan)
 print("Loose model visibilities from the NuFFT have shape {:}".format(vis_model_loose.shape))
 print("The original loose data visibilities have shape {:}".format(data.shape))
 ```
@@ -171,7 +169,7 @@ print("The original loose data visibilities have shape {:}".format(data.shape))
 By comparison, the {class}`~mpol.gridding.Gridder` object puts the visibilities onto a grid and exports a {class}`~mpol.datasets.GriddedDataset` object. These gridded data visibilities have the same dimensionality as the gridded model visibilities produced by the {class}`~mpol.fourier.FourierCube` layer
 
 ```{code-cell}
-vis_model_gridded = flayer(icube())
+vis_model_gridded = flayer(img_packed_tensor)
 print("Gridded model visibilities from FourierCube have shape {:}".format(vis_model_gridded.shape))
 print("Gridded data visibilities have shape {:}".format(gridded_dset.vis_gridded.shape))
 ```
@@ -273,18 +271,18 @@ Now we'll evaluate the likelihood function using both the loose visibilities pro
 data_loose = torch.tensor(data)
 weight_loose = torch.tensor(weight)
 
-chisquare = losses.chi_squared(vis_model_loose, data_loose, weight_loose)
+r_chisquare = losses.r_chi_squared(vis_model_loose, data_loose, weight_loose)
 loglike = losses.log_likelihood(vis_model_loose, data_loose, weight_loose)
-print("Chi squared", chisquare)
+print("Reduced Chi squared", r_chisquare)
 print("Log likelihood", loglike)
 ```
 
 ### Gridded visibility log likelihood
 
 ```{code-cell}
-chisquare_gridded = losses.chi_squared_gridded(vis_model_gridded, gridded_dset)
+r_chisquare_gridded = losses.r_chi_squared_gridded(vis_model_gridded, gridded_dset)
 loglike_gridded = losses.log_likelihood_gridded(vis_model_gridded, gridded_dset)
-print("Chi squared gridded", chisquare_gridded)
+print("Reduced Chi squared gridded", r_chisquare_gridded)
 print("Log likelihood gridded", loglike_gridded)
 ```
 
@@ -320,16 +318,16 @@ $$
 
 This is the same reasoning that gives rise to the statistical apothegm "a well-fit model has a reduced $\chi^2_R \approx 1$" (one that has [many caveats and pitfalls](https://arxiv.org/abs/1012.3754)). In this case the extra factor of 2 in the denominator comes about because the visibility data and its noise are complex-valued.
 
-The hope is that for many applications, the normalized negative log likelihood loss function will have a minimum value of $L(\hat{\boldsymbol{\theta}}) \approx 1$ for a well-fit model (regardless of the number of data points), making it easier to set the regularizer strengths *relative* to this value. Note that even this normalized loss won't be the same between an unbinned and binned dataset, though hopefully both will be on the order of $1$.
+The hope is that for many applications, the reduced $\chi^2_R$ loss function will have a minimum value of $\chi^2_R(\hat{\boldsymbol{\theta}}) \approx 1$ for a well-fit model (regardless of the number of data points), making it easier to set the regularizer strengths *relative* to this value. Note that even this normalized loss won't be the same between an unbinned and binned dataset, though hopefully both will be on the order of $1$.
 
-This loss function is implemented in {func}`mpol.losses.nll` and {func}`mpol.losses.nll_gridded`, and we can see the results here.
+This loss function is implemented in {func}`mpol.losses.r_chi_squared` and {func}`mpol.losses.r_chi_squared_gridded`, and we can see the results here.
 
 ```{code-cell}
-nll = losses.nll(vis_model_loose, data_loose, weight_loose)
-print("Normalized log likelihood", nll)
+r_chi2 = losses.r_chi_squared(vis_model_loose, data_loose, weight_loose)
+print("Reduced chi squared", r_chi2)
 ```
 
 ```{code-cell}
-nll_gridded = losses.nll_gridded(vis_model_gridded, gridded_dset)
-print("Normalized log likelihood gridded", nll_gridded)
+r_chi2_gridded = losses.r_chi_squared_gridded(vis_model_gridded, gridded_dset)
+print("Reduced chi squared gridded", r_chi2_gridded)
 ```
diff --git a/docs/ci-tutorials/optimization.md b/docs/ci-tutorials/optimization.md
index b95bd63c..b59a83a5 100644
--- a/docs/ci-tutorials/optimization.md
+++ b/docs/ci-tutorials/optimization.md
@@ -183,7 +183,7 @@ But, exciting things are about to happen! We can calculate the loss between thes
 
 ```{code-cell} ipython3
 # calculate a loss
-loss = losses.nll_gridded(vis, dset)
+loss = losses.r_chi_squared_gridded(vis, dset)
 print(loss.item())
 ```
 
@@ -255,7 +255,7 @@ for i in range(300):
     vis = rml()
 
     # calculate a loss
-    loss = losses.nll_gridded(vis, dset)
+    loss = losses.r_chi_squared_gridded(vis, dset)
 
     loss_tracker.append(loss.item())
 
@@ -318,8 +318,8 @@ nufft = fourier.NuFFT(coords=coords, nchan=dset.nchan)
 and then we can calculate model visibilities corresponding to some model image (in this case, our optimal image). Since {meth}`mpol.fourier.NuFFT.forward` returns a Pytorch tensor, we'll need to detach it and convert it to numpy. We'll also remove the channel dimension.
 
 ```{code-cell} ipython3
-# note the NuFFT expects a "packed image cube," as output from ImageCube.forward()
-vis_model = nufft(rml.icube(), uu, vv)
+# note the NuFFT expects a "packed image cube," as stored in ImageCube.cube
+vis_model = nufft(rml.icube.cube, uu, vv)
 # convert from Pytorch to numpy, remove channel dimension
 vis_model = np.squeeze(vis_model.detach().numpy())
 ```
diff --git a/src/mpol/losses.py b/src/mpol/losses.py
index 223ffbb4..e1193636 100644
--- a/src/mpol/losses.py
+++ b/src/mpol/losses.py
@@ -6,18 +6,22 @@ We expect you might use the loss functions in the following scenarios:
 
-- *optimizing a model with fixed data amplitudes and weights* this is the most common
-scenario for RML imaging. Use :meth:`~mpol.losses.r_chi_squared` for loose visibilities
-and :meth:`~mpol.losses.r_chi_squared_gridded` for gridded quantities
-(e.g., :class:`~mpol.datasets.GriddedDataset`).
-- *optimizing a model with potential adjustments to data amplitudes and weights*: you
-will need to do this in a self-calibration workflow. Use
-:meth:`~mpol.losses.neg_log_likelihood_avg` for loose visibilities. The nature of the
-adjustments makes gridding inefficient, so there is no gridded version of this loss.
-- *doing Bayesian inference*: if you are calculating a log likelihood to use with MCMC
-sampling, for example, use :meth:`~mpol.losses.log_likelihood` or
-:meth:`~mpol.losses.log_likelihood_gridded`. (Pyro defines its own likelihood function
-through the probabilistic programming language, so this is not needed in that case).
+* optimizing a model with fixed data amplitudes and weights:
+  this is the most common
+  scenario for RML imaging. Use :meth:`~mpol.losses.r_chi_squared` for loose
+  visibilities and :meth:`~mpol.losses.r_chi_squared_gridded` for gridded quantities
+  (e.g., :class:`~mpol.datasets.GriddedDataset`).
+* optimizing a model with potential adjustments to data amplitudes and weights:
+  you will need to do this in a self-calibration workflow. Use
+  :meth:`~mpol.losses.neg_log_likelihood_avg` for loose visibilities. The nature of
+  the adjustments makes gridding inefficient, so there is no gridded version of this
+  loss.
+* doing Bayesian inference:
+  if you are calculating a log likelihood to use with MCMC
+  sampling, for example, use :meth:`~mpol.losses.log_likelihood` or
+  :meth:`~mpol.losses.log_likelihood_gridded`. (Pyro defines its own likelihood
+  function through the probabilistic programming language, so this is not needed in
+  that case).
 
 """
 
 import numpy as np
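
For reference, the quantity that `r_chi_squared` and `r_chi_squared_gridded` are expected to report is a reduced chi-squared over complex visibilities, normalized by 2N as described in the loose-visibilities tutorial above. The standalone sketch below illustrates only that normalization: `reduced_chi_squared` is a hypothetical helper written for this example (it is not the MPoL implementation), and it assumes the usual radio convention that the weights are `1 / sigma**2` per real/imaginary component.

```python
import torch

def reduced_chi_squared(model_vis, data_vis, weight):
    # sum of weighted squared residuals over complex visibilities,
    # divided by 2N because each visibility has a real and an imaginary part
    chi2 = torch.sum(weight * torch.abs(data_vis - model_vis) ** 2)
    return chi2 / (2 * data_vis.numel())

# synthetic check: a model that differs from the data only by the quoted noise
N = 1000
sigma = 0.1
weight = torch.full((N,), 1 / sigma**2)
model = torch.randn(N) + 1j * torch.randn(N)
noise = sigma * (torch.randn(N) + 1j * torch.randn(N))
data = model + noise

print(reduced_chi_squared(model, data, weight))  # ~1 for a well-fit model
```

A value hovering around 1 for a well-fit model is exactly the property the tutorials rely on when setting regularizer strengths relative to the data term.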