diff --git a/docs/ci-tutorials/initializedirtyimage.md b/docs/ci-tutorials/initializedirtyimage.md
index 71cc651e..c62b5a37 100644
--- a/docs/ci-tutorials/initializedirtyimage.md
+++ b/docs/ci-tutorials/initializedirtyimage.md
@@ -212,7 +212,7 @@ rml.state_dict() # the now uninitialized parameters of the model (the ones we s
 Here you can clearly see the ``state_dict`` is in its original state, before the training loop changed the parameters through the optimization function. Loading our saved dirty image state into the model is as simple as
 
 ```{code-cell}
-rml.load_state_dict(torch.load("dirty_image_model.pt"))
+rml.load_state_dict(torch.load("dirty_image_model.pt"), strict=False)
 rml.state_dict() # the reloaded parameters of the model
 ```
 
diff --git a/docs/ci-tutorials/loose-visibilities.md b/docs/ci-tutorials/loose-visibilities.md
index 1b9c4189..31e2dd58 100644
--- a/docs/ci-tutorials/loose-visibilities.md
+++ b/docs/ci-tutorials/loose-visibilities.md
@@ -154,8 +154,6 @@ img_packed = utils.sky_gaussian_arcsec(
 img_packed_cube = np.broadcast_to(img_packed, (nchan, coords.npix, coords.npix)).copy()
 # convert img_packed to pytorch tensor
 img_packed_tensor = torch.from_numpy(img_packed_cube)
-# insert into ImageCube layer
-icube = images.ImageCube(coords=coords, nchan=nchan, cube=img_packed_tensor)
 ```
 
 ## Producing model visibilities
@@ -163,7 +161,7 @@ icube = images.ImageCube(coords=coords, nchan=nchan, cube=img_packed_tensor)
 The interesting part of the NuFFT is that it will carry an image plane model all the way to the Fourier plane in loose visibilities, resulting in a model visibility array the same shape as the original visibility data.
 
 ```{code-cell}
-vis_model_loose = nufft(icube(), uu_chan, vv_chan)
+vis_model_loose = nufft(img_packed_tensor, uu_chan, vv_chan)
 print("Loose model visibilities from the NuFFT have shape {:}".format(vis_model_loose.shape))
 print("The original loose data visibilities have shape {:}".format(data.shape))
 ```
@@ -171,7 +169,7 @@ print("The original loose data visibilities have shape {:}".format(data.shape))
 By comparison, the {class}`~mpol.gridding.Gridder` object puts the visibilities onto a grid and exports a {class}`~mpol.datasets.GriddedDataset` object. These gridded data visibilities have the same dimensionality as the gridded model visibilities produced by the {class}`~mpol.fourier.FourierCube` layer
 
 ```{code-cell}
-vis_model_gridded = flayer(icube())
+vis_model_gridded = flayer(img_packed_tensor)
 print("Gridded model visibilities from FourierCube have shape {:}".format(vis_model_gridded.shape))
 print("Gridded data visibilities have shape {:}".format(gridded_dset.vis_gridded.shape))
 ```
@@ -273,18 +271,18 @@ Now we'll evaluate the likelihood function using both the loose visibilities pro
 data_loose = torch.tensor(data)
 weight_loose = torch.tensor(weight)
 
-chisquare = losses.chi_squared(vis_model_loose, data_loose, weight_loose)
+r_chisquare = losses.r_chi_squared(vis_model_loose, data_loose, weight_loose)
 loglike = losses.log_likelihood(vis_model_loose, data_loose, weight_loose)
-print("Chi squared", chisquare)
+print("Reduced Chi squared", r_chisquare)
 print("Log likelihood", loglike)
 ```
 
 ### Gridded visibility log likelihood
 
 ```{code-cell}
-chisquare_gridded = losses.chi_squared_gridded(vis_model_gridded, gridded_dset)
+r_chisquare_gridded = losses.r_chi_squared_gridded(vis_model_gridded, gridded_dset)
 loglike_gridded = losses.log_likelihood_gridded(vis_model_gridded, gridded_dset)
-print("Chi squared gridded", chisquare_gridded)
+print("Reduced Chi squared gridded", r_chisquare_gridded)
 print("Log likelihood gridded", loglike_gridded)
 ```
 
@@ -320,16 +318,16 @@ $$
 
 This is the same reasoning that gives rise to the statistical apothegm "a well-fit model has a reduced $\chi^2_R \approx 1$" (one that has [many caveats and pitfalls](https://arxiv.org/abs/1012.3754)). In this case the extra factor of 2 in the denominator comes about because the visibility data and its noise are complex-valued.
 
-The hope is that for many applications, the normalized negative log likelihood loss function will have a minimum value of $L(\hat{\boldsymbol{\theta}}) \approx 1$ for a well-fit model (regardless of the number of data points), making it easier to set the regularizer strengths *relative* to this value. Note that even this normalized loss won't be the same between an unbinned and binned dataset, though hopefully both will be on the order of $1$.
+The hope is that for many applications, the reduced $\chi^2_R$ loss function will have a minimum value of $\chi^2_R(\hat{\boldsymbol{\theta}}) \approx 1$ for a well-fit model (regardless of the number of data points), making it easier to set the regularizer strengths *relative* to this value. Note that even this normalized loss won't be the same between an unbinned and binned dataset, though hopefully both will be on the order of $1$.
 
-This loss function is implemented in {func}`mpol.losses.nll` and {func}`mpol.losses.nll_gridded`, and we can see the results here.
+This loss function is implemented in {func}`mpol.losses.r_chi_squared` and {func}`mpol.losses.r_chi_squared_gridded`, and we can see the results here.
 
 ```{code-cell}
-nll = losses.nll(vis_model_loose, data_loose, weight_loose)
-print("Normalized log likelihood", nll)
+r_chi2 = losses.r_chi_squared(vis_model_loose, data_loose, weight_loose)
+print("Reduced chi squared", r_chi2)
 ```
 
 ```{code-cell}
-nll_gridded = losses.nll_gridded(vis_model_gridded, gridded_dset)
-print("Normalized log likelihood gridded", nll_gridded)
+r_chi2_gridded = losses.r_chi_squared_gridded(vis_model_gridded, gridded_dset)
+print("Reduced chi squared gridded", r_chi2_gridded)
 ```
diff --git a/docs/ci-tutorials/optimization.md b/docs/ci-tutorials/optimization.md
index b95bd63c..b59a83a5 100644
--- a/docs/ci-tutorials/optimization.md
+++ b/docs/ci-tutorials/optimization.md
@@ -183,7 +183,7 @@ But, exciting things are about to happen! We can calculate the loss between thes
 
 ```{code-cell} ipython3
 # calculate a loss
-loss = losses.nll_gridded(vis, dset)
+loss = losses.r_chi_squared_gridded(vis, dset)
 print(loss.item())
 ```
 
@@ -255,7 +255,7 @@ for i in range(300):
     vis = rml()
 
     # calculate a loss
-    loss = losses.nll_gridded(vis, dset)
+    loss = losses.r_chi_squared_gridded(vis, dset)
 
     loss_tracker.append(loss.item())
 
@@ -318,8 +318,8 @@ nufft = fourier.NuFFT(coords=coords, nchan=dset.nchan)
 and then we can calculate model visibilities corresponding to some model image (in this case, our optimal image). Since {meth}`mpol.fourier.NuFFT.forward` returns a Pytorch tensor, we'll need to detach it and convert it to numpy. We'll also remove the channel dimension.
 
 ```{code-cell} ipython3
-# note the NuFFT expects a "packed image cube," as output from ImageCube.forward()
-vis_model = nufft(rml.icube(), uu, vv)
+# note the NuFFT expects a "packed image cube," as stored in ImageCube.cube
+vis_model = nufft(rml.icube.cube, uu, vv)
 # convert from Pytorch to numpy, remove channel dimension
 vis_model = np.squeeze(vis_model.detach().numpy())
 ```
diff --git a/src/mpol/losses.py b/src/mpol/losses.py
index 223ffbb4..e1193636 100644
--- a/src/mpol/losses.py
+++ b/src/mpol/losses.py
@@ -6,18 +6,22 @@ We expect you might use the loss functions in the following scenarios:
 
-- *optimizing a model with fixed data amplitudes and weights* this is the most common
-scenario for RML imaging. Use :meth:`~mpol.losses.r_chi_squared` for loose visibilities
-and :meth:`~mpol.losses.r_chi_squared_gridded` for gridded quantities
-(e.g., :class:`~mpol.datasets.GriddedDataset`).
-- *optimizing a model with potential adjustments to data amplitudes and weights*: you
-will need to do this in a self-calibration workflow. Use
-:meth:`~mpol.losses.neg_log_likelihood_avg` for loose visibilities. The nature of the
-adjustments makes gridding inefficient, so there is no gridded version of this loss.
-- *doing Bayesian inference*: if you are calculating a log likelihood to use with MCMC
-sampling, for example, use :meth:`~mpol.losses.log_likelihood` or
-:meth:`~mpol.losses.log_likelihood_gridded`. (Pyro defines its own likelihood function
-through the probabilistic programming language, so this is not needed in that case).
+* optimizing a model with fixed data amplitudes and weights:
+  this is the most common
+  scenario for RML imaging. Use :meth:`~mpol.losses.r_chi_squared` for loose
+  visibilities and :meth:`~mpol.losses.r_chi_squared_gridded` for gridded quantities
+  (e.g., :class:`~mpol.datasets.GriddedDataset`).
+* optimizing a model with potential adjustments to data amplitudes and weights:
+  you will need to do this in a self-calibration workflow. Use
+  :meth:`~mpol.losses.neg_log_likelihood_avg` for loose visibilities. The nature of
+  the adjustments makes gridding inefficient, so there is no gridded version of this
+  loss.
+* doing Bayesian inference:
+  if you are calculating a log likelihood to use with MCMC
+  sampling, for example, use :meth:`~mpol.losses.log_likelihood` or
+  :meth:`~mpol.losses.log_likelihood_gridded`. (Pyro defines its own likelihood
+  function through the probabilistic programming language, so this is not needed in
+  that case).
 
 """
 
 import numpy as np
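
For reference, the quantity that `r_chi_squared` and `r_chi_squared_gridded` are expected to report is a reduced chi-squared over complex visibilities, normalized by 2N as described in the loose-visibilities tutorial above. The standalone sketch below illustrates only that normalization: `reduced_chi_squared` is a hypothetical helper written for this example (it is not the MPoL implementation), and it assumes the usual radio convention that the weights are `1 / sigma**2` per real/imaginary component.

```python
import torch

def reduced_chi_squared(model_vis, data_vis, weight):
    # sum of weighted squared residuals over complex visibilities,
    # divided by 2N because each visibility has a real and an imaginary part
    chi2 = torch.sum(weight * torch.abs(data_vis - model_vis) ** 2)
    return chi2 / (2 * data_vis.numel())

# synthetic check: a model that differs from the data only by the quoted noise
N = 1000
sigma = 0.1
weight = torch.full((N,), 1 / sigma**2)
model = torch.randn(N) + 1j * torch.randn(N)
noise = sigma * (torch.randn(N) + 1j * torch.randn(N))
data = model + noise

print(reduced_chi_squared(model, data, weight))  # ~1 for a well-fit model
```

A value hovering around 1 for a well-fit model is exactly the property the tutorials rely on when setting regularizer strengths relative to the data term.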