Change Base unit from k$\lambda$ to $\lambda$ #249

Merged · 25 commits · Jan 5, 2024
Commits
bdc5c2d
GridCoords constructor with props and cached_props
kadri-nizam Apr 5, 2023
a8f7c6f
Merge branch 'MPoL-dev:main' into #152-coordinates
kadri-nizam Apr 5, 2023
8391981
GridCoords repr and fix coordinates test
kadri-nizam Apr 5, 2023
e8a42bd
Fix wrong variable
kadri-nizam Apr 5, 2023
99f4058
Fix missing bracket and add meshgrid vs tile test
kadri-nizam Apr 5, 2023
d648ac9
Update src/mpol/coordinates.py
kadri-nizam Apr 6, 2023
63b9b77
Update uv variable names
kadri-nizam Apr 6, 2023
411fa18
Merge branch '#152-coordinates' of github.com:kadri-nizam/MPol into #…
kadri-nizam Apr 6, 2023
d3f11b7
starting changelog for lambda.
iancze Dec 30, 2023
7cd7631
simplified tests to take only baselines where needed.
iancze Dec 31, 2023
f6c6927
added source baselines and img for mock data.
iancze Dec 31, 2023
53ec3ad
removed convert baselines closes #227
iancze Jan 1, 2024
7e55d0f
tests pass in intermediate fake dataset state.
iancze Jan 1, 2024
dda51cd
updated types and coverage.
iancze Jan 1, 2024
0de30ac
disabling residual plot for now.
iancze Jan 1, 2024
a4d8572
full mypy coverage for core routines.
iancze Jan 1, 2024
969c4b6
renamed SimpleNet to GriddedNet, tests pass locally.
iancze Jan 1, 2024
caa27cc
rewrote some mock data using butterfly image and passed some more tests.
iancze Jan 4, 2024
ed65754
commenting out TrainTest and CrossVal for now.
iancze Jan 5, 2024
1532251
tests pass with new mock data.
iancze Jan 5, 2024
ba2b39a
tests pass locally after base unit change.
iancze Jan 5, 2024
600b478
another spot to convert.
iancze Jan 5, 2024
8aa609f
updated to new resid funcitonality.
iancze Jan 5, 2024
e5e6d8b
resolved merge conflicts.
iancze Jan 5, 2024
acfbbd9
Merge pull request #250 from MPoL-dev/coordinates
iancze Jan 5, 2024
@@ -1,5 +1,5 @@
graph TD
subgraph SimpleNet
subgraph GriddedNet
bc(BaseCube) --> HannConvCube
HannConvCube --> ImageCube
ImageCube --> FourierLayer
27 changes: 24 additions & 3 deletions docs/changelog.md
@@ -3,9 +3,30 @@
# Changelog

## v0.3.0

- Standardized nomenclature of {class}`mpol.coordinates.GridCoords` and {class}`mpol.fourier.FourierCube` to use `sky_cube` for a normal image and `ground_cube` for a normal visibility cube (rather than `sky_` for visibility quantities). Routines use `packed_cube` instead of `cube` internally to be clear when packed format is preferred.
- Modified {class}`mpol.coordinates.GridCoords` object to use cached properties [#187](https://github.com/MPoL-dev/MPoL/pull/187).
- Changed the base spatial frequency unit from k$\lambda$ to $\lambda$, addressing [#223](https://github.com/MPoL-dev/MPoL/issues/223). This will affect most users' data-reading routines!
- Added the {meth}`mpol.gridding.DirtyImager.from_tensors` routine to cover the use case where one might want to use the {meth}`mpol.gridding.DirtyImager` to image residual visibilities. Otherwise, {meth}`mpol.gridding.DirtyImager` and {meth}`mpol.gridding.DataAverager` are the only notable routines that expect `np.ndarray` input arrays. This is because they are designed to work with data arrays directly after loading (say from a MeasurementSet or `.npy` file) and are implemented internally in numpy. If a routine requires data separately as `data_re` and `data_im`, that is a tell-tale sign that the routine works with numpy histogram routines internally.
- Changed name of {class}`mpol.precomposed.SimpleNet` to {class}`mpol.precomposed.GriddedNet` to more clearly indicate purpose. Updated documentation to make clear that this is just a convenience starter module, and users are encouraged to write their own `nn.Module`s.
- Changed internal instance attribute of {class}`mpol.images.ImageCube` from `cube` to `packed_cube` to more clearly indicate format.
- Removed `mpol.fourier.get_vis_residuals` and added `predict_loose_visibilities` to {class}`mpol.precomposed.SimpleNet`.
- Standardized treatment of numpy vs `torch.tensor`s, with preference for `torch.tensor` in many routines. This simplifies the internal logic of the routines and will make most operations run faster.
- Standardized the input types of {class}`mpol.fourier.NuFFT` and {class}`mpol.fourier.NuFFTCached` to expect {class}`torch.Tensor`s (removed support for numpy arrays). This simplifies the internal logic of the routines and will make most operations run faster.
- Changed {class}`mpol.fourier.make_fake_data` -> {class}`mpol.fourier.generate_fake_data`.
- Changed the base spatial frequency unit from k$\lambda$ to $\lambda$, closing issue [#223](https://github.com/MPoL-dev/MPoL/issues/223) and simplifying the internals of the codebase in numerous places. The following routines now expect inputs in units of $\lambda$ (a brief conversion sketch follows this changelog list):
- {class}`mpol.coordinates.GridCoords`
- {class}`mpol.coordinates.check_data_fit`
- {class}`mpol.datasets.GriddedDataset`
- {class}`mpol.fourier.NuFFT.forward`
- {class}`mpol.fourier.NuFFTCached`
- {class}`mpol.gridding.verify_no_hermitian_pairs`
- {class}`mpol.gridding.GridderBase`
- {class}`mpol.gridding.DataAverager`
- {class}`mpol.gridding.DirtyImager`
- Major documentation edits to be more concise with the objective of making the core package easier to develop and maintain. Some tutorials moved to the [MPoL-dev/examples](https://github.com/MPoL-dev/examples) repository.
- Added the {meth}`mpol.losses.neg_log_likelihood_avg` method to be used in point-estimate or optimization situations where data amplitudes or weights may be adjusted as part of the optimization (such as via self-calibration). Moved all documentation around loss functions into the [Losses API](api/losses.md).
- Renamed `mpol.losses.nll` -> {meth}`mpol.losses.r_chi_squared` and `mpol.losses.nll_gridded` -> {meth}`mpol.losses.r_chi_squared_gridded` because that is what those routines were previously calculating (see the {ref}`api-reference-label` for more details). ([#237](https://github.com/MPoL-dev/MPoL/issues/237)). Tutorials have also been updated to reflect the change.
- Renamed `mpol.losses.nll` -> {meth}`mpol.losses.r_chi_squared` and `mpol.losses.nll_gridded` -> {meth}`mpol.losses.r_chi_squared_gridded` because that is what those routines were previously calculating. ([#237](https://github.com/MPoL-dev/MPoL/issues/237)). Tutorials have also been updated to reflect the change.
- Fixed implementation and docstring of {meth}`mpol.losses.log_likelihood` ([#237](https://github.com/MPoL-dev/MPoL/issues/237)).
- Made some progress converting docstrings from "Google" style format to "NumPy" style format. Ian is now convinced that NumPy style format is more readable for the type of docstrings we write in MPoL. We usually require long type definitions and long argument descriptions, and the extra indentation required for Google makes these very scrunched.
- Made the `passthrough` behaviour of {class}`mpol.images.ImageCube` the default and removed this parameter entirely. Previously, it was possible to have {class}`mpol.images.ImageCube` act as a layer with `nn.Parameter`s. This functionality has effectively been replaced since the introduction of {class}`mpol.images.BaseCube`, which provides a more useful way to parameterize pixel values. If a one-to-one mapping (including negative pixels) from `nn.Parameter`s to the output tensor is desired, then one can specify `pixel_mapping=lambda x : x` when instantiating {class}`mpol.images.BaseCube` (a brief sketch follows this list). More details in [#246](https://github.com/MPoL-dev/MPoL/issues/246).
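
For the `passthrough`/`BaseCube` change in the bullet above, here is a minimal sketch of the identity pixel mapping it describes. The `cell_size` and `npix` values are illustrative only.

```python
from mpol import coordinates, images

coords = coordinates.GridCoords(cell_size=0.005, npix=800)

# Identity pixel mapping: nn.Parameters map one-to-one to output pixel values,
# so negative pixels are allowed (the default mapping enforces positivity).
bcube = images.BaseCube(coords=coords, nchan=1, pixel_mapping=lambda x: x)
packed_cube = bcube()  # forward pass: parameters -> packed image cube
```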
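
For the k$\lambda$ to $\lambda$ change referenced above, a minimal sketch of the data-reading update is shown below. It assumes the baselines were previously saved in k$\lambda$; the file name and array keys are illustrative, not part of the MPoL API.

```python
import numpy as np

# Hypothetical .npz file whose baselines were saved in kilolambda (the old convention).
d = np.load("my_baselines.npz")
uu = d["uu"] * 1e3  # klambda -> lambda
vv = d["vv"] * 1e3  # klambda -> lambda

# uu and vv can now be passed directly to the routines listed above
# (GridCoords, DataAverager, DirtyImager, NuFFT.forward, ...), which expect lambda.
```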
@@ -28,7 +49,7 @@
- TOML does not support adding keyed entries, so creating layered build environments of default, `docs`, `test`, and `dev` as we used to with `setup.py` is laborious and repetitive with `pyproject.toml`. We have simplified the list to be default (key dependencies), `test` (minimal necessary for test-suite), and `dev` (covering everything needed to build the docs and actively develop the package).
- Removed custom `spheroidal_gridding` routines, tests, and the `UVDataset` object that used them. These have been superseded by the TorchKbNuFFT package. For reference, the old routines (including the tricky `corrfun` math) are preserved in a Gist [here](https://gist.github.com/iancze/f3d2769005a9e2c6731ee6977f166a83).
- Changed API of {class}`~mpol.fourier.NuFFT`. The previous signature took `uu` and `vv` points at initialization (`__init__`), and the `.forward` method took only an image cube. This behaviour is preserved in a new class {class}`~mpol.fourier.NuFFTCached`. The updated signature of {class}`~mpol.fourier.NuFFT` *does not* take `uu` and `vv` at initialization. Rather, its `forward` method is modified to take an image cube and the `uu` and `vv` points. This allows an instance of this class to be used with new `uu` and `vv` points in each forward call. This follows the standard expectation of a layer (e.g., a linear regression function predicting at new `x`) and the pattern of the TorchKbNuFFT package itself. It is expected that the new `NuFFT` will be the default routine and `NuFFTCached` will only be used in specialized circumstances (and possibly deprecated/removed in future updates). A brief sketch of the new call pattern follows below. Changes implemented by [#232](https://github.com/MPoL-dev/MPoL/pull/232).
- Moved "Releasing a new version of MPoL" from the wiki to the Developer Documentation ({ref}`releasing-new-version-label`).
- Moved "Releasing a new version of MPoL" from the wiki to the Developer Documentation on the main docs.
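
A minimal sketch of the new `NuFFT` call pattern described in the bullet above. The argument order of `forward` (image cube, then `uu`, `vv`) is assumed from the changelog description, and the shapes and values are purely illustrative.

```python
import torch
from mpol import coordinates, fourier

coords = coordinates.GridCoords(cell_size=0.005, npix=800)
nufft = fourier.NuFFT(coords=coords, nchan=1)  # no uu/vv at initialization

packed_cube = torch.rand(1, 800, 800)  # stand-in for an image cube
uu = torch.rand(10_000) * 5e5          # baselines in lambda
vv = torch.rand(10_000) * 5e5

# Baselines are supplied per call, so the same layer can predict
# visibilities at new uu/vv points each forward pass.
vis = nufft(packed_cube, uu, vv)
```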

## v0.2.0

@@ -38,7 +59,7 @@
- Reorganized some of the docs API
- Expanded discussion and demonstration in `optimzation.md` tutorial
- Localized hardcoded Zenodo record reference to a single instance, and created new external Zenodo record from which to draw
- Added [Parametric inference with Pyro tutorial](large-tutorials/pyro.md)
- Added Parametric inference with Pyro tutorial
- Updated some discussion and notation in `rml_intro.md` tutorial
- Added `mypy` static type checks
- Added `frank` as a 'test' and 'analysis' extras dependency
6 changes: 2 additions & 4 deletions docs/ci-tutorials/crossvalidation.md
@@ -292,7 +292,7 @@ def cross_validate(config):
for k_fold, (train_dset, test_dset) in enumerate(k_fold_datasets):

# create a new model and optimizer for this k_fold
rml = precomposed.SimpleNet(coords=coords, nchan=train_dset.nchan)
rml = precomposed.GriddedNet(coords=coords, nchan=train_dset.nchan)
optimizer = torch.optim.Adam(rml.parameters(), lr=config["lr"])

# train for a while
@@ -310,7 +310,7 @@ Finally, we'll write one more function to train the model using the full dataset

```{code-cell}
def train_and_image(pars):
rml = precomposed.SimpleNet(coords=coords, nchan=dset.nchan)
rml = precomposed.GriddedNet(coords=coords, nchan=dset.nchan)
optimizer = torch.optim.Adam(rml.parameters(), lr=pars["lr"])
writer = SummaryWriter()
train(rml, dset, pars, optimizer, writer=writer)
@@ -324,8 +324,6 @@ def train_and_image(pars):
return fig, ax
```

All of the method presented here can be sped up using GPU acceleration on certain Nvidia GPUs. To learn more about this, please see the {ref}`GPU Setup Tutorial <gpu-reference-label>`.

+++

## Results
21 changes: 9 additions & 12 deletions docs/ci-tutorials/fakedata.md
@@ -288,9 +288,9 @@ fname = download_file(
# select the components for a single channel
chan = 4
d = np.load(fname)
uu = d["uu"][chan]
vv = d["vv"][chan]
weight = d["weight"][chan]
uu = torch.as_tensor(d["uu"][chan])
vv = torch.as_tensor(d["vv"][chan])
weight = torch.as_tensor(d["weight"][chan])
```

MPoL has a helper routine to calculate the maximum `cell_size` that can still Nyquist sample the highest spatial frequency in the baseline distribution.
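
As a rough cross-check on that helper routine, a minimal manual version of the same Nyquist criterion is sketched below, assuming `uu` and `vv` are the tensors loaded above (in units of $\lambda$); the factor 206265 converts radians to arcseconds.

```python
import torch

# Nyquist criterion: cell_size [radians] < 1 / (2 * max spatial frequency [lambda])
max_freq = torch.max(torch.abs(torch.cat([uu, vv])))  # lambda
max_cell_size = 1 / (2 * max_freq) * 206265.0          # radians -> arcsec
print(max_cell_size, "arcsec")
```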
@@ -306,12 +306,12 @@

## Making the mock dataset

With the {class}`~mpol.images.ImageCube`, $u,v$ and weight distributions now in hand, generating the mock visibilities is relatively straightforward using the {func}`mpol.fourier.make_fake_data` routine. This routine uses the {class}`~mpol.fourier.NuFFT` to produce loose visibilities at the $u,v$ locations and then adds random Gaussian noise to the visibilities, drawn from a probability distribution set by the value of the weights.
With the {class}`~mpol.images.ImageCube`, $u,v$ and weight distributions now in hand, generating the mock visibilities is relatively straightforward using the {func}`mpol.fourier.generate_fake_data` routine. This routine uses the {class}`~mpol.fourier.NuFFT` to produce loose visibilities at the $u,v$ locations and then adds random Gaussian noise to the visibilities, drawn from a probability distribution set by the value of the weights.

```{code-cell} ipython3
from mpol import fourier
# will have the same shape as the uu, vv, and weight inputs
data_noise, data_noiseless = fourier.make_fake_data(image, uu, vv, weight)
data_noise, data_noiseless = fourier.generate_fake_data(img_tensor_packed, coords, uu, vv, weight)

print(data_noise.shape)
print(data_noiseless.shape)
@@ -337,22 +337,19 @@ To make sure the whole process worked OK, we'll load the visibilities and then m
```{code-cell} ipython3
from mpol import coordinates, gridding

# well set the
coords = coordinates.GridCoords(cell_size=cell_size, npix=npix)

imager = gridding.DirtyImager(
imager = gridding.DirtyImager.from_tensors(
coords=coords,
uu=uu,
vv=vv,
weight=weight,
data_re=np.squeeze(np.real(data)),
data_im=np.squeeze(np.imag(data)),
)
data=data)
```

```{code-cell} ipython3
C = 1 / np.sum(weight)
noise_estimate = C * np.sqrt(np.sum(weight))
C = 1 / torch.sum(weight)
noise_estimate = C * torch.sqrt(torch.sum(weight))
print(noise_estimate, "Jy / dirty beam")
```

2 changes: 1 addition & 1 deletion docs/ci-tutorials/gridder.md
@@ -59,7 +59,7 @@ data_im = np.imag(data)

## Plotting the data

Following some of the exercises in the [visread documentation](https://mpol-dev.github.io/visread/tutorials/introduction_to_casatools.html), let's plot up the baseline distribution and get a rough look at the raw visibilities. For more information on these data types, we recommend you read the [Introduction to RML Imaging](../rml_intro.md).
Following some of the exercises in the [visread documentation](https://mpol-dev.github.io/visread/tutorials/introduction_to_casatools.html), let's plot up the baseline distribution and get a rough look at the raw visibilities.

Note that the `uu`, `vv`, `weight`, `data_re`, and `data_im` arrays are all two-dimensional numpy arrays of shape `(nchan, nvis)`. This is because MPoL has the capacity to image spectral line observations. MPoL will absolutely still work with single-channel continuum data; you will just need to work with 2D arrays of shape `(1, nvis)`.
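
If your continuum data are stored as 1D arrays of shape `(nvis,)`, a minimal sketch of the promotion to `(1, nvis)` is shown below; the variable names follow this tutorial, and whether you need this step depends on how your data were saved.

```python
import numpy as np

# Promote 1D continuum arrays of shape (nvis,) to the expected shape (1, nvis).
uu = np.atleast_2d(uu)
vv = np.atleast_2d(vv)
weight = np.atleast_2d(weight)
data_re = np.atleast_2d(data_re)
data_im = np.atleast_2d(data_im)
```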

4 changes: 2 additions & 2 deletions docs/ci-tutorials/initializedirtyimage.md
@@ -115,7 +115,7 @@ Here we set the optimizer and the image model (RML). If this is unfamiliar pleas

```{code-cell}
dirty_image = torch.tensor(img.copy()) # turns it into a pytorch tensor
rml = precomposed.SimpleNet(coords=coords, nchan=dset.nchan)
rml = precomposed.GriddedNet(coords=coords, nchan=dset.nchan)
optimizer = torch.optim.SGD(
rml.parameters(), lr=1000.0
) # multiple different possible optimizers
@@ -205,7 +205,7 @@ For more information on saving and loading models in PyTorch, please consult the
Now let's assume we're about to start an optimization loop in a new file, and we've just created a new model.

```{code-cell}
rml = precomposed.SimpleNet(coords=coords)
rml = precomposed.GriddedNet(coords=coords)
rml.state_dict() # the now uninitialized parameters of the model (the ones we started with)
```
