Is your feature request related to a problem or opportunity? Please describe.
For high sensitivity and/or spectral resolution applications, the spectral response function of the correlator system should be taken into account in the calculation of the negative log likelihood loss.
A proper understanding and treatment of this will be important for:

- parametric model fitting
- combining multiple execution blocks/programs at different spectral resolutions
- high-fidelity spectral line imaging, even for programs that have the same spectral resolution
Relevant background information
Spectral resolution is achieved by calculating "lags" in the autocorrelation of the signal. Via the Wiener–Khinchin theorem, the Fourier transform of the autocorrelation function is the power spectrum. For more background, see TMS Ch 8 and Bracewell, Ch 6 "The Autocorrelation Theorem".
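As a quick sanity check on the Wiener–Khinchin relationship, this illustrative NumPy snippet (not part of any proposed implementation) verifies that the DFT of a signal's circular autocorrelation reproduces its power spectrum:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 512
x = rng.standard_normal(N)  # a stand-in for the digitized IF signal

# power spectrum computed directly from the signal
power = np.abs(np.fft.fft(x)) ** 2

# circular autocorrelation computed in the lag domain
acf = np.array([np.sum(x * np.roll(x, -k)) for k in range(N)])

# Wiener-Khinchin: the FFT of the autocorrelation is the power spectrum
power_from_acf = np.fft.fft(acf)

assert np.allclose(power_from_acf.real, power)
assert np.allclose(power_from_acf.imag, 0.0, atol=1e-6)
```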
There are (at least) two types of correlators, "FX" and "XF," named for the order in which the Fourier transform (F) and cross-correlation (X) are applied to the incoming data stream. The correlator for the ALMA main array is technically an "FXF" or "filtering-correlation-Fourier transform" correlator: it includes a digital filter and takes the best of both worlds, boosting spectral resolution by up to a factor of 32 over a traditional XF correlator. We'll trust the comments in TMS 8.8.3 that the upgraded VLA correlator as well as the main ALMA correlator are fundamentally XF or "lag" correlators, at least at the fine resolutions corresponding to spectral line observations. In an XF correlator, the cross-correlation is done first (to get the cross-correlation signal) and then the Fourier transform is performed (to get the complex-valued cross-power spectrum, i.e., the visibility as a function of frequency). More notes on the benefits of an XF vs. FX design are available in TMS 8.8.5.

The correlator for the ACA is an FX design, which operates differently. In practice, however, the spectral resolution of the ACA correlator is much higher than that of the 12m data, so a "profile synthesis" step is performed to match the spectral response function of the ACA data to that of the 12m data.
Because the maximum lag is finite, the autocorrelation is always subject to a window function, whose Fourier transform manifests as a spectral response function in the power spectrum. Richard Hills' notes on spectral response contain good illustrations of this effect. The Hanning window is the default choice for ALMA main array observations, though the user can specify different options in their Phase 2 submission.
ALMA also provides an option to bin the spectrum in an on-line fashion by 2, 4, 8, or 16 channels to reduce the data rate and final data volume. While real-time data rate limits were a concern in earlier cycles, they seem to be less of an issue in current (2021) cycles. Because a Hann window applied to a Nyquist sampled data stream results in a spectrum with an effective spectral resolution (defined by FWHM) of two channels, an on-line binning of 2 results in a final set of channels that are (minimally) correlated.
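A small Monte Carlo sketch, assuming the standard 3-point Hann weights [0.25, 0.5, 0.25] (an assumption; see TMS for the exact filter), illustrates the point: Hann filtering leaves adjacent raw channels strongly correlated, and binning by 2 reduces the neighbor correlation to a small residual, i.e., "minimally" correlated but not exactly zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws, n_chan = 20000, 256
noise = rng.standard_normal((n_draws, n_chan))  # independent raw channels

# 3-point Hann digital filter applied channel-wise (assumed weights)
filtered = 0.25 * noise[:, :-2] + 0.5 * noise[:, 1:-1] + 0.25 * noise[:, 2:]

# on-line binning by 2: average adjacent channel pairs
binned = 0.5 * (filtered[:, 0::2] + filtered[:, 1::2])

def neighbor_corr(a, i=50):
    """Correlation coefficient between channels i and i+1 across draws."""
    return np.corrcoef(a[:, i], a[:, i + 1])[0, 1]

print(f"Hann-filtered: {neighbor_corr(filtered):.2f}")  # ~0.67
print(f"binned by 2:   {neighbor_corr(binned):.2f}")    # ~0.30
```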
Spectral resolution
Much of the discussion in the ALMA documentation surrounding these choices focuses on whether the final samples are correlated. Correlation is only one concern related to forward modeling the visibility samples, however. Even if the samples are uncorrelated, we still need to implement the effect of the spectral response when bringing a latent space model forward to the data. That is, if the channels are 244 kHz wide after 2x averaging, that doesn't mean we can sample a latent space model at 244 kHz and call it a day. If the latent space model contains intrinsic spectral features at 32 kHz, for example, we would first need to convolve the latent space model with the effective spectral response (the result of sampling, Hann windowing, and averaging) before making a comparison with the data. This is probably easiest to achieve with a convolution in the spectral/velocity domain, i.e., using the appropriately sized Hann window SRF, or the one formed by channel averaging (see the example in Richard Hills' notes on spectral response). The kernel wouldn't necessarily be 3 points long, because the 32 kHz latent space model is oversampled relative to the Nyquist rate of the 244 kHz channels. This should be straightforward to implement with PyTorch Conv1d, with the stride set to the output channel spacing.
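A minimal sketch of that operation, assuming a latent model at 32 kHz spacing, data channels at 244 kHz, and a Hann-shaped effective SRF (the kernel shape and widths here are illustrative; the true kernel would be the composition of sampling, windowing, and averaging per Hills' notes):

```python
import torch
import torch.nn.functional as F

# hypothetical numbers from the discussion: latent model sampled at 32 kHz,
# data channels spaced 244 kHz after Hann windowing and 2x on-line averaging
fine_dnu = 32.0e3   # latent-model channel spacing [Hz]
data_dnu = 244.0e3  # output channel spacing [Hz]
stride = round(data_dnu / fine_dnu)  # ~8 fine samples per data channel

# effective SRF sampled on the fine grid: here a Hann taper spanning roughly
# two data channels (an assumption for illustration)
n_taps = 2 * stride + 1
kernel = torch.hann_window(n_taps, periodic=False)
kernel = kernel / kernel.sum()    # unit area preserves flux
kernel = kernel.reshape(1, 1, -1) # (out_channels, in_channels, width)

# model spectrum on the fine grid: (batch, in_channels=1, n_fine)
model_fine = torch.randn(1, 1, 4096)

# convolve with the SRF and decimate to the data channelization in one call
model_data = F.conv1d(model_fine, kernel, stride=stride)
print(model_data.shape)  # torch.Size([1, 1, 510])
```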
Spectral covariance
See TMS v3 Section 8.8.2 and Loomis+18 Appendix C for more details, but the processing of the model cube is likely just a simple 3-point digital filter that we could implement as a convolution. The covariance matrix of the data should reflect that same convolutional processing.
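For concreteness, here is a sketch of how that covariance might be constructed, assuming unit-variance, independent raw channels and the 3-point Hann weights [0.25, 0.5, 0.25] (both assumptions for illustration):

```python
import numpy as np

# assumed 3-point Hann digital filter weights
w = np.array([0.25, 0.5, 0.25])

n_chan = 8
# banded matrix W applying the filter to the raw (independent) channels
W = np.zeros((n_chan, n_chan))
for i in range(n_chan):
    for j, wj in zip(range(i - 1, i + 2), w):
        if 0 <= j < n_chan:
            W[i, j] = wj

# for unit-variance raw channels, the filtered covariance is W W^T,
# which is banded out to lag 2
C = W @ W.T
print(np.round(C[3], 4))  # [0. 0.0625 0.25 0.375 0.25 0.0625 0. 0.]
```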
Describe the solution you'd like
The spectral covariance loss might be implemented as a class inheriting from torch.nn.Module in mpol.losses, much like the built-in loss functions in PyTorch. The covariance matrix (and any Cholesky factorization or other pre-caching) would be computed and cached as part of the __init__ process, since the matrix itself is fixed.
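A hypothetical sketch of what that might look like (class name and interface are placeholders, not existing MPoL API):

```python
import torch
from torch import nn

class SpectralCovarianceLoss(nn.Module):
    """Hypothetical sketch of a Gaussian NLL loss with a fixed
    channel-channel covariance matrix."""

    def __init__(self, cov: torch.Tensor):
        super().__init__()
        # factor once at construction; the correlator fixes cov, so it never
        # changes during optimization. register_buffer makes it follow
        # .to(device) without being treated as a trainable parameter.
        self.register_buffer("L", torch.linalg.cholesky(cov))

    def forward(self, model_vis: torch.Tensor, data_vis: torch.Tensor) -> torch.Tensor:
        # residuals over channels, shape (..., n_chan, 1); complex inputs are
        # fine since only |z|^2 is needed after whitening
        r = (data_vis - model_vis).unsqueeze(-1)
        # whiten: solve L z = r so that r^H C^{-1} r = ||z||^2
        z = torch.linalg.solve_triangular(self.L.to(r.dtype), r, upper=False)
        return 0.5 * torch.sum(torch.abs(z) ** 2)
```

The cov tensor could come from the W @ W.T construction sketched above, scaled by the per-channel noise variances.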
If the covariance is coupled with addressing the Doppler setting, we will also need to consider #4, because the act of "gridding" a dataset will destroy time information.
Describe alternatives you've considered
For lower S/N datasets (and continuum datasets), ignoring this effect and using the GriddedDataset is adequate. But for high S/N applications, it'd be nice to get as much information out of the data as possible.
The following is a proposed first-order solution to the spectral response issue.
Assumptions
All visibilities are sampled at the same spectral resolution, and their Doppler corrections have been applied and interpolated to a set of common bins (using mstransform or custom Doppler setting routines; see #32).
The interpolation procedure has a negligible effect on the spectral response relative to the Hanning window itself.
Procedure
The procedure works similarly to a simple RML loop, with BaseCube configured to have the same channel spacing as the data:

1. Each forward pass calculates BaseCube -> ImageCube -> FourierCube.
2. After the FourierCube step, a convolution with a Hann window is applied to the model in the frequency domain.
3. The model visibilities are selected relative to the gridded dataset, yet retain their channelization (there is an edge case where a visibility may fall in one uv cell in some channels and a neighboring cell in others).
4. The negative log likelihood loss is calculated using a covariance matrix accounting for the Hann convolution across channels.
Benefits
This is likely to be the fastest of any solution, because it does not require "degridding" the visibilities.
Concerns/doubts
Will there be significant edge cases where the visibility mask for GriddedDataset is different in each channel? This would matter for the covariance matrix step but not the convolution step. It's possible to just leave a blank entry in the covariance matrix.
Another concern is that spatial frequency (for a fixed baseline location) is a function of observing channel. This should be small enough to be neglected, but it would be a good idea to calculate this and put bounds on it; a quick order-of-magnitude estimate is sketched after this list.
Are there many use cases where one would want to account for the spectral covariance but not for the interpolation response/Doppler setting?
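For the spatial-frequency concern above, a back-of-the-envelope bound (with illustrative numbers, not drawn from any particular dataset) suggests the effect is indeed small:

```python
# u = b * nu / c, so the fractional shift in spatial frequency across the
# fit window is just d(nu) / nu
nu0 = 230.538e9           # Hz, e.g., CO J=2-1 (illustrative line choice)
n_chan, dnu = 100, 244e3  # a hypothetical 100-channel window at 244 kHz spacing
frac = n_chan * dnu / nu0
print(f"fractional shift: {frac:.2e}")  # ~1.1e-4

# for a ~2 km baseline (~1.5 Mlambda at 230 GHz), the absolute shift is
u_max = 2.0e3 * nu0 / 2.998e8  # baseline length in wavelengths
print(f"shift in u: {frac * u_max:.0f} lambda")  # ~160 lambda
```

At ~160 lambda, the shift is small compared to typical uv cell sizes (tens of kilolambda for arcsecond-scale fields), supporting the intuition that it can be neglected in a first-order implementation.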