evaluating the order of accuracy of the adjoint gradient #1854
Even though smoothing is "on", it's not going to work unless the input is smoothed and …
It looks like your design resolution (set to 640?) is much higher than even your highest simulation resolution (160). Is this intended? I thought the correct approach was to use a comparable or lower design resolution to ensure a slowly varying latent design over each Yee grid cell? One issue with this approach is that the bilinear interpolation from a high-resolution design array to a relatively lower-resolution Yee grid will lead to a very sparse gradient, where many of the gradient values are zero (by definition of the bilinear interpolation). The locations of the pixels which are non-zero will probably shift significantly with changes to the Yee grid resolution, which could partially explain the behavior you're seeing. It would be useful to plot the gradient field to check this.
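A quick illustration of this effect (the numbers and names below are illustrative, not taken from the actual test script): because each bilinearly interpolated sample depends on at most four design pixels, only a small fraction of a much finer design array can ever receive a nonzero backpropagated gradient.

```python
# Illustrative sketch (not the actual test script): fraction of high-resolution
# design pixels that can receive a nonzero gradient when a coarser Yee grid
# samples the design array via bilinear interpolation.
import numpy as np

design_res, sim_res, size = 640, 160, 1.0    # pixels/um and design-region size (um), assumed
n_hi = int(design_res * size)                # design pixels per side
n_lo = int(sim_res * size)                   # Yee-grid samples per side

touched = np.zeros((n_hi, n_hi), dtype=bool)
coords = np.linspace(0, n_hi - 1, n_lo)      # sample locations in design-pixel units
for x in coords:
    for y in coords:
        i, j = int(np.floor(x)), int(np.floor(y))
        touched[i:i + 2, j:j + 2] = True     # only these 4 pixels get nonzero bilinear weights

print("fraction of design pixels with a possibly nonzero gradient:", touched.mean())
```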
This is intentional because we want to ensure that, as part of the convergence study, the …
The gradient for this geometry is indeed sparse (only the values at the dielectric interfaces are non-zero), which could therefore cause large changes in the directional derivative with "small" changes in the Meep grid resolution. Will post results on this topic shortly. @smartalecH I did try smoothing the ring geometry of the …
What really matters is that the first-order Taylor approximation used to define the interfaces of the new subpixel smoothing routine is satisfied. The intended way to do this is using a smoothing filter (in fact, @stevengj derived this approach with the assumption that the input variables would be smoothed, as the original adjoint solver was built on a density-based framework).
All the more reason to first smooth using a filter. Backpropagating through this filter is just a transposed convolution, which will "smear" the gradient energy to all surrounding pixels so that the resulting gradient is no longer sparse. The filter will also "regularize" the problem (although that doesn't matter here).
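A minimal sketch of that point, assuming a hand-built conic kernel (the kernel construction and array shapes here are illustrative, not Meep's internal routine): the vector-Jacobian product of a convolution is a correlation with the flipped kernel, which spreads a single-pixel gradient over the entire filter support.

```python
# Sketch: backpropagating through a linear smoothing filter is a transposed
# convolution, which "smears" gradient energy over the filter footprint.
import numpy as np
from scipy.signal import convolve2d

def conic_kernel(radius_px):
    r = np.arange(-radius_px, radius_px + 1)
    xx, yy = np.meshgrid(r, r)
    k = np.clip(1 - np.sqrt(xx**2 + yy**2) / radius_px, 0, None)
    return k / k.sum()

k = conic_kernel(8)
rho = np.random.rand(64, 64)                    # latent design variables
rho_filtered = convolve2d(rho, k, mode="same")  # forward pass: smoothing filter

dJ_drho_f = np.zeros_like(rho_filtered)
dJ_drho_f[32, 32] = 1.0                         # a maximally "sparse" upstream gradient
# adjoint (VJP) of the convolution: correlate with the flipped kernel
dJ_drho = convolve2d(dJ_drho_f, k[::-1, ::-1], mode="same")
print("nonzero gradient entries after backprop:", np.count_nonzero(dJ_drho))
```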
This is actually a different kind of sparsity than what @ianwilliamson is referring to. This kind of sparsity is a result of levelset "shape optimization" methods (i.e. when …
@oskooi can you post the code and the results? I don't see any filtering or projection in the above script.

How are you computing the directional derivative? The above code is just computing the sum of the gradient. You're not scaling with resolution (i.e. doing a proper integral). In that case, we shouldn't expect the sum to converge to a finite value with increasing resolution (although the integral should converge... I think...).

Edit: although if the design region resolution is consistent, the sum should also work. But the directional derivative should converge, assuming you're acting in the same direction with increasing resolution (which means your design grid resolution must remain the same throughout all runs).
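To make the distinction concrete, here is a hedged sketch of the two quantities being discussed (the function names are placeholders, and the scaling assumes a 2d design region discretized at `design_region_resolution`):

```python
import numpy as np

def directional_derivative(dJ_du, dp):
    # inner product of the adjoint gradient with a fixed perturbation direction dp;
    # this should converge with increasing Meep resolution, provided dp lives on
    # the same, fixed design grid for every run
    return np.dot(np.asarray(dJ_du).ravel(), np.asarray(dp).ravel())

def gradient_integral(dJ_du, design_region_resolution):
    # a plain sum of the gradient, rescaled by the design-pixel area so that it
    # approximates an integral over the design region (2d case)
    return np.sum(dJ_du) / design_region_resolution**2
```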
This is what I was trying to point out. By definition, using bilinear interpolation to downsample from high resolution to low resolution will always create "sparsity" in the gradient. This is the correct gradient for the operation being performed, but I'm not sure if it's what you want for the test you're creating.
I think even with a filter applied, a sufficiently high resolution design array will still have sparsity in the gradient. It should depend on the ratio of the resolutions and the size of the filter kernel. If your filter doesn't cover enough pixels in the high resolution design array, then some design pixels will just never influence the simulation and, as a result, will have zero gradient. Just want to point out that having a filter doesn't guarantee the intended behavior.
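A rough back-of-the-envelope check of this (the numbers mirror values quoted in this thread, but the criterion itself is only a heuristic): the filter footprint, measured in design pixels, should be at least as large as the downsampling stride from the design grid to the Yee grid.

```python
# heuristic check: does every high-res design pixel fall under some filter kernel?
design_region_resolution = 640                    # pixels/um (design array)
sim_resolution = 160                              # pixels/um (Yee grid)
filter_radius = 16 / design_region_resolution     # um, as in the script below

stride_px = design_region_resolution / sim_resolution        # design pixels per Yee sample
footprint_px = 2 * filter_radius * design_region_resolution  # filter diameter in design pixels
print("every design pixel can influence the simulation:", footprint_px >= stride_px)
```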
Here are some results for the adjoint gradient map obtained using subpixel smoothing (script below) for a fixed ….

Some comments: …
The next thing to check, based on the first comment, is to redo the convergence study with a larger filter radius. The script used for subpixel smoothing is as follows:

    # binary ring: 1 inside the annulus rad < r < rad+wdt, 0 elsewhere
    weights = np.logical_and(np.sqrt(np.square(xv) + np.square(yv)) > rad,
                             np.sqrt(np.square(xv) + np.square(yv)) < rad+wdt,
                             dtype=np.double)

    # smooth the binary ring with a conic filter before it is passed to the MaterialGrid
    filter_radius = 16/design_region_resolution
    filtered_weights = mpa.conic_filter(weights,
                                        filter_radius,
                                        design_region_size.x,
                                        design_region_size.y,
                                        design_region_resolution)
    ring_weights = filtered_weights.flatten()

    def adjoint_solver(resolution):
        matgrid = mp.MaterialGrid(mp.Vector3(Nx,Ny),
                                  mp.air,
                                  silicon,
                                  weights=np.ones((Nx,Ny)),
                                  do_averaging=True,
                                  beta=np.inf)

        f, dJ_du = opt([ring_weights])
Quick update: increasing the filter radius from 16 pixels to 32 and then 64 produces results for the directional derivative which appear to be converging to a fixed value. However, as shown below, the convergence rate appears to be linear (or slightly greater) but not quadratic. It's likely that the convergence rate is indeed quadratic and we just need to extend the data out to larger resolutions to see the trend more clearly...
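One way to quantify this, sketched below with a standard ratio-of-differences estimate (no data from the plot is reproduced here): for successive halvings of the grid spacing, the observed order p follows from the differences between consecutive directional derivatives.

```python
import numpy as np

def observed_order(d_coarse, d_mid, d_fine):
    # convergence-rate estimate p ~ log2(|d(h) - d(h/2)| / |d(h/2) - d(h/4)|)
    # for directional derivatives computed at grid spacings h, h/2, and h/4
    return np.log2(abs(d_coarse - d_mid) / abs(d_mid - d_fine))
```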
Are you backpropagating the gradient through the filter? How are you computing the directional derivative?
Based on recent discussions, I modified the test above to remove the input and output waveguides (since the terminations contain sharp corners and thus introduce field discontinuities which can affect the results), so there is now just the ring structure. I extended the Meep grid resolution range to 20, 40, 80, 160, and 320 using a …

The figure below shows the results for three different filter radii. The convergence rate is (nearly) second order. Note that increasing the filter radius does not seem to significantly change the results. For completeness, we can try to continue extending the resolution range even further, but these results seem sufficient. Next steps are (1) to demonstrate that reducing the filter radius to zero will eventually lead to a first-order convergence rate and (2) to show results for no smoothing.
Maybe a clearer way to present this would be to plot the |gradient - FD gradient| vs. resolution, showing that the gradient always matches the forward solver. Ideally, this error should be small regardless of the resolution. That way, if the forward solver is converging quadratically, the gradient necessarily is as well.
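A sketch of that plot, with the two helpers standing in for whatever routines actually compute the adjoint and finite-difference directional derivatives at a given resolution (both helper names are hypothetical):

```python
import matplotlib.pyplot as plt

resolutions = [25, 50, 100, 200]   # pixels/um
errors = []
for res in resolutions:
    g_adj = adjoint_directional_derivative(res)             # hypothetical helper
    g_fd = finite_difference_directional_derivative(res)    # hypothetical helper
    errors.append(abs(g_adj - g_fd))

plt.loglog(resolutions, errors, "o-")
plt.xlabel("resolution (pixels/um)")
plt.ylabel("|adjoint - finite difference|")
plt.show()
```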
Based on @stevengj's suggestion above, here are results comparing the adjoint gradient vs. the finite-difference gradient (via the directional derivative) as a function of the Meep grid resolution (25, 50, 100, and 200 pixels/μm) for a ring structure (fixed …). There are three different test cases. Two of the test cases for which …
Why aren't you using the same convergence criteria between your finite-difference solves and your adjoint gradient solves? The adjoint solver, of course, checks the relative change in dft fields. Your FD solves use a fixed runtime, however:
I see that you have a …

Another thing to check is the …

Using this example, I can quickly get about 10% error (at …).

Another hack would be to use Richardson extrapolation -- this would increase the computation time, however.
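For reference, a minimal sketch of the Richardson-extrapolation idea applied to a central difference (generic, not tied to any particular Meep quantity):

```python
def central_diff(f, x, h):
    # standard second-order central difference
    return (f(x + h) - f(x - h)) / (2 * h)

def richardson_central_diff(f, x, h):
    # combine two step sizes to cancel the leading O(h^2) error term, giving O(h^4)
    d_h = central_diff(f, x, h)
    d_h2 = central_diff(f, x, h / 2)
    return (4 * d_h2 - d_h) / 3
```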
The choice to use a fixed runtime of …
The test setup that I am using applies a small, random perturbation to every pixel of the …:

    # random epsilon perturbation for design region
    deps = 1e-5
    dp = deps*rng.rand(Nx*Ny)

I also tried changing …
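For context, the comparison this perturbation feeds into typically looks like the sketch below, where `forward_fom` is a hypothetical helper returning the objective value from a single forward solve:

```python
import numpy as np

# finite-difference directional derivative along the random perturbation dp,
# compared against the adjoint prediction <dJ_du, dp>
fd_directional = forward_fom(ring_weights + dp) - forward_fom(ring_weights)  # hypothetical helper
adj_directional = np.dot(dJ_du, dp)
print("relative error:", abs(fd_directional - adj_directional) / abs(fd_directional))
```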
We're talking about two different things. I'm referring to the step size used to compute the derivative of the operator (again, the …
This is wrong. They need to converge at the same point, otherwise it's not an apples to apples comparison.
Correct. See my comment here.
For your FD calculations, make sure you use the same …
I tried reducing the finite-difference step size used to compute the adjoint gradient to 1e-6 and 1e-8 by manually editing lines 21 to 22 in 4f2b0a4.
However, the results for the adjoint gradient for …
As for the fixed runtime, the key point is that it is chosen to be sufficiently long to ensure that the DFT fields are converged for both the forward and adjoint simulations. There is no requirement that the runtime needs to be exactly the same to compare the finite-difference gradient from the forward simulations to the adjoint gradient from the adjoint simulation.
Note, as I mentioned above, you can just use the …
The results are disappointing, but I'm still not convinced this is a "bug" in the recombination step. Rather, I think the issue is a combination of choosing the right finite difference and ensuring the field convergence is consistent between FD and adjoint gradients.
You need to be careful here. Yes, as long as the fields converge for both the adjoint gradient calculation and the FD gradient calculation, the results should be consistent. But it's hard to ensure that -- especially when you have resonant structures. So the next best thing is just to make sure the same dft fields converge to the same threshold, which is why I suggested my fix above.
Following @smartalecH's suggestion to use the same …
These results are not surprising since the fixed runtime I was using previously already ensured that the DFT fields were converged. We therefore need to look elsewhere for the cause of the discrepancy in the gradients.
Confirming that #1951 produces matching gradients (finite difference vs. adjoint solver) computed for ….

The four columns of output are: resolution, gradient (finite difference), gradient (adjoint solver), relative error.
The results above are generated using double-precision floating point for the fields and materials arrays. I also confirmed that #1951 is working using single-precision floating point:
Note that the relative error for single precision is larger than for double precision, as expected.
@oskooi this test case is using …
During today's discussion, I was referring to the example in #1503 which uses an …
Let me suggest an experiment:
I'm pulling together the results of this experiment now. I have a few followup questions:

1. Is there a reason we don't want to plot the relative change in FOM? For example, …
2. Would it be ok to instead smooth ρ, then interpolate, then project? This is precisely how meep does it under the hood... so it's more convenient (a sketch of this ordering is included below).
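For what it's worth, a sketch of that ordering, reusing the array and parameter names from the script earlier in this thread (the beta/eta values are placeholders, and the bilinear interpolation onto the Yee grid is left to the MaterialGrid itself):

```python
# smooth the latent design variables rho with the conic filter ...
rho_smoothed = mpa.conic_filter(rho,
                                filter_radius,
                                design_region_size.x,
                                design_region_size.y,
                                design_region_resolution)
# ... the MaterialGrid then bilinearly interpolates rho_smoothed onto the Yee grid ...
# ... and finally project (beta/eta here are placeholder values)
rho_projected = mpa.tanh_projection(rho_smoothed, beta=64, eta=0.5)
```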
It's concerning that it doesn't seem to be converging. If this is a real effect (not just oscillatory convergence and you happen to be unlucky at ending on a peak), what is limiting the accuracy? Might be worth doing the TO with a material grid at 4x the Meep resolution, so that when you double the Meep resolution it is still 2x. (Since you are doing the Ez polarization, it's not surprising that no smoothing does fine. You should see more of a difference for the Hz polarization.) Also, make sure the lengthscale constraint is satisfied; sharp corners will definitely hamper convergence. |
Maybe try a smaller test problem, where it is easier to crank the resolution. For example, suppose you have PML in x and periodic in y, consisting of air + design + air. Launch a planewave with a line source, and maximize |Hz|^2 in the air on the other side of the design region (i.e. maximize a near-field focus). Make sure the period in y is not an integer number of wavelengths to avoid scattering into tangential waves. (e.g. maybe just period=0.9λ). As noted above, you may need a lengthscale constraint to prevent sharp corners from forming. Don't put the near-field focus too close to the design region, e.g. λ/2 away, to avoid favoring corner singularities. |
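A rough sketch of just the cell/source setup being described (all sizes, the resolution, and the source placement are guesses; the design region and the near-field objective are omitted):

```python
import meep as mp

wvl = 1.0
sy = 0.9 * wvl                        # period in y: not an integer number of wavelengths
sx = 6.0                              # cell length in x (guess), with PML on the x boundaries
cell = mp.Vector3(sx, sy)
pml_layers = [mp.PML(1.0, direction=mp.X)]

# line source spanning the period launches a planewave toward the design region
sources = [mp.Source(mp.GaussianSource(1 / wvl, fwidth=0.2 / wvl),
                     component=mp.Hz,
                     center=mp.Vector3(-0.5 * sx + 1.5),
                     size=mp.Vector3(0, sy))]

sim = mp.Simulation(cell_size=cell,
                    resolution=50,
                    boundary_layers=pml_layers,
                    sources=sources,
                    k_point=mp.Vector3())   # Bloch-periodic in y at normal incidence
```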
Note that the convergence might be more irregular if you are converging to a completely different structure at each resolution — might be better to use the previous resolution as a starting guess. Also, make sure you are imposing the lengthscale constraints at the boundary of the design region, to ensure that it doesn't converge towards a structure with a sharp corner. (i.e. pad the design region before you impose constraints). |
Note that you need to avoid sharp corners at the edges of the design region (e.g. by padding the edges when you do the filtering/lengthscale constraints). |
While we have already demonstrated second-order accuracy for subpixel smoothing of the MaterialGrid in a "forward" simulation (see results in #1503), following #1780 which enabled support for backpropagating the gradients from the Yee grid to the MaterialGrid, we have yet to validate the second-order accuracy of the gradients computed using an adjoint simulation.

ToDos

The following 2d test computes the directional derivative using the adjoint gradient over a design region consisting of a ring structure which connects an input and output port/waveguide. (This is a similar setup to the existing unit tests for the adjoint solver in which the design region contains random pixel values and the directional derivative is computed in a random direction.) The objective function is a single DFT field point in the output waveguide at a single frequency (the red dot in the schematic below). (We would also eventually like to adapt this test to the other two objective functions: mode coefficients and near-to-far fields.)

The ring structure is a MaterialGrid at a fixed resolution of 640 pixels/μm while the Meep grid resolution is repeatedly doubled: 20, 40, 80, and 160 pixels/μm. The directional derivative should be converging to a fixed value with increasing resolution, but this does not appear to be the case. The output from running the script below shows the resolution in the second column and the directional derivative in the third column.

Before we can validate the accuracy of the gradients, we need to ensure that the results are converging to a fixed value.
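For reference, the outer loop that produces this output presumably looks something like the sketch below (reusing the `adjoint_solver` name from the script in this thread and assuming it returns the objective value and gradient; the perturbation direction `dp` is illustrative):

```python
import numpy as np

dp = np.ones(Nx * Ny)                    # fixed direction for the directional derivative
for resolution in [20, 40, 80, 160]:
    f, dJ_du = adjoint_solver(resolution)
    dd = np.dot(np.asarray(dJ_du).flatten(), dp)
    print(f"directional-derivative:, {resolution}, {dd}")
```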