
place fourier adjoint sources collectively for each frequency #1821

Merged · 8 commits · Feb 10, 2022

Conversation

@mawc2019 (Contributor) commented Nov 8, 2021

The main features of this PR are

  1. The Fourier adjoint sources at each frequency are placed collectively, while in the current master branch the adjoint sources are placed point by point.
  2. The Fourier adjoint solver takes yee_grid as a user-specified argument, which can be True or False; the default is False (a usage sketch follows below).
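
A minimal usage sketch of the new argument (the simulation cell and monitor geometry below are placeholder values, not part of this PR):

    import meep as mp
    import meep.adjoint as mpa

    sim = mp.Simulation(cell_size=mp.Vector3(2, 4), resolution=20)

    # FourierFields objective with the DFT monitor registered on the Yee grid;
    # omitting yee_grid (or passing yee_grid=False) gives the default behavior
    nf = mpa.FourierFields(
        sim,
        mp.Volume(center=mp.Vector3(0, 1), size=mp.Vector3(1, 0)),
        mp.Ez,
        yee_grid=True,
    )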

This PR passes all the built-in tests except test_adjoint_jax.py, test_binary_partition_utils.py, and test_chunk_balancer.py; the current master branch also fails these tests.

@smartalecH (Collaborator)

How does this PR relate to #1547? Does that PR need to work/be merged first? Are you locally testing this PR on top of that PR?

@mawc2019 (Contributor, Author) commented Nov 9, 2021

How does this PR relate to #1547?

The PR in #1547 could improve the accuracy of this PR in some circumstances.

Does that PR need to work/be merged first?

No. This PR can work/be merged without the PR in #1547, but the PR in #1547 needs this PR to work/be merged first.

Are you locally testing this PR on top of that PR?

This PR has been tested separately from that PR. I also combined the two PRs, but the combined PR does not work at present, as described in #1547.

@smartalecH (Collaborator)

This PR passes all the built-in tests except test_adjoint_jax.py, test_binary_partition_utils.py, and test_chunk_balancer.py; the current master branch also fails these tests.

It looks like it's passing all tests online? Perhaps you need to rebase your local build to ensure they pass locally too.

@mawc2019 (Contributor, Author) commented Nov 9, 2021

It looks like it's passing all tests online? Perhaps you need to rebase your local build to ensure they pass locally too.

I ran the built-in tests only locally.

Comment on lines 251 to 254
thickness_indicator = self.sim.fields.indicate_thick_dims(self.volume.swigobj)
min_max_corners = self.sim.fields.get_corners(self._monitor.swigobj, self.component) # ivec values of the corners
raw_size = (min_max_corners[3:6]-min_max_corners[0:3])/2+np.ones(3)
true_size = np.array([raw_size[i]*(thickness_indicator[i]==1)+(thickness_indicator[i]==0) for i in range(3)],dtype=int) # monitor size
Collaborator:

It would probably be a good idea to refactor this piece into an external function call since we'll be using it in some of the other objective quantities too (eventually).

Something similar to ObjectiveQuantity._adj_src_scale(). Maybe ObjectiveQuantity._get_dft_src_shape()?

Contributor (Author):

Perhaps a name like _dft_monitor_size sounds more pertinent. The function would be

    def _dft_monitor_size(self, min_max_corners):
        # per-dimension flag indicating which dimensions of the volume have nonzero extent
        thickness_indicator = self.sim.fields.indicate_thick_dims(self.volume.swigobj)
        # convert the min/max ivec corners into a per-dimension extent in grid points
        raw_size = (min_max_corners[3:6] - min_max_corners[0:3]) / 2 + np.ones(3)
        # keep the extent along thick dimensions; collapse thin dimensions to size 1
        true_size = np.array([raw_size[i]*(thickness_indicator[i] == 1) + (thickness_indicator[i] == 0)
                              for i in range(3)], dtype=int)
        return true_size

Line 252, which computes min_max_corners, is not encapsulated here, because min_max_corners is needed by fourier_sourcedata later.
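
For reference, a sketch of how the helper would be used together with line 252 (names taken from the snippet above):

    # line 252 stays outside the helper so that min_max_corners
    # remains available for fourier_sourcedata afterwards
    min_max_corners = self.sim.fields.get_corners(self._monitor.swigobj, self.component)
    true_size = self._dft_monitor_size(min_max_corners)  # monitor size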

(Resolved review threads on python/adjoint/objective.py and src/dft.cpp.)
@codecov-commenter commented Nov 12, 2021

Codecov Report

Attention: Patch coverage is 78.26087% with 5 lines in your changes missing coverage. Please review.

Project coverage is 73.09%. Comparing base (61fae68) to head (5a39da6).
Report is 290 commits behind head on master.

Files with missing lines                  Patch %   Lines
python/adjoint/optimization_problem.py    0.00%     4 Missing ⚠️
python/adjoint/objective.py               94.73%    1 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1821      +/-   ##
==========================================
- Coverage   73.14%   73.09%   -0.05%     
==========================================
  Files          17       17              
  Lines        4923     4918       -5     
==========================================
- Hits         3601     3595       -6     
- Misses       1322     1323       +1     
Files with missing lines                  Coverage Δ
python/adjoint/objective.py               92.07% <94.73%> (+0.26%) ⬆️
python/adjoint/optimization_problem.py    56.18% <0.00%> (-0.59%) ⬇️

@smartalecH (Collaborator)

The failing test is the same failing test described here.

For reference, here are the values comparing the adjoint gradients with the finite differences:

Directional derivative -- adjoint solver: [-4.94904819e-06 -3.02577147e-04 -2.97732191e-04], FD: [-4.75876167e-07 -2.98877141e-04 -3.12129430e-04]

The finite-difference gradients seem to be off (again).

@mawc2019 (Contributor, Author)

The finite-difference gradients seem to be off (again).

It seems that this issue has been resolved. Should I rebuild this PR on the current master branch and upload again?

@smartalecH (Collaborator)

It seems that this issue has been resolved. Should I rebuild this PR on the current master branch and upload again?

I retriggered the tests, but they are still failing.

It's possible that the fix implemented in #1780 wasn't sufficient to resolve the issue with the new DFT adjoint sources. In that case, try building with single precision locally and play with the finite-difference tolerance to see if that really is the issue (as described above). Make sure you rebase onto master when building locally.

@mawc2019 (Contributor, Author) commented Nov 30, 2021

I rebased this PR onto the current master branch locally and built with --enable-single. The error comes from self.assertClose when it is invoked in test_damping of test_adjoint_solver.py. The problematic gradients are

Directional derivative -- adjoint solver: [1.13276692e-05 1.59813860e-05 1.32704957e-05], FD: [1.19334179e-05 1.85025667e-05 1.42896045e-05]

The tolerance at single precision is 0.06; if it is increased to 0.14, the test no longer fails.

For comparison, I also compiled this PR without --enable-single. The adjoint and finite-difference gradients there are

Directional derivative -- adjoint solver: [1.13276534e-05 1.59813124e-05 1.32704613e-05], FD: [1.17366255e-05 1.59434964e-05 1.29082464e-05]

which pass the test.

In both cases, Python 3.7.3 was used. The finite-difference gradients still seem to be off.

@smartalecH (Collaborator)

(@oskooi I retriggered the tests after merging #1835 -- unfortunately, the tolerance adjustments in that PR were not enough to make this PR pass).

@smartalecH (Collaborator) commented Dec 1, 2021

The finite-difference gradients still seem to be off.

The finite-difference gradients are a bit flaky, particularly when single-precision is enabled and the new subpixel smoothing routine is turned on.

One issue we discussed was that the new default for the MaterialGrid subpixel-smoothing routine is do_smoothing=True. This is a bit problematic if the underlying densities are not filtered (which appears to be the case in the failing test).

Maybe try filtering the "random" design field before running the finite-difference (and corresponding adjoint) simulations. In other words, change the line

S12_unperturbed = forward_simulation_damping(p, frequencies)

to something like

S12_unperturbed = forward_simulation_damping(mpa.conic_filter(p,<<put parameters here...>>), frequencies) 

and the same for the second part of the finite-difference calculation:

S12_perturbed = forward_simulation_damping(p+dp, frequencies)

Some other things we can try:

  • Do a center-difference calculation (a sketch follows this list)
  • Do some Richardson extrapolation
  • Don't use a random initial design; use an initial design that makes sense (e.g., a ring or something)
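
A sketch of the center-difference idea, reusing the helpers already defined in the test (p, dp, frequencies, and forward_simulation_damping):

    ## centered difference: compared against dp @ adjoint_gradient just like the
    ## existing forward difference, but with O(|dp|^2) instead of O(|dp|) error
    S12_plus = forward_simulation_damping(p + dp, frequencies)
    S12_minus = forward_simulation_damping(p - dp, frequencies)
    fd_grad_centered = (S12_plus - S12_minus) / 2

Richardson extrapolation (the second item) would then combine two such centered differences with step sizes dp and dp/2.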

@mawc2019 (Contributor, Author) commented Dec 4, 2021

The failing check at test_damping was not caused by this PR, because test_damping only involves the EigenmodeCoefficient adjoint solver. In fact, the previous master branch, which was released this Tuesday morning, did not pass this test either when compiled with --enable-single.

Maybe try filtering the "random" design field before running the finite-difference (and corresponding adjoint) simulations.

If we filter the design field in test_damping as follows, the check succeeds on the previous master branch as well as on this PR rebased onto it.

    def test_damping(self):
        print("*** TESTING CONDUCTIVITIES ***")

        for frequencies in [[1/1.58, fcen, 1/1.53]]:
            ## filter the design field
            filter_radius = 0.12
            p_filtered = mpa.conic_filter(
                np.reshape(p,(Nx,Ny)),filter_radius,design_region_size.x,design_region_size.y,design_region_resolution).flatten()
            p_plus_dp_filtered = mpa.conic_filter(
                np.reshape(p+dp,(Nx,Ny)),filter_radius,design_region_size.x,design_region_size.y,design_region_resolution).flatten()
            
            ## compute gradient using adjoint solver
            adjsol_obj, adjsol_grad = adjoint_solver_damping(p_filtered, frequencies)

            ## compute unperturbed S12
            S12_unperturbed = forward_simulation_damping(p_filtered, frequencies)

            ## compare objective results
            print("S12 -- adjoint solver: {}, traditional simulation: {}".format(adjsol_obj,S12_unperturbed))
            self.assertClose(adjsol_obj,S12_unperturbed,epsilon=1e-6)

            ## compute perturbed S12
            S12_perturbed = forward_simulation_damping(p_plus_dp_filtered, frequencies)

            ## compare gradients
            if adjsol_grad.ndim < 2:
                adjsol_grad = np.expand_dims(adjsol_grad,axis=1)
            adj_scale = (dp[None,:]@adjsol_grad).flatten()
            fd_grad = S12_perturbed-S12_unperturbed
            print("Directional derivative -- adjoint solver: {}, FD: {}".format(adj_scale,fd_grad))
            tol = 0.06 if mp.is_single_precision() else 0.03
            self.assertClose(adj_scale,fd_grad,epsilon=tol)

However, doing so leads to a failing check for double precision. Moreover, the current master branch also fails the checks in test_adjoint_solver.py, which may not have the same cause.

@smartalecH (Collaborator)

In fact, the previous master branch, which was released this Tuesday morning, did not pass this test either when compiled with --enable-single.

Hmm, I'm not sure I follow. The current master branch appears to be passing all tests (including the recent addition you mention). Are you saying that when you build master on your local machine (e.g. with --enable-single) that test fails for you?

Moreover, the current master branch also fails the checks in test_adjoint_solver.py, which may not have the same cause.

Can you be more specific? Again, master seems to be passing (and is passing for me locally).

@mawc2019 (Contributor, Author) commented Dec 7, 2021

Are you saying that when you build master on your local machine (e.g. with --enable-single) that test fails for you?

Yes, I built the master branch on a local machine with --enable-single, using Python 3.7.3. The tests fail in test_adjoint_solver.py and test_get_epsilon_grid.py, as I also mentioned in #1835. These failures already existed in a previous master branch; in other words, they are not caused by the most recent commit.

which may not have the same cause.

I need to correct this statement: the cause should be the same, because the error message from the failing check in test_adjoint_solver.py is unchanged across the recent versions of the master branch.

The failing check at test_damping was not caused by this PR, because test_damping only involves the EigenmodeCoefficient adjoint solver.

This failing check in test_adjoint_solver.py was caused by the commit on November 22nd: before that commit the check passed, and after it the check fails. The single-precision test failure was also mentioned in #1830.

@smartalecH (Collaborator)

Even after the force push it still seems to be failing. Maybe take a closer look at exactly what is failing, as it is most likely due to something new in this PR (and not an out-of-date conflicting commit).

(Note that you can download the Python logs from the CI to see exactly where things are failing.)

@mawc2019 (Contributor, Author) commented Dec 8, 2021

In the log, I saw that the failure happened in test_adjoint_solver.py during make distcheck, after configuring with single-precision floating point and SWIG threads. Can I get the details of that failure in test_adjoint_solver.py? There should be a test_adjoint_solver.log online.

@smartalecH (Collaborator) commented Dec 8, 2021

Yes, the details are here.

Note that the DFT tests are failing:

<<line 98>> Directional derivative -- adjoint solver: [-131.11312903], FD: [-0.00342687]

Also, the fields are blowing up during the complex EigenmodeCoefficient tests... Does your code somehow impact those at all?

@mawc2019 (Contributor, Author) commented Dec 8, 2021

Note that the DFT tests are failing

The version of Python seems to matter a lot. I used Python 3.7.3 locally, and there were no failing tests of the FourierFields adjoint solver.

Also, the fields are blowing up during the complex EigenmodeCoefficient tests...

It seems that the diverging field in the tests of complex fields is also caused by the FourierFields adjoint solver, and adjoint_solver_complex_fields only invokes the FourierFields adjoint solver.

Does your code somehow impact those at all?

I do not think this PR impacts EigenmodeCoefficient.

@smartalecH (Collaborator)

It seems that the diverging field in the tests of complex fields is also caused by the FourierFields adjoint solver, and adjoint_solver_complex_fields only invokes the FourierFields adjoint solver.

You're right, I misread the log. The eigenmode tests seem to be unaffected by this (as expected).

The version of Python seems to matter a lot

Are you sure it's the Python version and not the single-precision that's causing the issue? It seems odd that the python version would affect anything in this PR... It's especially odd that the adjoint run for the complex case blows up...

@mawc2019 (Contributor, Author) commented Dec 8, 2021

Are you sure it's the Python version and not the single-precision that's causing the issue?

I am not sure. I suspect the Python version matters only because those FourierFields failures do not appear locally, whether I compile Meep with or without --enable-single. The only failure I have seen locally comes from test_damping, which invokes EigenmodeCoefficient only. If the Python version does not matter, perhaps there are some differences between the local and online tests?

It's especially odd that the adjoint run for the complex case blows up...

Diverging results also appeared previously, as mentioned in #1676. In one example there, the problem also appears without using the adjoint solver. I do not know whether these problems have the same cause.

@smartalecH (Collaborator)

I am not sure. I suspect the Python version matters only because those FourierFields failures do not appear locally, whether I compile Meep with or without --enable-single. The only failure I have seen locally comes from test_damping, which invokes EigenmodeCoefficient only. If the Python version does not matter, perhaps there are some differences between the local and online tests?

Hmm, I'm guessing your local branch doesn't line up with this PR+master.

@mawc2019 (Contributor, Author) commented Dec 9, 2021

I'm guessing your local branch doesn't line up with this PR+master.

I just tested this PR locally again, rebased onto the current master branch. The check now fails at the same place as the most recent online check. By the way, I also checked the current master branch locally with single precision again, and it still fails test_damping.

@mawc2019 (Contributor, Author)

In addition to the diverging fields and the inconsistency between the gradients, the adjoint gradients may differ from run to run. These problems happen only when this PR is compiled with --enable-single; the Near2Far adjoint solver does not have these problems even when compiled with --enable-single.

The precision issue has already been handled in get_gradient, but are there any other places in this PR that still need to handle it? In addition, this PR does not introduce any randomness explicitly. Where could the randomness in the adjoint gradients come from?

@smartalecH (Collaborator)

Can you provide a MWE that illustrates the problem? Something simpler than the entire test suite.

I can then pull your branch and look into it locally.

@mawc2019 (Contributor, Author)

An example is as follows.

import meep as mp
import meep.adjoint as mpa
import numpy as np
from autograd import numpy as npa
Si, SiO2 = mp.Medium(index=3.4), mp.Medium(index=1.44)
print("Single precision? ",mp.is_single_precision())

resolution,design_resolution = 20,20
Lx,Ly,pml_size = 2,4,1
cell_size = mp.Vector3(Lx,Ly+2*pml_size)
pml_layers = [mp.PML(pml_size, direction=mp.Y)]
design_width,design_height = 1,0.6
monitor_width,monitor_height = 1,design_height+0.1

fcen,fwidth = 1/1.55,0.01/1.55
source_center,source_size = mp.Vector3(0,-Ly/4),mp.Vector3(Lx,0)

src = mp.GaussianSource(frequency=fcen,fwidth=fwidth)
source = [mp.Source(src,component=mp.Ez,center=source_center,size=source_size)]
Nx,Ny = int(round(design_resolution*design_width)),1

design_variables = mp.MaterialGrid(mp.Vector3(Nx,Ny),SiO2,Si,grid_type='U_MEAN')
design_region = mpa.DesignRegion(design_variables,volume=mp.Volume(center=mp.Vector3(), size=mp.Vector3(design_width, design_height)))

geometry = [mp.Block(center=design_region.center, size=design_region.size, material=design_variables)] # design region
sim = mp.Simulation(
    cell_size=cell_size,boundary_layers=pml_layers,geometry=geometry,
    sources=source,default_material=SiO2,resolution=resolution)

nf = mpa.FourierFields(sim,mp.Volume(center=mp.Vector3(0,monitor_height),size=mp.Vector3(monitor_width,0)),mp.Ez,yee_grid=True)
ob_list = [nf]

def J(near_data):
    return npa.abs(near_data[0,0])**2

opt = mpa.OptimizationProblem(
    simulation = sim,objective_functions = J,objective_arguments = ob_list,design_regions = [design_region],
    fcen=fcen,df = 0,nf = 1,decay_by = 1e-5)

x0 = 0.5*np.ones(Nx*Ny)
opt.update_design([x0])
f0, dJ_deps = opt()

db, choose = 1e-4, 5
np.random.seed(20211204)

g_discrete, idx = opt.calculate_fd_gradient(num_gradients=choose,db=db)
g_discrete = np.array(g_discrete).flatten()
g_adjoint = dJ_deps.flatten()

print("Randomly selected indices: ",idx)
print("Finite-difference gradient: ",g_discrete)
print("Adjoint gradient: ",g_adjoint[idx])

With double precision, the results are

Finite-difference gradient:  [206.65506121 -82.40772321  98.45018865   5.91237534  85.70581779]
Adjoint gradient:  [206.65504516 -82.38054598  98.43878076   5.91337638  85.71809238]

With single precision, the adjoint gradients vary among different runs and do not match the finite-difference gradients.
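
For a quick quantitative comparison, one could append a relative-error check to the script above (using the variables it already defines):

    rel_err = np.abs(g_adjoint[idx] - g_discrete) / np.abs(g_discrete)
    print("Maximum relative error: ", np.max(rel_err))  # a few 1e-4 for the double-precision run above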

@stevengj (Collaborator)

the adjoint run hangs at MPI_Allreduce in max_to_all, which is invoked in one of the new functions get_dims

This will happen if not all of the processes are calling max_to_all. Maybe add a printf("hello from process %d\n", my_rank()); to see which processes are calling fourier_sourcedata and hence get_dims?
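
For context, a minimal sketch of that debugging pattern (the wrapper name and the max_to_all overload here are assumptions; the point is that the reduction is collective):

    // Sketch: max_to_all wraps MPI_Allreduce, a collective call, so it deadlocks
    // unless every MPI process reaches it. Printing the rank on entry shows which
    // processes actually make it into get_dims.
    #include <cstdio>
    #include <meep.hpp>

    double get_dims_debug(double local_extent) {
      std::printf("hello from process %d\n", meep::my_rank()); // which ranks get here?
      return meep::max_to_all(local_extent);                   // all ranks must call this
    }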

(Resolved review thread on src/dft.cpp.)
frequencies. Because of linearity, we can sum across the
second frequency dimension to calculate a frequency
scale factor for each point (rather than a scale vector).
'''
Collaborator:

Keep this comment, it's helpful!

Collaborator:

@smartalecH, could you explain what is going on here? Why can you sum over frequency if the objective function is vector-valued?

(Resolved review threads on src/array_slice.cpp and src/dft.cpp.)
volume *where = &v; // use full volume of fields
bufsz = 0;
min_corner = gv.round_vec(where->get_max_corner()) + one_ivec(gv.dim);
max_corner = gv.round_vec(where->get_min_corner()) - one_ivec(gv.dim);
Collaborator:

You can just use v.get_max_corner() etcetera and eliminate where completely?
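
A sketch of the suggested simplification applied to the excerpt above:

    // use the full fields volume v directly instead of aliasing it through `where`
    bufsz = 0;
    min_corner = gv.round_vec(v.get_max_corner()) + one_ivec(gv.dim);
    max_corner = gv.round_vec(v.get_min_corner()) - one_ivec(gv.dim);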

(Additional review threads on src/dft.cpp.)
@stevengj merged commit f359a04 into NanoComp:master on Feb 10, 2022
@oskooi (Collaborator) commented Feb 11, 2022

Looks like this PR, particularly the contents of dft_fields::fourier_sourcedata, is causing failures in the address sanitizer related to the misuse of size_t (unsigned) and ptrdiff_t (signed).
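
For illustration only (generic C++, not the Meep code in question), this is the kind of pitfall the sanitizer flags when unsigned size_t arithmetic stands in for signed ptrdiff_t:

    #include <cstddef>
    #include <cstdio>

    int main() {
      std::size_t a = 2, b = 5;
      std::size_t bad = a - b;                                      // wraps to a huge unsigned value
      std::ptrdiff_t good = (std::ptrdiff_t)a - (std::ptrdiff_t)b;  // -3, as intended
      std::printf("%zu %td\n", bad, good);
      return 0;
    }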

@stevengj mentioned this pull request on Feb 11, 2022