Adjoint gradients in Near2Far change with the number of processes #1555

mawc2019 · 2021-04-24T21:04:31Z

The adjoint gradients in the Near2Far class change with the number of mpi processes. The test was based on the example code at Near2Far Optimization with Epigraph Formulation, with the objective function replaced by

def J(alpha):
    return npa.sum(npa.abs(alpha[0,:,2])**2)

opt = mpa.OptimizationProblem(
    simulation = sim,
    objective_functions = J,
    objective_arguments = ob_list,
    design_regions = [design_region],
    frequencies=frequencies,
    decay_by = 1e-6,
    decay_fields=[mp.Ez],
    maximum_run_time = 2000
)

The adjoint gradients were calculated as

x0 = 0.5*np.ones(Nx*Ny)
opt.update_design([x0])
f0, dJ_deps = opt()
g_adjoint = np.sum(dJ_deps, axis=1)

I ran the code with 1, 16, and 32 mpi processes. Their relative differences are as follows. The horizontal axis labels different pixels in the design region, and the vertical axis represents the relative differences w.r.t. the 1-process adjoint gradients, namely, 16-process adjoint gradients / 1-processes adjoint gradients − 1 and 32-processes adjoint gradients / 1-processes adjoint gradients − 1.

If we take the absolute value and plot them in the log scale, we can see that the differences are ubiquitous rather than limited to those spikes.

I also tested EigenmodeCoefficient, but did not see the variation of adjoint gradients among different numbers of processes.

The text was updated successfully, but these errors were encountered:

smartalecH · 2021-04-25T18:33:19Z

See #1547, which should hopefully fix this.

mawc2019 · 2021-04-25T23:57:58Z

Thank you, Alec. Near2Far computes far field from near field on the Yee lattice and no voxel-center averaging is needed, but #1547 aims for dealing with the case of yee_grid=False which involves voxel-center averaging.

stevengj · 2021-04-29T18:21:56Z

Maybe add an assertion after this line to check that f->fc->gv.owns(ix0) is true. This ensures that #1547 is not needed.

stevengj · 2021-04-29T18:25:13Z

The other thing you could do is to add a printf statement after this line to print out the iloc (ix0) and the amplitude EH0 for all the adjoint source points.

The set of source locations and amplitudes should be the same regardless of the number of processes.

(You might want to first shrink the near-to-far region in order to reduce the number of source points you need to examine, while making sure you still reproduce the bug.)

mochen4 · 2021-04-29T18:47:43Z

Thanks Steven! I will follow your suggestions and do the checks

mochen4 · 2021-05-08T17:38:28Z

After some experiments, I find that the assertion f->fc->gv.owns(ix0) fails; there is no problem at all if I shrink the near-to-far region. And as you can expect, the size of the near-to-far region at which the assertion starts to fail depends on the number of processes.

stevengj · 2021-05-13T17:26:28Z

I find that the assertion f->fc->gv.owns(ix0) fails

That's not good — can you do some more digging to print out the coordinates of ix0 and see where it is in the near2far region?

mawc2019 · 2021-05-15T23:32:17Z

Thank you, @mochen4. Here I am trying to illustrate this bug using an example with only one process. In the example below, the finite-difference gradients and the adjoint gradients are inconsistent, which happens when the monitor is at some special positions, e.g., one pixel away from the lateral PML region. In normal situations, the monitor is not entirely located there. The purpose of this example is merely to demonstrate the bug. In addition, the inconsistency may arise at new positions when the number of processes increases, implying that the unusual case here might be related to the case mentioned in the beginning of this issue.

The code is as follows.

import meep as mp
import meep.adjoint as mpa
import numpy as np
from autograd import numpy as npa
from autograd import tensor_jacobian_product, grad, jacobian
import nlopt
from matplotlib import pyplot as plt
SiO2 = mp.Medium(index=1.45)
SiN = mp.Medium(index=2.02)

resolution,design_region_resolution = 40,10

Lx,Ly = 4,4
dpml = 1
design_region_width,design_region_height = 2,0.5
monitor_height,monitor_length = 1,0.5
monitor_pml_distance = 1/resolution # distance from the lateral PML
pml_layers = [mp.PML(dpml)]
cell_size = mp.Vector3(Lx+2*dpml,Ly+2*dpml)

fcen,fwidth = 2,0.2
source_center,source_size = mp.Vector3(0,-Ly/4),mp.Vector3(Lx+2*dpml,0)
src = mp.GaussianSource(frequency=fcen,fwidth=fwidth,is_integrated=True)
source = [mp.Source(src,component=mp.Hz,center=source_center,size=source_size)]
frequencies = [fcen-fwidth/2,fcen,fcen+fwidth/2]

Nx,Ny = int(design_region_resolution*design_region_width),int(design_region_resolution*design_region_height)
design_variables = mp.MaterialGrid(mp.Vector3(Nx,Ny),SiO2,SiN,grid_type='U_MEAN')
design_region = mpa.DesignRegion(design_variables,volume=mp.Volume(center=mp.Vector3(0,design_region_height/2),size=mp.Vector3(design_region_width,design_region_height)))

geometry = [mp.Block(center=design_region.center, size=design_region.size, material=design_variables)]
sim = mp.Simulation(cell_size=cell_size,boundary_layers=pml_layers,geometry=geometry,sources=source,default_material=SiO2,resolution=resolution)

far_point = [mp.Vector3(10,10,0)]
NearRegions = [mp.Near2FarRegion(center=mp.Vector3(monitor_pml_distance-Lx/2,monitor_height), size=mp.Vector3(0,monitor_length), weight=+1)]
FarFields = mpa.Near2FarFields(sim, NearRegions ,far_point)
ob_list = [FarFields]

def J(far_data):
    return npa.abs(far_data[0,1,5])**2

opt = mpa.OptimizationProblem(
    simulation = sim,
    objective_functions = J,
    objective_arguments = ob_list,
    design_regions = [design_region],
    frequencies=frequencies,
    decay_by = 1e-5,
    decay_fields=[mp.Ez]
)

x0 = 0.5*np.ones(Nx*Ny)
opt.update_design([x0])
f0, dJ_deps = opt()

db = 1e-4
choose = 10
np.random.seed(2021)

g_fd_idx, idx = opt.calculate_fd_gradient(num_gradients=choose,db=db)
g_fd_idx = np.array(g_fd_idx).flatten()
g_ad_full = np.sum(dJ_deps, axis=1).flatten()

g_fd = np.multiply(g_fd_idx,10**4)
g_ad = np.multiply(g_ad_full[idx],10**4)

min_g = np.min(g_fd)
max_g = np.max(g_fd)

(m, b) = np.polyfit(np.squeeze(g_fd), g_ad, 1)
plt.plot(g_fd,g_ad,"o")
plt.plot([min_g, max_g],[min_g, max_g],label='y=x for comparison')
plt.plot([min_g, max_g],[m*min_g+b, m*max_g+b],'--',label='Linear fit')
plt.xlabel('Finite Difference Gradient × $10^4$')
plt.ylabel('Adjoint Gradient × $10^4$')
plt.legend()
plt.grid(True)

The result is as follows, which shows no linear correlation.

The inconsistency is not restricted to the Ez component. For example, if the component in the source is replaced by mp.Hz and the function J is changed to npa.abs(far_data[0,1,5])**2 accordingly, the result is then as follows, which shows a little linear correlation with an incorrect slope.

The inconsistency is absent when monitor_pml_distance takes most other values, such as 0 or 2/resolution. But when the number of processes increases, the inconsistency may occur at new positions. For example, when the number of processes is 16 and monitor_pml_distance equals Lx/2, the inconsistency appears.

mochen4 · 2021-05-23T03:48:17Z

After some more digging, I found that in LOOP_OVER_IVECS, is and ie may or may not be owned by gv. The test case has near2far region in the x direction. gv.little_corner().x() is strictly less than (f->is).x() when (f->fc->gv.owns(f->is)), and gv.big_corner().x() is strictly larger than (f->ie).x() when (f->fc->gv.owns(f->ie)). However, sometimes is (or ie) isn't owned, and its value equals little_corner() (or big_corner()).

I only used one process in the experiment, and is was unowned much more often than ie.

stevengj · 2021-05-24T14:09:59Z

Can you print out the weight w for those points? Maybe it has zero weight?

stevengj · 2021-05-24T14:22:03Z

Also check that the is and ie give disjoint (non-intersecting) grid volumes when you look at all the chunks (for a given field component). (Technically, this is only true after applying the symmetry transformations, but in your test case you have no symmetries.)

stevengj · 2021-05-24T17:11:38Z

It looks like loop_in_chunks does not actually guarantee that the points are owned, although they are in the chunk and they should be disjoint between chunks (after symmetry mapping). A first draft at fixing this is in https://github.com/NanoComp/meep/tree/loop_in_chunks_fixes

smartalecH · 2022-01-05T17:08:55Z

A first draft at fixing this is in...

@stevengj do you have any updates on this fix? It might be worth it to add as a PR to see if checks are passing (and to check against other PRs).

stevengj · 2022-01-05T17:20:20Z

@mochen4, weren't you working on this at some point?

mochen4 · 2022-01-05T18:01:52Z

If I remember correctly, tests/symmetry.cpp and tests/near2far.cpp (which uses symmetry) were failing. When symmetry is used, the corners of the chunk might need to be shifted. However, I couldn't find a way that would pass all the tests. One particular issue I vaguely remember is that, in test_1d_periodic_mirror, the point at one end of the geometry got updated but it wasn't supposed to. Then you decided to look at the issue yourself? @stevengj

mawc2019 · 2022-11-23T14:21:03Z

A similar issue on EigenmodeCoefficient remains in the latest version of Meep, namely, v1.25.0. Some 3d tests are demonstrated here. The relative differences between different numbers of processes can be a few percent to 80 percent.

oskooi added the bug label Apr 25, 2021

stevengj mentioned this issue Jan 5, 2022

Non-materialized dft fields for adjoint calculations #1855

Merged

4 tasks

mochen4 mentioned this issue Jan 12, 2022

WIP: loop_in_chunks over owned points #1895

Merged

stevengj closed this as completed in #1895 Mar 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adjoint gradients in Near2Far change with the number of processes #1555

Adjoint gradients in Near2Far change with the number of processes #1555

mawc2019 commented Apr 24, 2021

smartalecH commented Apr 25, 2021

mawc2019 commented Apr 25, 2021

stevengj commented Apr 29, 2021

stevengj commented Apr 29, 2021

mochen4 commented Apr 29, 2021

mochen4 commented May 8, 2021

stevengj commented May 13, 2021

mawc2019 commented May 15, 2021

mochen4 commented May 23, 2021 •

edited

Loading

stevengj commented May 24, 2021

stevengj commented May 24, 2021 •

edited

Loading

stevengj commented May 24, 2021 •

edited

Loading

smartalecH commented Jan 5, 2022

stevengj commented Jan 5, 2022

mochen4 commented Jan 5, 2022

mawc2019 commented Nov 23, 2022 •

edited

Loading

Adjoint gradients in Near2Far change with the number of processes #1555

Adjoint gradients in Near2Far change with the number of processes #1555

Comments

mawc2019 commented Apr 24, 2021

smartalecH commented Apr 25, 2021

mawc2019 commented Apr 25, 2021

stevengj commented Apr 29, 2021

stevengj commented Apr 29, 2021

mochen4 commented Apr 29, 2021

mochen4 commented May 8, 2021

stevengj commented May 13, 2021

mawc2019 commented May 15, 2021

mochen4 commented May 23, 2021 • edited Loading

stevengj commented May 24, 2021

stevengj commented May 24, 2021 • edited Loading

stevengj commented May 24, 2021 • edited Loading

smartalecH commented Jan 5, 2022

stevengj commented Jan 5, 2022

mochen4 commented Jan 5, 2022

mawc2019 commented Nov 23, 2022 • edited Loading

mochen4 commented May 23, 2021 •

edited

Loading

stevengj commented May 24, 2021 •

edited

Loading

stevengj commented May 24, 2021 •

edited

Loading

mawc2019 commented Nov 23, 2022 •

edited

Loading