-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance Enhancement: Second-Order Force Constants #149
Comments
tdep test profileSat Oct 19 04:52:38 2024 /content/kaldo/tdep_profile
158065905 function calls (157621338 primitive calls) in 542.526 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
204221 269.088 0.001 269.088 0.001 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_FastPathExecute}
1 101.927 101.927 196.021 196.021 secondorder.py:84(remap_force_constants)
15625262 51.681 0.000 88.051 0.000 linalg.py:2383(norm)
15628895 19.828 0.000 19.828 0.000 {method 'dot' of 'numpy.ndarray' objects}
22317 10.552 0.000 10.552 0.000 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_Execute}
1 7.235 7.235 320.034 320.034 anharmonic.py:115(project_crystal)
15625033 5.111 0.000 5.111 0.000 {method 'ravel' of 'numpy.ndarray' objects}
15919052 4.874 0.000 6.794 0.000 linalg.py:140(isComplexType)
115360 4.816 0.000 5.235 0.000 constant_op.py:75(convert_to_eager_tensor)
1296 4.722 0.004 310.844 0.240 anharmonic.py:183(project_crystal_mu)
139628/139612 4.665 0.000 4.667 0.000 {built-in method numpy.array}
31782543 4.439 0.000 4.503 0.000 {built-in method builtins.issubclass}
1 4.114 4.114 9.910 9.910 thirdorder.py:24(load)
15625262 3.368 0.000 3.368 0.000 linalg.py:2379(_norm_dispatcher)
15957351 2.883 0.000 3.033 0.000 {built-in method numpy.asarray}
310488/187234 1.904 0.000 278.454 0.001 dispatch.py:1246(op_dispatch_handler)
98011 1.298 0.000 2.640 0.000 linalg.py:329(solve)
1348227/1348083 1.254 0.000 1.263 0.000 {built-in method builtins.getattr}
187741 1.033 0.000 10.857 0.000 tensor_conversion_registry.py:164(convert)
379012/192406 0.891 0.000 283.646 0.001 traceback_utils.py:138(error_handler)
18219 0.786 0.000 8.826 0.000 tensor_getitem_override.py:65(_slice_helper)
... where the bottleneck is principally in tensorflow: ncalls tottime percall cumtime percall filename:lineno(function)
204221 269.088 0.001 269.088 0.001 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_FastPathExecute}
1 101.927 101.927 196.021 196.021 secondorder.py:84(remap_force_constants) requiring about half (49.6%) of the compute time; second order force constant remapping is next most time-expensive, requiring almost a fifth (18.6%) of the compute time. sigma2 test profileSat Oct 19 05:08:57 2024 /content/kaldo/sigma2_profile
99187443 function calls (99084002 primitive calls) in 147.319 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 64.398 64.398 123.341 123.341 secondorder.py:84(remap_force_constants)
10078010 32.839 0.000 55.085 0.000 linalg.py:2383(norm)
10077697 11.395 0.000 11.395 0.000 {method 'dot' of 'numpy.ndarray' objects}
10077729 3.447 0.000 3.447 0.000 {method 'ravel' of 'numpy.ndarray' objects}
10855440 3.310 0.000 4.686 0.000 linalg.py:140(isComplexType)
1 3.238 3.238 138.141 138.141 secondorder.py:20(parse_tdep_forceconstant)
21459145 3.155 0.000 3.156 0.000 {built-in method builtins.issubclass}
11069929 2.622 0.000 2.915 0.000 {built-in method numpy.asarray}
259248 2.479 0.000 5.410 0.000 linalg.py:329(solve)
10078010 2.010 0.000 2.010 0.000 linalg.py:2379(_norm_dispatcher)
426066 1.451 0.000 1.451 0.000 {method 'reduce' of 'numpy.ufunc' objects}
212601 0.966 0.000 3.161 0.000 cell.py:150(complete_cell)
... where the bottleneck is only in force constant remapping, requiring almost half (43.5%) the compute time ncalls tottime percall cumtime percall filename:lineno(function)
1 64.398 64.398 123.341 123.341 secondorder.py:84(remap_force_constants) remap_force_constants() profile (with sigma2 test) 82076477 function calls (82076226 primitive calls) in 59.239 seconds
Ordered by: cumulative time
List reduced from 149 to 100 due to restriction <100>
ncalls tottime percall cumtime percall filename:lineno(function)
10077960 33.450 0.000 55.362 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:2383(norm)
10077696 11.408 0.000 11.408 0.000 {method 'dot' of 'numpy.ndarray' objects}
10217727 3.093 0.000 4.332 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:140(isComplexType)
10077696 3.264 0.000 3.264 0.000 {method 'ravel' of 'numpy.ndarray' objects}
20389043 2.768 0.000 2.768 0.000 {built-in method builtins.issubclass}
10077960 1.992 0.000 1.992 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:2379(_norm_dispatcher)
10218110 1.747 0.000 1.827 0.000 {built-in method numpy.asarray}
46677 0.726 0.000 1.403 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:329(solve)
46688 0.055 0.000 0.450 0.000 /usr/local/lib/python3.10/dist-packages/ase/utils/arraywrapper.py:66(attr)
46677 0.169 0.000 0.308 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:159(_commonType)
93354 0.096 0.000 0.132 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:135(_makearray)
93354 0.058 0.000 0.082 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:153(_realType)
46779 0.045 0.000 0.080 0.000 /usr/local/lib/python3.10/dist-packages/ase/cell.py:230(__array__)
46677 0.059 0.000 0.059 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:209(_assert_stacked_square)
140091 0.058 0.000 0.058 0.000 {built-in method builtins.getattr}
46677 0.053 0.000 0.053 0.000 {method 'astype' of 'numpy.ndarray' objects}
46677 0.043 0.000 0.043 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:130(get_linalg_error_extobj)
46677 0.042 0.000 0.042 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:203(_assert_stacked_2d)
46837/46813 0.036 0.000 0.036 0.000 {built-in method numpy.array}
93382 0.024 0.000 0.024 0.000 {method 'get' of 'dict' objects}
46677 0.019 0.000 0.019 0.000 {method '__array_prepare__' of 'numpy.ndarray' objects}
46677 0.016 0.000 0.016 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:325(_solve_dispatcher)
8 0.000 0.000 0.011 0.001 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:366(get_distances)
8 0.000 0.000 0.011 0.001 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:234(conditional_find_mic)
8 0.000 0.000 0.011 0.001 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:242(<listcomp>)
8 0.000 0.000 0.011 0.001 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:203(find_mic)
1 0.003 0.003 0.010 0.010 /content/kaldo/kaldo/observables/secondorder.py:190(_map2prim)
8 0.002 0.000 0.008 0.001 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:170(general_find_mic)
368 0.003 0.000 0.003 0.000 {method 'reduce' of 'numpy.ufunc' objects}
12 0.001 0.000 0.003 0.000 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:36(wrap_positions)
226 0.000 0.000 0.002 0.000 /usr/local/lib/python3.10/dist-packages/ase/atoms.py:1096(__iter__)
224 0.001 0.000 0.002 0.000 /usr/local/lib/python3.10/dist-packages/ase/atoms.py:1100(__getitem__)
8 0.000 0.000 0.002 0.000 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:156(naive_find_mic) where it looks like the principal bottleneck may be with the number and/or placement of numpy's norm function calls. |
remap_force_constants() profile (with tdep test) 126945141 function calls (126944874 primitive calls) in 91.140 seconds
Ordered by: cumulative time
List reduced from 149 to 100 due to restriction <100>
ncalls tottime percall cumtime percall filename:lineno(function)
15625262 49.892 0.000 84.951 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:2383(norm)
15625000 19.115 0.000 19.115 0.000 {method 'dot' of 'numpy.ndarray' objects}
15812527 4.735 0.000 6.619 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:140(isComplexType)
15625000 5.026 0.000 5.026 0.000 {method 'ravel' of 'numpy.ndarray' objects}
31562809 4.132 0.000 4.132 0.000 {built-in method builtins.issubclass}
15625262 3.347 0.000 3.347 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:2379(_norm_dispatcher)
15812842 2.621 0.000 2.747 0.000 {built-in method numpy.asarray}
62509 1.146 0.000 2.102 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:329(solve)
62514 0.079 0.000 0.707 0.000 /usr/local/lib/python3.10/dist-packages/ase/utils/arraywrapper.py:66(attr)
62509 0.235 0.000 0.418 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:159(_commonType)
125018 0.129 0.000 0.177 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:135(_makearray)
62551 0.071 0.000 0.126 0.000 /usr/local/lib/python3.10/dist-packages/ase/cell.py:230(__array__)
125018 0.075 0.000 0.107 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:153(_realType)
62509 0.090 0.000 0.090 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:209(_assert_stacked_square)
62509 0.088 0.000 0.088 0.000 {method 'astype' of 'numpy.ndarray' objects}
187551 0.081 0.000 0.081 0.000 {built-in method builtins.getattr}
62509 0.067 0.000 0.067 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:130(get_linalg_error_extobj)
62509 0.063 0.000 0.063 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:203(_assert_stacked_2d)
62585/62573 0.056 0.000 0.056 0.000 {built-in method numpy.array}
125046 0.032 0.000 0.032 0.000 {method 'get' of 'dict' objects}
62509 0.027 0.000 0.027 0.000 {method '__array_prepare__' of 'numpy.ndarray' objects}
62509 0.022 0.000 0.022 0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:325(_solve_dispatcher)
1 0.003 0.003 0.011 0.011 /content/kaldo/kaldo/observables/secondorder.py:190(_map2prim)
2 0.000 0.000 0.004 0.002 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:366(get_distances)
2 0.000 0.000 0.003 0.002 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:234(conditional_find_mic)
2 0.000 0.000 0.003 0.002 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:242(<listcomp>)
2 0.000 0.000 0.003 0.002 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:203(find_mic)
2 0.001 0.000 0.002 0.001 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:170(general_find_mic)
254 0.000 0.000 0.002 0.000 /usr/local/lib/python3.10/dist-packages/ase/atoms.py:1096(__iter__)
252 0.001 0.000 0.002 0.000 /usr/local/lib/python3.10/dist-packages/ase/atoms.py:1100(__getitem__)
300 0.002 0.000 0.002 0.000 {method 'reduce' of 'numpy.ufunc' objects}
6 0.001 0.000 0.001 0.000 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:36(wrap_positions)
4 0.000 0.000 0.001 0.000 /usr/local/lib/python3.10/dist-packages/ase/atoms.py:1913(wrap)
... |
Development PlanOn preliminary inspection, these loops may be calling the norm function more than they would be in a vectorized implementation: fc_out = np.zeros((n_sc_new, n_sc_new, 3, 3))
# outer loop: indexes ith atom in map2prim from new supercell positions
for a1, r0 in enumerate(new_supercell.positions):
uc_index = map2prim[a1]
# first inner loop: setup r_pair from uc_index-th sc_r element
for sc_a2, sc_r2 in enumerate(sc_r[uc_index]):
r_pair = r0 + sc_r2
r_pair = np.linalg.solve(sc_temp.T, r_pair.T).T % 1.0
# second inner loop: setup r_diff for to evaluate norm condition
for a2 in range(n_sc_new):
r_diff = np.abs(r_pair - ref_struct_pos[a2])
r_diff -= np.floor(r_diff + eps)
# FIXME the objective is to vectorize any/all of these loops to
# reduce the number of individual np.linalg.norm() calls,
# which can be huge in three nested loops; ideally, a single
# vectorized norm call would allow us to mask fc_out and
# force_constants for the conditional sum
if np.linalg.norm(r_diff) < tol:
fc_out[a1, a2, :, :] += force_constants[uc_index, sc_a2, :, :] I intend to reimplement to something more like: fc_out = np.zeros((n_sc_new, n_sc_new, 3, 3))
for a1, r0 in enumerate(new_supercell.positions):
uc_index = map2prim[a1]
# vectorized first loop
r_pair = sc_r[uc_index,:] + r0
solved_r_pair = np.linalg.solve(sc_temp.T, r_pair.T).T % 1.0
# vectorized second loop
r_diff = np.abs(solved_r_pair - ref_struct_pos)
r_diff -= np.floor(r_diff + eps)
# conditional mask
r_diff_norms = np.linalg.norm(r_diff, axis=1)
below_tolerance = np.where(r_diff_norms < tol)
# FIXME by vectorizing, we lose the explicit indices; however, this should
# be encoded implicitly in r_diff (or we can keep a master index array):
#
# {a2, sc_a2} |-> filtered_sc_r[:,0], filtered_sc_r[:,1] ?
# update force constant matrix
# >>> fc_out[a1, a2, :, :] += force_constants[uc_index, sc_a2, :, :]
# [x] a1
# [ ] a2
# [x] uc_index
# [ ] sc_a2
filtered_sc_r = sc_r[uc_index][below_tolerance]
fc_out[a1, filtered_sc_r[:,0], :, :] += force_constants[uc_index, filtered_sc_r[:,1], :, :] which, in principle, may not necessarily yield significant dividends unless the data is uniformly columnar (I am currently learning about numerical methods in one of my courses this quarter, so I'm not yet sure if this is always a good assumption), but it will reduce the number of norm calls, which may allow numpy the opportunity to norm more efficiently. |
No description provided.
The text was updated successfully, but these errors were encountered: