Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Enhancement: Second-Order Force Constants #149

Open
lattice737 opened this issue Oct 19, 2024 · 3 comments
Open

Performance Enhancement: Second-Order Force Constants #149

lattice737 opened this issue Oct 19, 2024 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@lattice737
Copy link

No description provided.

@lattice737
Copy link
Author

tdep test profile

Sat Oct 19 04:52:38 2024    /content/kaldo/tdep_profile

         158065905 function calls (157621338 primitive calls) in 542.526 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   204221  269.088    0.001  269.088    0.001 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_FastPathExecute}
        1  101.927  101.927  196.021  196.021 secondorder.py:84(remap_force_constants)
 15625262   51.681    0.000   88.051    0.000 linalg.py:2383(norm)
 15628895   19.828    0.000   19.828    0.000 {method 'dot' of 'numpy.ndarray' objects}
    22317   10.552    0.000   10.552    0.000 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_Execute}
        1    7.235    7.235  320.034  320.034 anharmonic.py:115(project_crystal)
 15625033    5.111    0.000    5.111    0.000 {method 'ravel' of 'numpy.ndarray' objects}
 15919052    4.874    0.000    6.794    0.000 linalg.py:140(isComplexType)
   115360    4.816    0.000    5.235    0.000 constant_op.py:75(convert_to_eager_tensor)
     1296    4.722    0.004  310.844    0.240 anharmonic.py:183(project_crystal_mu)
139628/139612    4.665    0.000    4.667    0.000 {built-in method numpy.array}
 31782543    4.439    0.000    4.503    0.000 {built-in method builtins.issubclass}
        1    4.114    4.114    9.910    9.910 thirdorder.py:24(load)
 15625262    3.368    0.000    3.368    0.000 linalg.py:2379(_norm_dispatcher)
 15957351    2.883    0.000    3.033    0.000 {built-in method numpy.asarray}
310488/187234    1.904    0.000  278.454    0.001 dispatch.py:1246(op_dispatch_handler)
    98011    1.298    0.000    2.640    0.000 linalg.py:329(solve)
1348227/1348083    1.254    0.000    1.263    0.000 {built-in method builtins.getattr}
   187741    1.033    0.000   10.857    0.000 tensor_conversion_registry.py:164(convert)
379012/192406    0.891    0.000  283.646    0.001 traceback_utils.py:138(error_handler)
    18219    0.786    0.000    8.826    0.000 tensor_getitem_override.py:65(_slice_helper)
      ...

where the bottleneck is principally in tensorflow:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   204221  269.088    0.001  269.088    0.001 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_FastPathExecute}
        1  101.927  101.927  196.021  196.021 secondorder.py:84(remap_force_constants)

requiring about half (49.6%) of the compute time; second order force constant remapping is next most time-expensive, requiring almost a fifth (18.6%) of the compute time.

sigma2 test profile

Sat Oct 19 05:08:57 2024    /content/kaldo/sigma2_profile

         99187443 function calls (99084002 primitive calls) in 147.319 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   64.398   64.398  123.341  123.341 secondorder.py:84(remap_force_constants)
 10078010   32.839    0.000   55.085    0.000 linalg.py:2383(norm)
 10077697   11.395    0.000   11.395    0.000 {method 'dot' of 'numpy.ndarray' objects}
 10077729    3.447    0.000    3.447    0.000 {method 'ravel' of 'numpy.ndarray' objects}
 10855440    3.310    0.000    4.686    0.000 linalg.py:140(isComplexType)
        1    3.238    3.238  138.141  138.141 secondorder.py:20(parse_tdep_forceconstant)
 21459145    3.155    0.000    3.156    0.000 {built-in method builtins.issubclass}
 11069929    2.622    0.000    2.915    0.000 {built-in method numpy.asarray}
   259248    2.479    0.000    5.410    0.000 linalg.py:329(solve)
 10078010    2.010    0.000    2.010    0.000 linalg.py:2379(_norm_dispatcher)
   426066    1.451    0.000    1.451    0.000 {method 'reduce' of 'numpy.ufunc' objects}
   212601    0.966    0.000    3.161    0.000 cell.py:150(complete_cell)
      ...

where the bottleneck is only in force constant remapping, requiring almost half (43.5%) the compute time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   64.398   64.398  123.341  123.341 secondorder.py:84(remap_force_constants)

remap_force_constants() profile (with sigma2 test)

 82076477 function calls (82076226 primitive calls) in 59.239 seconds

   Ordered by: cumulative time
   List reduced from 149 to 100 due to restriction <100>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 10077960   33.450    0.000   55.362    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:2383(norm)
 10077696   11.408    0.000   11.408    0.000 {method 'dot' of 'numpy.ndarray' objects}
 10217727    3.093    0.000    4.332    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:140(isComplexType)
 10077696    3.264    0.000    3.264    0.000 {method 'ravel' of 'numpy.ndarray' objects}
 20389043    2.768    0.000    2.768    0.000 {built-in method builtins.issubclass}
 10077960    1.992    0.000    1.992    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:2379(_norm_dispatcher)
 10218110    1.747    0.000    1.827    0.000 {built-in method numpy.asarray}
    46677    0.726    0.000    1.403    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:329(solve)
    46688    0.055    0.000    0.450    0.000 /usr/local/lib/python3.10/dist-packages/ase/utils/arraywrapper.py:66(attr)
    46677    0.169    0.000    0.308    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:159(_commonType)
    93354    0.096    0.000    0.132    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:135(_makearray)
    93354    0.058    0.000    0.082    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:153(_realType)
    46779    0.045    0.000    0.080    0.000 /usr/local/lib/python3.10/dist-packages/ase/cell.py:230(__array__)
    46677    0.059    0.000    0.059    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:209(_assert_stacked_square)
   140091    0.058    0.000    0.058    0.000 {built-in method builtins.getattr}
    46677    0.053    0.000    0.053    0.000 {method 'astype' of 'numpy.ndarray' objects}
    46677    0.043    0.000    0.043    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:130(get_linalg_error_extobj)
    46677    0.042    0.000    0.042    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:203(_assert_stacked_2d)
46837/46813    0.036    0.000    0.036    0.000 {built-in method numpy.array}
    93382    0.024    0.000    0.024    0.000 {method 'get' of 'dict' objects}
    46677    0.019    0.000    0.019    0.000 {method '__array_prepare__' of 'numpy.ndarray' objects}
    46677    0.016    0.000    0.016    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:325(_solve_dispatcher)
        8    0.000    0.000    0.011    0.001 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:366(get_distances)
        8    0.000    0.000    0.011    0.001 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:234(conditional_find_mic)
        8    0.000    0.000    0.011    0.001 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:242(<listcomp>)
        8    0.000    0.000    0.011    0.001 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:203(find_mic)
        1    0.003    0.003    0.010    0.010 /content/kaldo/kaldo/observables/secondorder.py:190(_map2prim)
        8    0.002    0.000    0.008    0.001 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:170(general_find_mic)
      368    0.003    0.000    0.003    0.000 {method 'reduce' of 'numpy.ufunc' objects}
       12    0.001    0.000    0.003    0.000 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:36(wrap_positions)
      226    0.000    0.000    0.002    0.000 /usr/local/lib/python3.10/dist-packages/ase/atoms.py:1096(__iter__)
      224    0.001    0.000    0.002    0.000 /usr/local/lib/python3.10/dist-packages/ase/atoms.py:1100(__getitem__)
        8    0.000    0.000    0.002    0.000 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:156(naive_find_mic)

where it looks like the principal bottleneck may be with the number and/or placement of numpy's norm function calls.

@lattice737 lattice737 self-assigned this Oct 19, 2024
@lattice737 lattice737 added the enhancement New feature or request label Oct 19, 2024
@lattice737
Copy link
Author

remap_force_constants() profile (with tdep test)

   126945141 function calls (126944874 primitive calls) in 91.140 seconds

   Ordered by: cumulative time
   List reduced from 149 to 100 due to restriction <100>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 15625262   49.892    0.000   84.951    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:2383(norm)
 15625000   19.115    0.000   19.115    0.000 {method 'dot' of 'numpy.ndarray' objects}
 15812527    4.735    0.000    6.619    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:140(isComplexType)
 15625000    5.026    0.000    5.026    0.000 {method 'ravel' of 'numpy.ndarray' objects}
 31562809    4.132    0.000    4.132    0.000 {built-in method builtins.issubclass}
 15625262    3.347    0.000    3.347    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:2379(_norm_dispatcher)
 15812842    2.621    0.000    2.747    0.000 {built-in method numpy.asarray}
    62509    1.146    0.000    2.102    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:329(solve)
    62514    0.079    0.000    0.707    0.000 /usr/local/lib/python3.10/dist-packages/ase/utils/arraywrapper.py:66(attr)
    62509    0.235    0.000    0.418    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:159(_commonType)
   125018    0.129    0.000    0.177    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:135(_makearray)
    62551    0.071    0.000    0.126    0.000 /usr/local/lib/python3.10/dist-packages/ase/cell.py:230(__array__)
   125018    0.075    0.000    0.107    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:153(_realType)
    62509    0.090    0.000    0.090    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:209(_assert_stacked_square)
    62509    0.088    0.000    0.088    0.000 {method 'astype' of 'numpy.ndarray' objects}
   187551    0.081    0.000    0.081    0.000 {built-in method builtins.getattr}
    62509    0.067    0.000    0.067    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:130(get_linalg_error_extobj)
    62509    0.063    0.000    0.063    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:203(_assert_stacked_2d)
62585/62573    0.056    0.000    0.056    0.000 {built-in method numpy.array}
   125046    0.032    0.000    0.032    0.000 {method 'get' of 'dict' objects}
    62509    0.027    0.000    0.027    0.000 {method '__array_prepare__' of 'numpy.ndarray' objects}
    62509    0.022    0.000    0.022    0.000 /usr/local/lib/python3.10/dist-packages/numpy/linalg/linalg.py:325(_solve_dispatcher)
        1    0.003    0.003    0.011    0.011 /content/kaldo/kaldo/observables/secondorder.py:190(_map2prim)
        2    0.000    0.000    0.004    0.002 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:366(get_distances)
        2    0.000    0.000    0.003    0.002 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:234(conditional_find_mic)
        2    0.000    0.000    0.003    0.002 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:242(<listcomp>)
        2    0.000    0.000    0.003    0.002 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:203(find_mic)
        2    0.001    0.000    0.002    0.001 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:170(general_find_mic)
      254    0.000    0.000    0.002    0.000 /usr/local/lib/python3.10/dist-packages/ase/atoms.py:1096(__iter__)
      252    0.001    0.000    0.002    0.000 /usr/local/lib/python3.10/dist-packages/ase/atoms.py:1100(__getitem__)
      300    0.002    0.000    0.002    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        6    0.001    0.000    0.001    0.000 /usr/local/lib/python3.10/dist-packages/ase/geometry/geometry.py:36(wrap_positions)
        4    0.000    0.000    0.001    0.000 /usr/local/lib/python3.10/dist-packages/ase/atoms.py:1913(wrap)
       ...

@lattice737
Copy link
Author

lattice737 commented Oct 19, 2024

Development Plan

On preliminary inspection, these loops may be calling the norm function more than they would be in a vectorized implementation:

fc_out = np.zeros((n_sc_new, n_sc_new, 3, 3))

# outer loop: indexes ith atom in map2prim from new supercell positions
for a1, r0 in enumerate(new_supercell.positions):
    uc_index = map2prim[a1]

    # first inner loop: setup r_pair from uc_index-th sc_r element
    for sc_a2, sc_r2 in enumerate(sc_r[uc_index]):
        
        r_pair = r0 + sc_r2
        r_pair = np.linalg.solve(sc_temp.T, r_pair.T).T % 1.0
        
        # second inner loop: setup r_diff for to evaluate norm condition
        for a2 in range(n_sc_new):

            r_diff = np.abs(r_pair - ref_struct_pos[a2])
            r_diff -= np.floor(r_diff + eps)
            
            # FIXME the objective is to vectorize any/all of these loops to
            #       reduce the number of individual np.linalg.norm() calls,
            #       which can be huge in three nested loops; ideally, a single
            #       vectorized norm call would allow us to mask fc_out and
            #       force_constants for the conditional sum
            if np.linalg.norm(r_diff) < tol:
                fc_out[a1, a2, :, :] += force_constants[uc_index, sc_a2, :, :]

I intend to reimplement to something more like:

fc_out = np.zeros((n_sc_new, n_sc_new, 3, 3))

for a1, r0 in enumerate(new_supercell.positions):
    uc_index = map2prim[a1]

    # vectorized first loop
    r_pair = sc_r[uc_index,:] + r0
    solved_r_pair = np.linalg.solve(sc_temp.T, r_pair.T).T % 1.0

    # vectorized second loop
    r_diff = np.abs(solved_r_pair - ref_struct_pos)
    r_diff -= np.floor(r_diff + eps)

    # conditional mask
    r_diff_norms = np.linalg.norm(r_diff, axis=1)
    below_tolerance = np.where(r_diff_norms < tol)

    # FIXME by vectorizing, we lose the explicit indices; however, this should
    #       be encoded implicitly in r_diff (or we can keep a master index array):
    #
    #       {a2, sc_a2} |-> filtered_sc_r[:,0], filtered_sc_r[:,1] ?

    # update force constant matrix
    # >>> fc_out[a1, a2, :, :] += force_constants[uc_index, sc_a2, :, :]
    # [x] a1
    # [ ] a2
    # [x] uc_index
    # [ ] sc_a2
    filtered_sc_r = sc_r[uc_index][below_tolerance]
    fc_out[a1, filtered_sc_r[:,0], :, :] += force_constants[uc_index, filtered_sc_r[:,1], :, :]

which, in principle, may not necessarily yield significant dividends unless the data is uniformly columnar (I am currently learning about numerical methods in one of my courses this quarter, so I'm not yet sure if this is always a good assumption), but it will reduce the number of norm calls, which may allow numpy the opportunity to norm more efficiently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

1 participant