Calculating virial stress in lammps #42

Open
Tawfiq1448 opened this issue May 15, 2024 · 11 comments

@Tawfiq1448

Hi,

I am trying to calculate the virial stress in LAMMPS using pair_allegro. I have trained the model with stress; I am using the develop branch of NequIP, the main branch of Allegro, and the stress branch of pair_allegro. I am encountering the following error:

ERROR: Pair style Allegro does not support per-atom virial

I have compiled LAMMPS with PyTorch version 1.11.0. Please let me know if you have any suggestions. Thanks.

@Linux-cpp-lisp
Collaborator

Hi @Tawfiq1448 ,

We do not support per-atom virials; most likely your LAMMPS script is requesting them unnecessarily. (They are not needed to run NPT simulations, which only require the global virial.)
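For concreteness, a minimal sketch of what this means in a LAMMPS input (the model file, element, and thermostat/barostat parameters below are placeholders, not from this thread): an NPT run only uses the global pressure tensor, while a per-atom compute such as stress/atom is what triggers the error above.

# NPT only needs the global virial, which the stress branch provides
pair_style      allegro
pair_coeff      * * deployed_model.pth Si        # placeholder model file and element
fix             1 all npt temp 300.0 300.0 0.1 iso 1.0 1.0 1.0
thermo_style    custom step temp press vol       # global pressure: fine

# A line like this requests per-atom virials and raises the error:
# compute       sv all stress/atom NULL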

@Tawfiq1448
Author

Thanks for your reply. I am also trying to implement the Green-Kubo formulation for my system, and for that I need per-atom stress:
https://docs.lammps.org/compute_heat_flux.html
It is good to know that Allegro currently does not support per-atom virials. Thanks.
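For reference, the Green-Kubo recipe in the linked documentation relies on per-atom stress roughly as follows (a sketch adapted from the LAMMPS compute heat/flux docs; the correlation parameters and output file are placeholders):

# per-atom quantities that feed compute heat/flux
compute   myKE all ke/atom
compute   myPE all pe/atom
compute   myStress all stress/atom NULL virial   # the per-atom virial pair_allegro lacks
compute   flux all heat/flux myKE myPE myStress

# autocorrelate the heat flux for the Green-Kubo integral
fix       JJ all ave/correlate 10 200 2000 c_flux[1] c_flux[2] c_flux[3] type auto file J0Jt.dat ave running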

@Hongyu-yu

> Thanks for your reply. I am also trying to implement Green-Kubo formulation for my system. For that, I need per-atom stress. https://docs.lammps.org/compute_heat_flux.html It is good to know that currently, allegro does not support per-atom virials. Thanks.

You may try this repo, which supports per-atom virials via ParaStressForceOutput: https://github.com/koheishimamura/nequip_allegro_tc, with details in http://arxiv.org/abs/2403.14130 and https://journals.aps.org/prb/abstract/10.1103/PhysRevB.109.144426. Hope it helps.

@Tawfiq1448
Author

@Hongyu-yu Thanks for your response. Which branch of pair_allegro do I need to use for my LAMMPS calculations? And can I reuse my model, trained with StressForceOutput, by redeploying it with ParaStressForceOutput?

@Hongyu-yu

@Tawfiq1448 You can try https://github.com/Hongyu-yu/pair_allegro, or just add the relevant lines to your pair_allegro.cpp from https://github.com/Hongyu-yu/pair_allegro/blob/1061c6ec414b8cf597ceb54387b86f1e8678a088/pair_allegro.cpp#L486-L503
You don't have to retrain. Just:

  • install nequip from https://github.com/koheishimamura/nequip_allegro_tc
  • replace StressForceOutput with ParaStressForceOutput in the config.yaml file in your results folder
  • deploy again

Then run LAMMPS with the modified pair_allegro and the newly deployed pth file (see the sketch below).
Hope it works for you!
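A sketch of what those last steps might look like (the nequip-deploy invocation follows the standard nequip CLI; the model filename and element list are placeholders):

# after editing config.yaml in the results folder, redeploy, e.g.:
#   nequip-deploy build --train-dir path/to/results/run-name deployed_para.pth

# then point LAMMPS at the new file:
pair_style   allegro
pair_coeff   * * deployed_para.pth Si O      # placeholder model file and elements
compute      sv all stress/atom NULL         # per-atom virials now available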

@Tawfiq1448
Author

Tawfiq1448 commented Jun 4, 2024

Hi @Hongyu-yu ,
I have followed the steps you provided: I installed nequip and allegro from https://github.com/koheishimamura/nequip_allegro_tc and compiled LAMMPS with the modified pair_allegro from https://github.com/Hongyu-yu/pair_allegro/blob/1061c6ec414b8cf597ceb54387b86f1e8678a088/pair_allegro.cpp#L486-L503

My simulation crashes with a segmentation fault:

Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 6 with PID 34294 on node ccc0207 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[ccc0207.campuscluster.illinois.edu:34284] 31 more processes have sent help message help-mpi-btl-openib.txt / no device params found
[ccc0207.campuscluster.illinois.edu:34284] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[ccc0207.campuscluster.illinois.edu:34284] 15 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port

I have torch version 1.11.0+cu113 in my environment. The LAMMPS branch I cloned is stable_2Aug2023; I also tried the latest develop branch of LAMMPS. Please let me know if you know of a solution to this. Thanks.

@Hongyu-yu

@Tawfiq1448 I haven't encountered this kind of error; maybe you can try the LAMMPS version "stable_29Sep2021_update2".

@baoanh13

> https://github.com/Hongyu-yu/pair_allegro

Hello, I also need per-atom stress, so I am trying to follow the steps above.

With the latest version of LAMMPS (git clone --depth=1 https://github.com/lammps/lammps),
I can compile without problems, but when I add
"compute sv all stress/atom NULL"
to the LAMMPS input file, a similar segmentation fault appears.

Therefore I tried: git clone -b stable_29Sep2021_update2 --depth 1 https://github.com/lammps/lammps.git
but there seem to be some conflicts in pair_allegro.cpp (screenshot below), which make LAMMPS fail to compile with that version:
[screenshot: compile errors in pair_allegro.cpp]

Do you have any solution for this problem?

Thank you in advance.

@rbjiawen

rbjiawen commented Nov 4, 2024

> @Tawfiq1448 You can try with https://github.com/Hongyu-yu/pair_allegro or just add related lines in your pair_allegro.cpp with https://github.com/Hongyu-yu/pair_allegro/blob/1061c6ec414b8cf597ceb54387b86f1e8678a088/pair_allegro.cpp#L486-L503 You don't have to retrain. Just
>
> • install nequip from https://github.com/koheishimamura/nequip_allegro_tc
> • replace StressForceOutput with ParaStressForceOutput in the config.yaml file in your results folder
> • deploy again
>
> Then run LAMMPS with the modified pair_allegro and the newly deployed pth file. Hope it works for you!

Hi @Hongyu-yu. I encountered the following problem when deploying the trained model using the PR 0.6.0 version of nequip:

INFO:root:Loading best_model from training session...
/public/home/jwcao/anaconda3/envs/allegro-lammps/lib/python3.8/site-packages/nequip/nn/_grad_output.py:399: UserWarning: !! Stresses in NequIP are in BETA and UNDER DEVELOPMENT: _please_ carefully check the sanity of your results and report any (potential) issues on the GitHub
  warnings.warn(
Traceback (most recent call last):
  File "/public/home/jwcao/anaconda3/envs/allegro-lammps/bin/nequip-deploy", line 8, in <module>
    sys.exit(main())
  File "/public/home/jwcao/anaconda3/envs/allegro-lammps/lib/python3.8/site-packages/nequip/scripts/deploy.py", line 237, in main
    model, _ = Trainer.load_model_from_training_session(
  File "/public/home/jwcao/anaconda3/envs/allegro-lammps/lib/python3.8/site-packages/nequip/train/trainer.py", line 699, in load_model_from_training_session
    model.load_state_dict(model_state_dict)
  File "/public/home/jwcao/anaconda3/envs/allegro-lammps/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GraphModel:
        size mismatch for model.model.func.per_species_rescale.scales: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([3]).

I'm not sure if it's because of my torch version. I use two conda environments that are exactly the same except for nequip: one has the PR 0.6.0 nequip for deployment, and the other has the normal 0.6.0 nequip (mir) for training, because training seems to get slower and slower with the PR 0.6.0 version. Here is a list of my conda environment:

# Name Version Build Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                  2_kmp_llvm    conda-forge
ase                       3.23.0                   pypi_0    pypi
binutils_impl_linux-64    2.43                 h4bf12b8_2    conda-forge
binutils_linux-64         2.43                 h4852527_2    conda-forge
bzip2                     1.0.8                h4bc722e_7    conda-forge
ca-certificates           2024.8.30            hbcca054_0    conda-forge
certifi                   2024.8.30                pypi_0    pypi
cffi                      1.17.0           py38heb5c249_0    conda-forge
charset-normalizer        3.4.0                    pypi_0    pypi
click                     8.1.7                    pypi_0    pypi
contourpy                 1.1.1                    pypi_0    pypi
cuda-version              11.7                 h67201e3_3    conda-forge
cudatoolkit               11.7.1              h4bc3d14_13    conda-forge
cudnn                     8.9.7.29             hbc23b4c_3    conda-forge
cycler                    0.12.1                   pypi_0    pypi
docker-pycreds            0.4.0                    pypi_0    pypi
e3nn                      0.5.3                    pypi_0    pypi
fonttools                 4.54.1                   pypi_0    pypi
gcc_impl_linux-64         12.4.0               hb2e57f8_1    conda-forge
gcc_linux-64              12.4.0               h6b7512a_5    conda-forge
gitdb                     4.0.11                   pypi_0    pypi
gitpython                 3.1.43                   pypi_0    pypi
gxx_impl_linux-64         12.4.0               h613a52c_1    conda-forge
gxx_linux-64              12.4.0               h8489865_5    conda-forge
idna                      3.10                     pypi_0    pypi
importlib-resources       6.4.5                    pypi_0    pypi
kernel-headers_linux-64   3.10.0              he073ed8_18    conda-forge
kiwisolver                1.4.7                    pypi_0    pypi
ld_impl_linux-64          2.43                 h712a8e2_2    conda-forge
libblas                   3.9.0           25_linux64_openblas    conda-forge
libcblas                  3.9.0           25_linux64_openblas    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc                    14.2.0               h77fa898_1    conda-forge
libgcc-devel_linux-64     12.4.0             ha4f9413_101    conda-forge
libgcc-ng                 14.2.0               h69a702a_1    conda-forge
libgfortran               14.2.0               h69a702a_1    conda-forge
libgfortran-ng            14.2.0               h69a702a_1    conda-forge
libgfortran5              14.2.0               hd5240d6_1    conda-forge
libgomp                   14.2.0               h77fa898_1    conda-forge
libhwloc                  2.11.1          default_hecaa2ac_1000    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
liblapack                 3.9.0           25_linux64_openblas    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libopenblas               0.3.28          pthreads_h94d23a6_0    conda-forge
libprotobuf               3.21.12              hfc55251_2    conda-forge
libsanitizer              12.4.0               h46f95d5_1    conda-forge
libsqlite                 3.47.0               hadc24fc_1    conda-forge
libstdcxx                 14.2.0               hc0a3c3a_1    conda-forge
libstdcxx-devel_linux-64  12.4.0             ha4f9413_101    conda-forge
libstdcxx-ng              14.2.0               h4852527_1    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxml2                   2.13.4               h064dc61_2    conda-forge
libzlib                   1.3.1                hb9d3cd8_2    conda-forge
llvm-openmp               19.1.3               h024ca30_0    conda-forge
magma                     2.6.2                hc72dce7_0    conda-forge
matplotlib                3.7.5                    pypi_0    pypi
mir-allegro               0.2.0                    pypi_0    pypi
mkl                       2022.2.1         h6508926_16999    conda-forge
mpmath                    1.3.0                    pypi_0    pypi
nccl                      2.23.4.1             h03a54cd_2    conda-forge
ncurses                   6.5                  he02047a_1    conda-forge
nequip                    0.6.0                    pypi_0    pypi
ninja                     1.12.1               h297d8ca_0    conda-forge
numpy                     1.24.4           py38h59b608b_0    conda-forge
openssl                   3.3.2                hb9d3cd8_0    conda-forge
opt-einsum                3.4.0                    pypi_0    pypi
opt-einsum-fx             0.1.4                    pypi_0    pypi
packaging                 24.1                     pypi_0    pypi
pillow                    10.4.0                   pypi_0    pypi
pip                       24.3.1             pyh8b19718_0    conda-forge
platformdirs              4.3.6                    pypi_0    pypi
protobuf                  5.28.3                   pypi_0    pypi
psutil                    6.1.0                    pypi_0    pypi
pycparser                 2.22               pyhd8ed1ab_0    conda-forge
pyparsing                 3.1.4                    pypi_0    pypi
python                    3.8.20          h4a871b0_2_cpython    conda-forge
python-dateutil           2.9.0.post0              pypi_0    pypi
python_abi                3.8                      5_cp38    conda-forge
pytorch                   1.13.1          cuda112py38hd94e077_200    conda-forge
pyyaml                    6.0.2                    pypi_0    pypi
readline                  8.2                  h8228510_1    conda-forge
requests                  2.32.3                   pypi_0    pypi
scipy                     1.10.1                   pypi_0    pypi
sentry-sdk                2.17.0                   pypi_0    pypi
setproctitle              1.3.3                    pypi_0    pypi
setuptools                75.3.0             pyhd8ed1ab_0    conda-forge
six                       1.16.0                   pypi_0    pypi
sleef                     3.7                  h1b44611_0    conda-forge
smmap                     5.0.1                    pypi_0    pypi
sympy                     1.13.3                   pypi_0    pypi
sysroot_linux-64          2.17                h4a8ded7_18    conda-forge
tbb                       2021.13.0            h84d6215_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
torch-ema                 0.3                      pypi_0    pypi
torch-runstats            0.2.0                    pypi_0    pypi
tqdm                      4.66.6                   pypi_0    pypi
typing_extensions         4.12.2             pyha770c72_0    conda-forge
tzdata                    2024b                hc8b5060_0    conda-forge
urllib3                   2.2.3                    pypi_0    pypi
wandb                     0.18.5                   pypi_0    pypi
wheel                     0.44.0             pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zipp                      3.20.2                   pypi_0    pypi

If possible, could you share the versions of the dependency packages in your environment? Best wishes!

@Linux-cpp-lisp
Collaborator

> because it seems to be getting slower and slower when training with the PR 0.6.0 version of nequip

@rbjiawen if you think you've reproduced mir-group/nequip#311, please post your code and torch versions in that thread---thanks!

Regarding "Error(s) in loading state_dict for GraphModel", are you trying to reload a model trained on a different version of the code? That could explain the error.

@rbjiawen
Copy link

rbjiawen commented Nov 19, 2024

Hello, I used pytorch=1.13.1 and cudatoolkit=11.7, and nequip 0.6.0 (https://github.com/koheishimamura/nequip_allegro_tc/tree/main/nequip) was getting slower and slower during training, but the problem was solved when I used pytorch=1.11.1. Regarding the model deployment with PR 0.6.0, I have solved it, thank you!
