
Exception: expected scalar type Double but found Float #48

Open
peterTHC opened this issue Jul 2, 2024 · 5 comments


peterTHC commented Jul 2, 2024

Hi,

I was trying to use pair_allegro to run a simulation in LAMMPS. I have trained a model on 256 water molecules (trained on forces and energy). After compiling LAMMPS, I got the error below and was not able to run. An older version of pair_allegro (a file from about a year ago) works. Do you know if any output type from my torch setup is incorrect? (I am using CUDA 11.6 and torch 1.12.0.)

Here's the full error I got from LAMMPS:
Exception: expected scalar type Double but found Float
Exception raised from data_ptr at aten/src/ATen/core/TensorMethods.cpp:20 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x2ab7c7d06612 in /scratch/user/peterchao1/.conda/envs/allegro/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x2ab7c7d02cab in /scratch/user/peterchao1/.conda/envs/allegro/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: double* at::TensorBase::data_ptr() const + 0x108 (0x2ab7815981c8 in /scratch/user/peterchao1/.conda/envs/allegro/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #3: at::TensorAccessor<double, 2ul, at::DefaultPtrTraits, long> at::TensorBase::accessor<double, 2ul>() const & + 0x7d (0x6ab8ad in ../../lammps/build/lmp)
frame #4: ../../lammps/build/lmp() [0x925095]
frame #5: ../../lammps/build/lmp() [0x4ec0f6]
frame #6: ../../lammps/build/lmp() [0x4ee627]
frame #7: ../../lammps/build/lmp() [0x48b4f0]
frame #8: ../../lammps/build/lmp() [0x48b886]
frame #9: ../../lammps/build/lmp() [0x47acbd]
frame #10: __libc_start_main + 0xf5 (0x2ab7c819a555 in /lib64/libc.so.6)
frame #11: ../../lammps/build/lmp() [0x47c8d0]

If you need any further information, please let me know.

Thank you!
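A quick way to narrow down errors like this is to check whether the deployed model was saved in float32 or float64 before pointing LAMMPS at it. This is a hedged sketch using the standard `torch.jit` API; the model path and the helper name `deployed_model_dtype` are placeholders, not part of pair_allegro:

```python
import torch

def deployed_model_dtype(path: str) -> torch.dtype:
    """Load a TorchScript model and report the dtype of its parameters."""
    model = torch.jit.load(path, map_location="cpu")
    # The first parameter's dtype distinguishes a float32 model from a float64 one.
    return next(model.parameters()).dtype
```

If this reports `torch.float32` while the pair style reads the outputs as Double, that mismatch is consistent with the exception above.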

@Linux-cpp-lisp
Collaborator

If your model is older, you may need to use pair_style allegro3232.

However, I recommend upgrading to the most recent nequip (0.6.0) going forward.
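For reference, switching the pair style for an older float32 model might look like this minimal LAMMPS input sketch; the deployed model filename and the element names are placeholders for your own system:

```
pair_style	allegro3232
pair_coeff	* * deployed.pth O H
```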

@R-applet

R-applet commented Aug 2, 2024

Hello, I want to comment because I am noticing similar issues. I tried upgrading nequip (0.6.1) and retraining the model; however, I still need to use pair_style allegro3232 in the LAMMPS script to avoid the error reported above. To be clear, I trained the Allegro model with nequip (0.6.1) with StressForceOutput. When I run NPT with pair_style allegro3232, the simulation runs and the pressure outputs are reasonable. I am not sure why I need pair_style allegro3232 with an updated model.

@Linux-cpp-lisp
Collaborator

Please ensure that you have default_dtype: float64 in your config files when training with the latest nequip.
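As a sketch, the relevant line in a nequip 0.6.x YAML config would look something like this (all other keys omitted; check your own config for where this key belongs):

```yaml
# dtype used for data and non-model tensors; float64 is what the
# default pair_style expects
default_dtype: float64
```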

@iansteeg

iansteeg commented Nov 6, 2024

Hi,
I experience the same exception:

Exception: expected scalar type Double but found Float
Exception raised from check_type at aten/src/ATen/core/TensorMethods.cpp:12 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xb0 (0x7dfd9f194950 in /opt/pytorch/libtorch_2.5.1+cu124/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x7dfd9f13844a in /opt/pytorch/libtorch_2.5.1+cu124/lib/libc10.so)
frame #2: <unknown function> + 0x396aa59 (0x7dfd78d6aa59 in /opt/pytorch/libtorch_2.5.1+cu124/lib/libtorch_cpu.so)
frame #3: double* at::TensorBase::mutable_data_ptr<double>() const + 0x43 (0x7dfd78d6bd13 in /opt/pytorch/libtorch_2.5.1+cu124/lib/libtorch_cpu.so)
frame #4: at::TensorAccessor<double, 2ul, at::DefaultPtrTraits, long> at::TensorBase::accessor<double, 2ul>() const & + 0x4b (0x6496d9991a8b in /usr/local/bin/lmp_allegro_hf)
frame #5: <unknown function> + 0xf6a3f3 (0x6496d99993f3 in /usr/local/bin/lmp_allegro_hf)
frame #6: <unknown function> + 0x501782 (0x6496d8f30782 in /usr/local/bin/lmp_allegro_hf)
frame #7: <unknown function> + 0x46c889 (0x6496d8e9b889 in /usr/local/bin/lmp_allegro_hf)
frame #8: <unknown function> + 0x34254c (0x6496d8d7154c in /usr/local/bin/lmp_allegro_hf)
frame #9: <unknown function> + 0x342fbf (0x6496d8d71fbf in /usr/local/bin/lmp_allegro_hf)
frame #10: <unknown function> + 0x297001 (0x6496d8cc6001 in /usr/local/bin/lmp_allegro_hf)
frame #11: <unknown function> + 0x2a1ca (0x7dfd16e2a1ca in /lib/x86_64-linux-gnu/libc.so.6)
frame #12: __libc_start_main + 0x8b (0x7dfd16e2a28b in /lib/x86_64-linux-gnu/libc.so.6)
frame #13: <unknown function> + 0x335615 (0x6496d8d64615 in /usr/local/bin/lmp_allegro_hf)

I used the latest allegro and nequip versions from their develop branches to train/deploy the model; pair_allegro is on the nequip branch. I can run LAMMPS with this pair style using models I trained with an earlier version of allegro.

The configuration file looks like this:

...
training_module:
  ... 
  model:
    ...
    model_dtype: float32
    default_dtype: float64
...
global_options:
  allow_tf32: true

I tried setting allow_tf32 to both true and false, and also tried model_dtype: float64.
Am I missing some settings, or are the allegro/nequip develop branches incompatible with the pair style right now?

@Linux-cpp-lisp
Collaborator

With those versions, you'll need to use pair_style allegro6464.
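For a model trained with model_dtype: float32 and default_dtype: float64 as in the config above, the pair style line would change accordingly; a minimal sketch with placeholder model filename and element names:

```
pair_style	allegro6464
pair_coeff	* * deployed.pth O H
```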
