
Exception: expected scalar type Double but found Float #48

Open
peterTHC opened this issue Jul 2, 2024 · 5 comments


peterTHC commented Jul 2, 2024

Hi,

I was trying to use pair_allegro to run a simulation in LAMMPS. I have trained a model on 256 water molecules (trained on forces and energy). After compiling LAMMPS, I got the error below and was not able to run. An older version of pair_allegro (a file from about a year ago) works. Do you know if any output type from my torch setup is incorrect? (I am using CUDA 11.6 and torch 1.12.0.)

Here's the full error I got from LAMMPS:
Exception: expected scalar type Double but found Float
Exception raised from data_ptr at aten/src/ATen/core/TensorMethods.cpp:20 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x2ab7c7d06612 in /scratch/user/peterchao1/.conda/envs/allegro/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x2ab7c7d02cab in /scratch/user/peterchao1/.conda/envs/allegro/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: double* at::TensorBase::data_ptr() const + 0x108 (0x2ab7815981c8 in /scratch/user/peterchao1/.conda/envs/allegro/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #3: at::TensorAccessor<double, 2ul, at::DefaultPtrTraits, long> at::TensorBase::accessor<double, 2ul>() const & + 0x7d (0x6ab8ad in ../../lammps/build/lmp)
frame #4: ../../lammps/build/lmp() [0x925095]
frame #5: ../../lammps/build/lmp() [0x4ec0f6]
frame #6: ../../lammps/build/lmp() [0x4ee627]
frame #7: ../../lammps/build/lmp() [0x48b4f0]
frame #8: ../../lammps/build/lmp() [0x48b886]
frame #9: ../../lammps/build/lmp() [0x47acbd]
frame #10: __libc_start_main + 0xf5 (0x2ab7c819a555 in /lib64/libc.so.6)
frame #11: ../../lammps/build/lmp() [0x47c8d0]

If you need any further information, please let me know.

Thank you!
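A quick way to narrow down errors like this is to check whether the deployed model was saved in float32 or float64 before pointing LAMMPS at it. This is a hedged sketch using the standard `torch.jit` API; the model path and the helper name `deployed_model_dtype` are placeholders, not part of pair_allegro:

```python
import torch

def deployed_model_dtype(path: str) -> torch.dtype:
    """Load a TorchScript model and report the dtype of its parameters."""
    model = torch.jit.load(path, map_location="cpu")
    # The first parameter's dtype distinguishes a float32 model from a float64 one.
    return next(model.parameters()).dtype
```

If this reports `torch.float32` while the pair style reads the outputs as Double, that mismatch is consistent with the exception above.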

@Linux-cpp-lisp
Collaborator

If your model is older, you may need to use pair_style allegro3232.

However, I recommend upgrading to the most recent nequip (0.6.0) going forward.
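For reference, switching the pair style for an older float32 model might look like this minimal LAMMPS input sketch; the deployed model filename and the element names are placeholders for your own system:

```
pair_style	allegro3232
pair_coeff	* * deployed.pth O H
```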

@R-applet

R-applet commented Aug 2, 2024

Hello, I want to comment because I am noticing similar issues. I tried upgrading nequip (0.6.1) and retraining the model; however, I still need to use pair_style allegro3232 in the LAMMPS script to avoid the error reported above. To be clear, I trained the Allegro model with nequip (0.6.1) with StressForceOutput. When I run NPT with pair_style allegro3232, the simulation runs and the pressure outputs are reasonable. I am not sure why I need pair_style allegro3232 with an updated model.

@Linux-cpp-lisp
Collaborator

Please ensure that you have default_dtype: float64 in your config files when training with the latest nequip.
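As a sketch, the relevant line in a nequip 0.6.x YAML config would look something like this (all other keys omitted; check your own config for where this key belongs):

```yaml
# dtype used for data and non-model tensors; float64 is what the
# default pair_style expects
default_dtype: float64
```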

@iansteeg

iansteeg commented Nov 6, 2024

Hi,
I experience the same exception:

Exception: expected scalar type Double but found Float
Exception raised from check_type at aten/src/ATen/core/TensorMethods.cpp:12 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xb0 (0x7dfd9f194950 in /opt/pytorch/libtorch_2.5.1+cu124/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x7dfd9f13844a in /opt/pytorch/libtorch_2.5.1+cu124/lib/libc10.so)
frame #2: <unknown function> + 0x396aa59 (0x7dfd78d6aa59 in /opt/pytorch/libtorch_2.5.1+cu124/lib/libtorch_cpu.so)
frame #3: double* at::TensorBase::mutable_data_ptr<double>() const + 0x43 (0x7dfd78d6bd13 in /opt/pytorch/libtorch_2.5.1+cu124/lib/libtorch_cpu.so)
frame #4: at::TensorAccessor<double, 2ul, at::DefaultPtrTraits, long> at::TensorBase::accessor<double, 2ul>() const & + 0x4b (0x6496d9991a8b in /usr/local/bin/lmp_allegro_hf)
frame #5: <unknown function> + 0xf6a3f3 (0x6496d99993f3 in /usr/local/bin/lmp_allegro_hf)
frame #6: <unknown function> + 0x501782 (0x6496d8f30782 in /usr/local/bin/lmp_allegro_hf)
frame #7: <unknown function> + 0x46c889 (0x6496d8e9b889 in /usr/local/bin/lmp_allegro_hf)
frame #8: <unknown function> + 0x34254c (0x6496d8d7154c in /usr/local/bin/lmp_allegro_hf)
frame #9: <unknown function> + 0x342fbf (0x6496d8d71fbf in /usr/local/bin/lmp_allegro_hf)
frame #10: <unknown function> + 0x297001 (0x6496d8cc6001 in /usr/local/bin/lmp_allegro_hf)
frame #11: <unknown function> + 0x2a1ca (0x7dfd16e2a1ca in /lib/x86_64-linux-gnu/libc.so.6)
frame #12: __libc_start_main + 0x8b (0x7dfd16e2a28b in /lib/x86_64-linux-gnu/libc.so.6)
frame #13: <unknown function> + 0x335615 (0x6496d8d64615 in /usr/local/bin/lmp_allegro_hf)

I used the latest allegro and nequip versions from their develop branches to train/deploy the model; pair_allegro is on the nequip branch. I can run LAMMPS with this pair style using models I trained with an earlier version of allegro.

The configuration file looks like this:

...
training_module:
  ... 
  model:
    ...
    model_dtype: float32
    default_dtype: float64
...
global_options:
  allow_tf32: true

I tried setting allow_tf32 to both true and false, and also tried model_dtype: float64.
Am I missing some settings, or are the allegro/nequip develop branches incompatible with the pair style right now?

@Linux-cpp-lisp
Collaborator

With those versions, you'll need to use pair_style allegro6464.
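For a model trained with model_dtype: float32 and default_dtype: float64 as in the config above, the pair style line would change accordingly; a minimal sketch with placeholder model filename and element names:

```
pair_style	allegro6464
pair_coeff	* * deployed.pth O H
```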
