Hi, thank you very much for developing NequIP.
Although training runs without problems (on GPU), I get an error when running the model in LAMMPS.
The error is "terminate called after throwing an instance of 'c10::Error' what(): expected scalar type Float but found Byte", which is probably related to #25 (comment).
I used PyTorch 1.11 and LAMMPS 29 Sep 2021 as suggested.
I also tried libtorch 1.11 instead of PyTorch, but the same error occurred.
I installed NequIP 0.5.5 with PyTorch 1.11.
The LAMMPS output is below.
LAMMPS (29 Sep 2021 - Update 2)
using 1 OpenMP thread(s) per MPI task
Reading data file ...
orthogonal box = (0.0000000 0.0000000 0.0000000) to (30.000000 30.000000 30.000000)
1 by 1 by 1 MPI processor grid
reading atoms ...
21 atoms
read_data CPU = 0.001 seconds
NEQUIP is using device cuda
NequIP Coeff: type 1 is element H
NequIP Coeff: type 2 is element O
NequIP Coeff: type 3 is element C
Loading model from aspirin.pth
Freezing TorchScript model...
WARNING: Using 'neigh_modify every 1 delay 0 check yes' setting during minimization (src/min.cpp:188)
Neighbor list info ...
update every 1 steps, delay 0 steps, check yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 5
ghost atom cutoff = 5
binsize = 2.5, bins = 12 12 12
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair nequip, perpetual
attributes: full, newton off
pair build: full/bin/atomonly
stencil: full/bin/3d
bin: standard
Setting up cg style minimization ...
Unit style : real
Current step : 0
terminate called after throwing an instance of 'c10::Error'
what(): expected scalar type Float but found Byte
Exception raised from data_ptr<float> at /opt/conda/conda-bld/pytorch_1646755903507/work/build/aten/src/ATen/core/TensorMethods.cpp:18 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x14d0984b31bd in /home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x68 (0x14d0984af838 in /home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: float* at::TensorBase::data_ptr<float>() const + 0xde (0x14d09a3abc3e in /home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #3: at::TensorAccessor<float, 2ul, at::DefaultPtrTraits, long> at::TensorBase::accessor<float, 2ul>() const & + 0xcb (0x8bea4b in ./lmp)
frame #4: ./lmp() [0x8b66b2]
frame #5: ./lmp() [0x477689]
frame #6: ./lmp() [0x47be8e]
frame #7: ./lmp() [0x439995]
frame #8: ./lmp() [0x43799b]
frame #9: ./lmp() [0x41a416]
frame #10: __libc_start_main + 0xf3 (0x14d063f84493 in /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libc.so.6)
frame #11: ./lmp() [0x41a2ee]
[acc008:691367] *** Process received signal ***
[acc008:691367] Signal: Aborted (6)
[acc008:691367] Signal code: (-6)
[acc008:691367] [ 0] /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libpthread.so.0(+0x12c20)[0x14d0649dac20]
[acc008:691367] [ 1] /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libc.so.6(gsignal+0x10f)[0x14d063f9837f]
[acc008:691367] [ 2] /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libc.so.6(abort+0x127)[0x14d063f82db5]
[acc008:691367] [ 3] /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libstdc++.so.6(+0x9009b)[0x14d06597a09b]
[acc008:691367] [ 4] /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libstdc++.so.6(+0x9653c)[0x14d06598053c]
[acc008:691367] [ 5] /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libstdc++.so.6(+0x96597)[0x14d065980597]
[acc008:691367] [ 6] /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libstdc++.so.6(+0x967f8)[0x14d0659807f8]
[acc008:691367] [ 7] /home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/lib/libc10.so(_ZN3c106detail14torchCheckFailEPKcS2_jRKSs+0x93)[0x14d0984af863]
[acc008:691367] [ 8] /home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so(_ZNK2at10TensorBase8data_ptrIfEEPT_v+0xde)[0x14d09a3abc3e]
[acc008:691367] [ 9] ./lmp(_ZNKR2at10TensorBase8accessorIfLm2EEENS_14TensorAccessorIT_XT0_ENS_16DefaultPtrTraitsElEEv+0xcb)[0x8bea4b]
[acc008:691367] [10] ./lmp[0x8b66b2]
[acc008:691367] [11] ./lmp[0x477689]
[acc008:691367] [12] ./lmp[0x47be8e]
[acc008:691367] [13] ./lmp[0x439995]
[acc008:691367] [14] ./lmp[0x43799b]
[acc008:691367] [15] ./lmp[0x41a416]
[acc008:691367] [16] /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libc.so.6(__libc_start_main+0xf3)[0x14d063f84493]
[acc008:691367] [17] ./lmp[0x41a2ee]
[acc008:691367] *** End of error message ***
Aborted (core dumped)
Curiously, when I compile LAMMPS with PyTorch 1.12 (CPU only), the MD runs successfully.
I'd appreciate any suggestions for solving this problem.
Below are more details on the system I am experimenting with. I'm sorry for the lengthy message.
System: I use minimal.yaml, which can be found in the NequIP source directory, as the NequIP input. I then deploy the model with nequip-deploy to get a .pth file, which I use in LAMMPS.
Computer: NVIDIA A100 with CUDA 11.6 loaded
I installed PyTorch with the following command: conda install pytorch==1.11.0 cudatoolkit=11.3 -c pytorch
Output of cmake:
-- The CXX compiler identification is NVHPC 22.2.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/app/hpc_sdk/Linux_x86_64/22.2/compilers/bin/nvc++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /home/k0107/k010716/bin/git (found version "2.27.0")
-- Appending /home/app/openmpi/4.1.2/lib to CMAKE_LIBRARY_PATH: /home/app/openmpi/4.1.2/lib
-- Running check for auto-generated files from make-based build system
-- Found MPI_CXX: /home/app/openmpi/4.1.2/lib/libmpi.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Looking for C++ include omp.h
-- Looking for C++ include omp.h - found
-- Found OpenMP_CXX: -mp
-- Found OpenMP: TRUE
-- Found JPEG: /usr/lib64/libjpeg.so (found version "62")
-- Found PNG: /usr/lib64/libpng.so (found version "1.6.34")
-- Found ZLIB: /usr/lib64/libz.so (found version "1.2.11")
-- Found GZIP: /bin/gzip
-- Could NOT find FFMPEG (missing: FFMPEG_EXECUTABLE)
-- Looking for C++ include cmath
-- Looking for C++ include cmath - found
-- Generating style headers...
-- Generating package headers...
-- Generating lmpinstalledpkgs.h...
-- Could NOT find ClangFormat (missing: ClangFormat_EXECUTABLE) (Required is at least version "8.0")
-- The following tools and libraries have been found and configured:
* Git
* MPI
* OpenMP
* JPEG
* PNG
* ZLIB
-- <<< Build configuration >>>
Operating System: Linux Red Hat Enterprise Linux 8.5
Build type: RelWithDebInfo
Install path: /home/k0107/k010716/.local
Generator: Unix Makefiles using /bin/gmake
-- Enabled packages: <None>
-- <<< Compilers and Flags: >>>
-- C++ Compiler: /home/app/hpc_sdk/Linux_x86_64/22.2/compilers/bin/nvc++
Type: NVHPC
Version: 22.2.0
C++ Flags: -O2 -gopt
Defines: LAMMPS_SMALLBIG;LAMMPS_MEMALIGN=64;LAMMPS_OMP_COMPAT=4;LAMMPS_JPEG;LAMMPS_PNG;LAMMPS_GZIP
-- <<< Linker flags: >>>
-- Executable name: lmp
-- Static library flags:
-- <<< MPI flags >>>
-- MPI_defines: MPICH_SKIP_MPICXX;OMPI_SKIP_MPICXX;_MPICC_H
-- MPI includes: /home/app/openmpi/4.1.2/include
-- MPI libraries: /home/app/openmpi/4.1.2/lib/libmpi.so;
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found CUDA: /home/k0107/k010716/GPU/cuda/ (found version "11.6")
-- The CUDA compiler identification is NVIDIA 11.6.55
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /home/app/hpc_sdk/Linux_x86_64/22.2/compilers/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Caffe2: CUDA detected: 11.6
-- Caffe2: CUDA nvcc is: /home/k0107/k010716/GPU/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /home/k0107/k010716/GPU/cuda/
-- Caffe2: Header version is: 11.6
-- Found CUDNN: /home/k0107/k010716/GPU/cudnn/lib/libcudnn.so
-- Found cuDNN: v8.5.0 (include: /home/k0107/k010716/GPU/cudnn/include, library: /home/k0107/k010716/GPU/cudnn/lib/libcudnn.so)
-- /home/k0107/k010716/GPU/cuda/lib64/libnvrtc.so shorthash is 280a23f6
-- Autodetected CUDA architecture(s): 8.0 8.0 8.0 8.0
-- Added CUDA NVCC flags for: -gencode;arch=compute_80,code=sm_80
CMake Warning at /home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
CMakeLists.txt:922 (find_package)
-- Found Torch: /home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/lib/libtorch.so
-- Configuring done
-- Generating done
-- Build files have been written to: /home/k0107/k010716/LAMMPS/lammps-nequip4/build
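One detail that stands out in the configuration above: PyTorch was installed with cudatoolkit=11.3, while CMake detected CUDA 11.6. Whether this difference has anything to do with the crash is not established here. The snippet below is only a minimal sketch for comparing the two version strings; the helper names `major_minor` and `same_cuda_release` are hypothetical and not part of NequIP, LAMMPS, or PyTorch.

```python
import re


def major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '11.6.55'."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_cuda_release(built_with: str, found_by_cmake: str) -> bool:
    """Return True when both strings name the same CUDA major.minor release."""
    return major_minor(built_with) == major_minor(found_by_cmake)


# Values copied from this report. In a live environment the first one
# could instead be read from torch.version.cuda.
pytorch_cuda = "11.3"    # from: conda install pytorch==1.11.0 cudatoolkit=11.3
cmake_cuda = "11.6.55"   # from: "The CUDA compiler identification is NVIDIA 11.6.55"

print(same_cuda_release(pytorch_cuda, cmake_cuda))
```

For the values in this report the check reports a mismatch, which may be one thing to rule out alongside the PyTorch version itself.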
After cmake, I run make and get an executable, although some warnings are printed:
"/home/k0107/k010716/LAMMPS/lammps-nequip4/src/fmt/format.h", line 1156: warning: statement is unreachable
return;
^
detected during:
instantiation of "void fmt::v7_lmp::detail::specs_setter<Char>::on_fill(fmt::v7_lmp::basic_string_view<Char>) [with Char=char]" at line 2823
instantiation of "const Char *fmt::v7_lmp::detail::parse_align(const Char *, const Char *, Handler &&) [with Char=char, Handler=fmt::v7_lmp::detail::specs_checker<fmt::v7_lmp::detail::specs_handler<fmt::v7_lmp::basic_format_parse_context<char, fmt::v7_lmp::detail::error_handler>, fmt::v7_lmp::buffer_context<char>>> &]" at line 2883
instantiation of "const Char *fmt::v7_lmp::detail::parse_format_specs(const Char *, const Char *, SpecHandler &&) [with Char=char, SpecHandler=fmt::v7_lmp::detail::specs_checker<fmt::v7_lmp::detail::specs_handler<fmt::v7_lmp::basic_format_parse_context<char, fmt::v7_lmp::detail::error_handler>, fmt::v7_lmp::buffer_context<char>>> &]" at line 3099
instantiation of "const Char *fmt::v7_lmp::detail::format_handler<OutputIt, Char, Context>::on_format_specs(int, const Char *, const Char *) [with OutputIt=fmt::v7_lmp::detail::buffer_appender<char>, Char=char, Context=fmt::v7_lmp::buffer_context<char>]" at line 2975
instantiation of "const Char *fmt::v7_lmp::detail::parse_replacement_field(const Char *, const Char *, Handler &&) [with Char=char, Handler=fmt::v7_lmp::detail::format_handler<fmt::v7_lmp::detail::buffer_appender<char>, char, fmt::v7_lmp::buffer_context<char>> &]" at line 2997
instantiation of "void fmt::v7_lmp::detail::parse_format_string<IS_CONSTEXPR,Char,Handler>(fmt::v7_lmp::basic_string_view<Char>, Handler &&) [with IS_CONSTEXPR=false, Char=char, Handler=fmt::v7_lmp::detail::format_handler<fmt::v7_lmp::detail::buffer_appender<char>, char, fmt::v7_lmp::buffer_context<char>> &]" at line 3776
instantiation of "void fmt::v7_lmp::detail::vformat_to(fmt::v7_lmp::detail::buffer<Char> &, fmt::v7_lmp::basic_string_view<Char>, fmt::v7_lmp::basic_format_args<fmt::v7_lmp::basic_format_context<fmt::v7_lmp::detail::buffer_appender<fmt::v7_lmp::type_identity_t<Char>>, fmt::v7_lmp::type_identity_t<Char>>>, fmt::v7_lmp::detail::locale_ref) [with Char=char]" at line 2752 of "/home/k0107/k010716/LAMMPS/lammps-nequip4/src/fmt/format-inl.h"
"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/c10/core/TensorImpl.h", line 1669: warning: unknown attribute "fallthrough"
C10_FALLTHROUGH;
^
"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/c10/core/TensorImpl.h", line 1669: warning: unknown attribute "fallthrough"
C10_FALLTHROUGH;
^
"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/ATen/core/ivalue_inl.h", line 296: warning: unknown attribute "fallthrough"
C10_FALLTHROUGH;
^
"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/ATen/core/ivalue_inl.h", line 299: warning: unknown attribute "fallthrough"
C10_FALLTHROUGH;
^
"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/ATen/core/ivalue_inl.h", line 296: warning: unknown attribute "fallthrough"
C10_FALLTHROUGH;
^
"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/ATen/core/ivalue_inl.h", line 299: warning: unknown attribute "fallthrough"
C10_FALLTHROUGH;
^
"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/ordered_dict.h", line 360: warning: missing return statement at end of non-void function "torch::OrderedDict<Key, Value>::operator[](const Key &)"
}
^
"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/ordered_dict.h", line 368: warning: missing return statement at end of non-void function "torch::OrderedDict<Key, Value>::operator[](const Key &) const"
}
^
"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/ordered_dict.h", line 360: warning: missing return statement at end of non-void function "torch::OrderedDict<Key, Value>::operator[](const Key &)"
}
^
"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/ordered_dict.h", line 368: warning: missing return statement at end of non-void function "torch::OrderedDict<Key, Value>::operator[](const Key &) const"
}
^
"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/c10/core/TensorImpl.h", line 1669: warning: unknown attribute "fallthrough"
C10_FALLTHROUGH;
^
"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/ATen/core/ivalue_inl.h", line 296: warning: unknown attribute "fallthrough"
C10_FALLTHROUGH;
^
"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/ATen/core/ivalue_inl.h", line 299: warning: unknown attribute "fallthrough"
C10_FALLTHROUGH;
^
"/home/k0107/k010716/LAMMPS/lammps-nequip4/src/pair_nequip.cpp", line 390: warning: variable "jtype" was declared but never referenced
int jtype = type[j];
^
"/home/k0107/k010716/LAMMPS/lammps-nequip4/src/pair_nequip.cpp", line 382: warning: variable "itype" was declared but never referenced
int itype = type[i];
^
"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/ordered_dict.h", line 360: warning: missing return statement at end of non-void function "torch::OrderedDict<Key, Value>::operator[](const Key &)"
}
^
"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/ordered_dict.h", line 368: warning: missing return statement at end of non-void function "torch::OrderedDict<Key, Value>::operator[](const Key &) const"
}
^
Best regards,