Problem running otf_train.yaml (error message sparse_gp.py) #359

johnemec · 2023-06-30T15:24:49Z

Describe the bug
I was running the flare-otf otf_train.yaml command with a POSCAR file as the input structure and VASP as the DFT calculator when I got this error message:

File "/home/USER/.local/lib/python3.10/site-packages/flare/bffs/sgp/sparse_gp.py", line 335, in update_db
coded_species.append(self.species_map[spec])
KeyError: 14

To Reproduce
Steps to reproduce the behavior:

otf_train.yaml

# Super cell is read from a file such as POSCAR, xyz, lammps-data
# or any format that ASE supports
supercell:
    file: POSCAR
    format: vasp
    replicate: [1, 1, 1] # supercell creation. Be mindful of DFT limitations and periodicity of your cell.
    jitter: 0.1 # perturb the initial atomic positions by 0.1 A, so initial atomic environments added to the sparse set are not the same

# Set up FLARE calculator with (sparse) Gaussian process
flare_calc:
    gp: SGP_Wrapper
    kernels:
        - name: NormalizedDotProduct # select kernel for comparison of atomic environments
          sigma: 2.0 # signal variance, this hyperparameter will be trained, and is typically between 1 and 10.
          power: 2 # power of the kernel, influences body-order
    descriptors:
        - name: B2 # Atomic Cluster Expansion (ACE) descriptor from R. Drautz (2019). FLARE can only go from B1 up to B3 currently.
          nmax: 8 # Radial fidelity of the descriptor (higher value = higher cost)
          lmax: 3 # Angular fidelity of the descriptor (higher value = higher cost)
          cutoff_function: quadratic # Cutoff behavior
          radial_basis: chebyshev # Formalism for the radial basis functions
          cutoff_matrix: [[5.0]] # In angstroms. NxN array for N_species in a system.
    energy_noise: 0.096 # Energy noise hyperparameter, will be trained later. Typically set to 1 meV * N_atoms.
    forces_noise: 0.05 # Force noise hyperparameter, will be trained later. System dependent, typically between 0.05 meV/A and 0.2 meV/A.
    stress_noise: 0.001 # Stress noise hyperparameter, will be trained later. Typically set to 0.001 meV/A^3.
    energy_training: True
    force_training: True
    stress_training: True
    species:
        - 13 # Atomic number of your species (here, 13 = Al).
    single_atom_energies:
        - 0 # Single atom energies to bias the energy prediction of the model. Can help in systems with poor initial energy estimations. Length must equal the number of species.
    cutoff: 5.0 # Cutoff for the (ACE) descriptor. Typically informed by the radial distribution function of the system. Should equal the maximum value in the cutoff_matrix.
    variance_type: local # Calculate atomic uncertainties.
    max_iterations: 20 # Maximum steps taken during each hyperparameter optimization call.
    use_mapping: True # Print mapped model (ready for use in LAMMPS) during trajectory. Model is re-mapped and replaced if new DFT calls are made throughout the trajectory.

# In the tutorial, we use ASE Lennard-Jones potential as ground truth
# instead of DFT to save time
dft_calc:
    name: Vasp
    kwargs:
        command: "mpirun vasp_std"
        # pseudo-potential
        xc: pbe
        # k points
        kpts: [4, 4, 4]
        # INCAR
        istart: 0
        npar: 8
        ediff: 1.0e-6
        encut: 500
        ismear: -5
        sigma: 0.2
        lreal: Auto
        prec: Accurate
        algo: Fast
        lscalapack: False
    params: {}

# Set up On-the-fly training and MD
otf: # On-the-fly training and MD
    mode: fresh # Start from an empty SGP
    md_engine: VelocityVerlet # Define MD engine, here we use the Velocity Verlet engine from ASE. LAMMPS examples can be found in the flare/examples directory in the repo
    md_kwargs: {} # Define MD kwargs
    initial_velocity: 1000 # Initialize the velocities
    dt: 0.001 # Set the time step in picoseconds (1 fs here)
    number_of_steps: 10 # Total number of MD steps to be taken
    output_name: Si_otf # Name of output
    init_atoms: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] # Initial atoms to be added to the sparse set
    std_tolerance_factor: -0.01 # The uncertainty threshold above which the DFT will be called
    max_atoms_added: -1 # Allow for all atoms in a given frame to be added to the sparse set if uncertainties permit
    train_hyps: [5,inf] # Define range in which hyperparameters will be optimized. Here, hyps are optimized at every DFT call after the 5th call.
    write_model: 4 # Verbosity of model output.
    update_style: threshold # Sparse set update style. Atoms above a defined "threshold" will be added using this method
    update_threshold: 0.001 # Threshold for adding atoms if "update_style = threshold". Threshold represents relative uncertainty to mean atomic uncertainty, where atoms above are added to sparse set
    force_only: False # Train on forces, stresses, and energies.

POSCAR

Si
1.0000000000000000
5.4437023729394527 0.0000000000000000 0.0000000000000003
0.0000000000000009 5.4437023729394527 0.0000000000000003
0.0000000000000000 0.0000000000000000 5.4437023729394527
Si
8
Cartesian
4.1147590257602102 4.2452943034050890 1.3468254251531135
-0.0802625722806385 2.7244722198520734 2.8298858824873951
4.0422699347878837 1.3440468765613307 4.0661853445649534
0.0494231634513068 -0.0609311205725576 0.1084518318735196
1.3544414172112398 4.0951079607678640 4.0422043460588535
2.6399530632153270 2.7130357906305003 0.0432045106348295
1.4549697967372583 1.5227723112914378 1.3071579071337061
2.8919919707496526 -0.0405562349093180 2.8027838013455884

version flare

git clone https://github.com/mir-group/flare.git (latest release 1.3.3)


cjowen1 · 2023-06-30T15:34:35Z

Hello,

The error you are seeing is due to a mismatch between the species listed in the flare_calc section of your yaml and the species in the structure you are reading. You need to modify the following (assuming your input file only contains Si):

#old
species:
    - 13 # Atomic number of your species (here, 13 = Al).

#new
species:
    - 14 # Atomic number of your species (here, 14 = Si).

Cameron
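
The failing lookup can be sketched in a few lines of Python. This is only an illustration, not FLARE's code: encode_species and its arguments are hypothetical names, and the loop simply mirrors the species_map lookup quoted in the traceback.

```python
# Sketch only: encode_species is a hypothetical helper, not a FLARE function.
def encode_species(yaml_species, atomic_numbers):
    """Map atomic numbers to consecutive indices, as the traceback suggests."""
    species_map = {z: i for i, z in enumerate(yaml_species)}
    coded_species = []
    for spec in atomic_numbers:
        # Raises KeyError if the structure holds an element missing from the yaml.
        coded_species.append(species_map[spec])
    return coded_species


si_cell = [14] * 8  # the POSCAR in this issue holds 8 Si atoms

try:
    encode_species([13], si_cell)  # yaml lists Al only
except KeyError as err:
    print("KeyError:", err)  # prints "KeyError: 14"

print(encode_species([14], si_cell))  # prints [0, 0, 0, 0, 0, 0, 0, 0]
```

With 13 in the yaml, atomic number 14 from the structure has no entry in the map, which matches the KeyError: 14 reported in the issue.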

johnemec · 2023-07-03T14:43:47Z

Thank you, it worked! If I have a system with several species (for example Si=14 and O=8), what should my otf_train.yaml look like? I tried several variants, but each one produced an error message.

Thank you!

To Reproduce

otf_train.yaml

# Super cell is read from a file such as POSCAR, xyz, lammps-data
# or any format that ASE supports
supercell:
    file: POSCAR
    format: vasp
    replicate: [1, 1, 1] # supercell creation. Be mindful of DFT limitations and periodicity of your cell.
    jitter: 0.1 # perturb the initial atomic positions by 0.1 A, so initial atomic environments added to the sparse set are not the same

# Set up FLARE calculator with (sparse) Gaussian process
flare_calc:
    gp: SGP_Wrapper
    kernels:
        - name: NormalizedDotProduct # select kernel for comparison of atomic environments
          sigma: 2.0 # signal variance, this hyperparameter will be trained, and is typically between 1 and 10.
          power: 2 # power of the kernel, influences body-order
    descriptors:
        - name: B2 # Atomic Cluster Expansion (ACE) descriptor from R. Drautz (2019). FLARE can only go from B1 up to B3 currently.
          nmax: 8 # Radial fidelity of the descriptor (higher value = higher cost)
          lmax: 3 # Angular fidelity of the descriptor (higher value = higher cost)
          cutoff_function: quadratic # Cutoff behavior
          radial_basis: chebyshev # Formalism for the radial basis functions
          cutoff_matrix: [[5.0]] # In angstroms. NxN array for N_species in a system.
    energy_noise: 0.096 # Energy noise hyperparameter, will be trained later. Typically set to 1 meV * N_atoms.
    forces_noise: 0.05 # Force noise hyperparameter, will be trained later. System dependent, typically between 0.05 meV/A and 0.2 meV/A.
    stress_noise: 0.001 # Stress noise hyperparameter, will be trained later. Typically set to 0.001 meV/A^3.
    energy_training: True
    force_training: True
    stress_training: True
    species:
        - [14, 8] # Atomic number of your species (here, 13 = Al).
    single_atom_energies:
        - 0 # Single atom energies to bias the energy prediction of the model. Can help in systems with poor initial energy estimations. Length must equal the number of species.
    cutoff: 5.0 # Cutoff for the (ACE) descriptor. Typically informed by the radial distribution function of the system. Should equal the maximum value in the cutoff_matrix.
    variance_type: local # Calculate atomic uncertainties.
    max_iterations: 20 # Maximum steps taken during each hyperparameter optimization call.
    use_mapping: True # Print mapped model (ready for use in LAMMPS) during trajectory. Model is re-mapped and replaced if new DFT calls are made throughout the trajectory.

# In the tutorial, we use ASE Lennard-Jones potential as ground truth
# instead of DFT to save time
dft_calc:
    name: Vasp
    kwargs:
        command: "mpirun vasp_std"
        # pseudo-potential
        xc: pbe
        # k points
        kpts: [5, 5, 4]
        # INCAR
        istart: 0
        npar: 8
        ediff: 1.0e-6
        encut: 800
        ismear: -5
        sigma: 0.2
        lreal: Auto
        prec: Accurate
        algo: Fast
        lscalapack: False
    params: {}

# Set up On-the-fly training and MD
otf: # On-the-fly training and MD
    mode: fresh # Start from an empty SGP
    md_engine: VelocityVerlet # Define MD engine, here we use the Velocity Verlet engine from ASE. LAMMPS examples can be found in the flare/examples directory in the repo
    md_kwargs: {} # Define MD kwargs
    initial_velocity: 1000 # Initialize the velocities
    dt: 0.001 # Set the time step in picoseconds (1 fs here)
    number_of_steps: 10 # Total number of MD steps to be taken
    output_name: Al_otf # Name of output
    init_atoms: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] # Initial atoms to be added to the sparse set
    std_tolerance_factor: -0.01 # The uncertainty threshold above which the DFT will be called
    max_atoms_added: -1 # Allow for all atoms in a given frame to be added to the sparse set if uncertainties permit
    train_hyps: [5,inf] # Define range in which hyperparameters will be optimized. Here, hyps are optimized at every DFT call after the 5th call.
    write_model: 4 # Verbosity of model output.
    update_style: threshold # Sparse set update style. Atoms above a defined "threshold" will be added using this method
    update_threshold: 0.001 # Threshold for adding atoms if "update_style = threshold". Threshold represents relative uncertainty to mean atomic uncertainty, where atoms above are added to sparse set
    force_only: False # Train on forces, stresses, and energies.

Error message

File "/home/USER/.local/lib/python3.10/site-packages/flare/scripts/otf_train.py", line 285, in
species_map = {flare_config.get("species")[i]: i for i in range(n_species)}
TypeError: unhashable type: 'list'

otf_train.yaml (2)
same as above, except line:
species:
- 14
- 8
Error message

File "/home/USER/.local/lib/python3.10/site-packages/flare/scripts/otf_train.py", line 233, in get_sgp_calc
assert np.allclose(np.array(d["cutoff_matrix"]).shape, (n_species, n_species)),
AssertionError: cutoff_matrix needs to be of shape (n_species, n_species)

otf_train.yaml (3)
same as above, except line:
species:
- (14, 8)
Error message

File "/home/USER/.local/lib/python3.10/site-packages/flare/bffs/sgp/sparse_gp.py", line 335, in update_db
coded_species.append(self.species_map[spec])

otf_train.yaml (4)
same as above, except line:
species:
- 14, 8
Error message
File "/home/USER/.local/lib/python3.10/site-packages/flare/bffs/sgp/sparse_gp.py", line 335, in update_db
coded_species.append(self.species_map[spec])
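
For reference, the four species variants tried here load as four different Python objects, which is why they fail in different places. The sketch below writes out by hand what a YAML loader would return for each one (no YAML library is used), and build_species_map is a hypothetical stand-in for the dictionary comprehension shown in the first traceback.

```python
# What each `species:` variant becomes after YAML parsing (written out by hand).
variants = {
    "- [14, 8]": [[14, 8]],     # one entry that is itself a list
    "- 14 / - 8": [14, 8],      # two integer entries
    "- (14, 8)": ["(14, 8)"],   # one string entry
    "- 14, 8": ["14, 8"],       # one string entry
}


def build_species_map(species):
    """Hypothetical stand-in for the species_map line in the traceback."""
    n_species = len(species)
    return {species[i]: i for i in range(n_species)}


for text, species in variants.items():
    try:
        species_map = build_species_map(species)
    except TypeError as err:
        # A list cannot be used as a dictionary key.
        print(f"{text!r}: TypeError: {err}")
        continue
    # A Si/O structure is looked up by integer atomic number.
    missing = [z for z in (14, 8) if z not in species_map]
    print(f"{text!r}: map={species_map}, missing={missing}")
```

Only the two-entry form yields integer keys for both 14 and 8. The string forms produce a single-species map, so a lookup of atomic number 14 would fail in the same way as in the original report.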

YuuuXie · 2023-07-04T02:43:52Z

@johnemec If you have two species, then use

species:
- 14
- 8

In that case, cutoff_matrix: [[5.0]] is wrong. The cutoff_matrix should instead be a 2x2 matrix specifying the cutoffs for Si-Si, Si-O, O-Si, and O-O. If you want the same cutoff for every pair, you can also simply remove the cutoff_matrix argument.
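
To make the shape requirement concrete, here is a small sketch in plain Python. uniform_cutoff_matrix and has_expected_shape are hypothetical helpers, the 5.0 A value is simply reused from the config in this thread, and the assumption that rows and columns follow the order of the species list is mine. The check mirrors the (n_species, n_species) assertion quoted in the traceback.

```python
def uniform_cutoff_matrix(species, cutoff):
    """Build an n x n matrix with the same cutoff for every species pair."""
    n = len(species)
    return [[cutoff] * n for _ in range(n)]


def has_expected_shape(cutoff_matrix, species):
    """True if cutoff_matrix is (n_species, n_species)."""
    n = len(species)
    return len(cutoff_matrix) == n and all(len(row) == n for row in cutoff_matrix)


species = [14, 8]  # Si, O
# Assumed layout, following the species order: [[Si-Si, Si-O], [O-Si, O-O]]
matrix = uniform_cutoff_matrix(species, 5.0)
print(matrix)  # [[5.0, 5.0], [5.0, 5.0]]
print(has_expected_shape([[5.0]], species))  # False: the single-species matrix
print(has_expected_shape(matrix, species))   # True
```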

cjowen1 · 2023-07-05T16:33:42Z

@johnemec, please also include the following in addition to Yu's suggestions:

single_atom_energies: # total number of entries should match the number of elements considered
    - 0
    - 0
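
Taken together, the replies mean that every per-species list has to agree in length. The sketch below expresses the two-species fragment as a Python dict and checks it. check_flare_calc is a hypothetical helper, not part of FLARE, and the dict holds only the keys discussed in this thread.

```python
def check_flare_calc(cfg):
    """Return a list of problems in a flare_calc-like dict (hypothetical check)."""
    problems = []
    species = cfg["species"]
    n = len(species)
    if not all(isinstance(z, int) for z in species):
        problems.append("species entries must be integer atomic numbers")
    if len(cfg["single_atom_energies"]) != n:
        problems.append("single_atom_energies needs one entry per species")
    for desc in cfg.get("descriptors", []):
        cm = desc.get("cutoff_matrix")  # may be omitted, per the reply in this thread
        if cm is not None and (len(cm) != n or any(len(row) != n for row in cm)):
            problems.append("cutoff_matrix must be n_species x n_species")
    return problems


two_species = {
    "species": [14, 8],
    "single_atom_energies": [0, 0],
    "descriptors": [{"name": "B2", "cutoff_matrix": [[5.0, 5.0], [5.0, 5.0]]}],
}
print(check_flare_calc(two_species))  # []
```

Running the same check on a dict with species [14, 8] but a single single_atom_energies entry and cutoff_matrix [[5.0]] would report both mismatches before any DFT call is made.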
