As the title suggests: during the training phase, especially when training on multiple types of atoms, what should I use as a reference value for setting gauss_stop? For example, I am now training SiC, and any gauss_stop for C-Si between 5 and 7 achieves a similar loss, but the lowest is only 2.2e-5, which is not accurate enough to obtain correct results. How can I improve this?
Are you using the SiC alloy DFT dataset to train the DeepH model? Please make sure that all DFT calculations have converged before training. If the original DeepH model still cannot achieve a satisfactory loss, you can try training your system with the updated DeepH-E3 (accepted by Nature Communications). The file format interface for DeepH-E3 is the same as for DeepH-pack, so you can still use the dataset in /home/zjlin/work/DeepH-pack_work/SiC/work_dir/dataset/processed and run: python deephe3-train.py train.ini. An example train.ini follows:
; device string Device on which model will be trained (cpu or cuda)
; dtype string Data type of floating point numbers used during training (float or double)
; save_dir string Directory under which training result will be saved
; additional_folder_name string The folder containing training results will be named
; "yyyy-mm-dd_hh-mm-ss_<additional_folder_name>"
; simplified_output boolean If set to True, the detailed loss of each individual target will not be printed to stdout,
; but it can still be found in TensorBoard
; seed int Random seed to be used during training
; checkpoint_dir string Path pointing to model.pkl or best_model.pkl under the directory where the
; result of a previous training run is saved.
; All settings in sections [hyperparameters], [target] and [network] will be overwritten by the config in the checkpoint.
; Leave empty to start a new training run.
device = cuda:0
dtype = float
save_dir = /change/me/1
additional_folder_name =
simplified_output = True
seed = 42
checkpoint_dir =
[data]
; There are three methods to load DeepH-E3 data.
; 1. Fill in graph_dir and leave all other parameters blank.
; An existing graph will be loaded.
; 2. Fill in processed_data_dir, save_graph_dir, dataset_name.
; A new graph will be created from preprocessed data under processed_data_dir and saved under save_graph_dir.
; This graph will then be loaded directly.
; 3. Fill in DFT_data_dir, processed_data_dir, save_graph_dir, dataset_name.
; First DFT data will be preprocessed and saved under processed_data_dir.
; Then a new graph will be created using those preprocessed data, and saved under save_graph_dir.
; Finally this new graph will be loaded.
; graph_dir string Directory of preprocessed graph data xxxx.pkl
; processed_data_dir string Directory containing preprocessed structure data. Should contain elements.dat, info.json,
; lat.dat, orbital_types.dat, rlat.dat, site_positions.dat and hamiltonians.h5
; DFT_data_dir string Directory containing DFT calculated structure folders. Each structure folder should contain
; openmx.scfout with openmx.out concatenated to its end.
; save_graph_dir string Directory for saving graph data (method 2, 3).
; target_data string Only 'hamiltonian' is supported now
; dataset_name string Custom name for your dataset
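; Note: the values filled in below follow method 2 above: graph_dir and DFT_data_dir
; are left blank, so a new graph is built from the preprocessed data in
; processed_data_dir and saved under save_graph_dir.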
graph_dir =
DFT_data_dir =
processed_data_dir = /home/zjlin/work/DeepH-pack_work/SiC/work_dir/dataset/processed
save_graph_dir = /change/me/2
target_data = hamiltonian
dataset_name = SiC_soc
[train]
; num_epoch int Maximum number of training epochs
; batch_size int Batch size
; extra_validation string
; train_ratio float Ratio of structures among all that will be used for training
; val_ratio float Ratio of structures among all that will be used for validation
; test_ratio float (test set not implemented yet)
; train_size int Overrides train_ratio if a positive integer is provided
; val_size int Overrides val_ratio if a positive integer is provided
; test_size int Overrides test_ratio if a non-negative integer is provided
; min_lr float When the learning rate decays below min_lr, training will be stopped.
; Set to -1 to disable this.
num_epoch = 2000
batch_size = 1
extra_validation = []
train_ratio = 0.6
val_ratio = 0.2
test_ratio = 0.2
train_size = -1
val_size = -1
test_size = -1
min_lr = 1e-4
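; Worked example (hypothetical size): with the 0.6/0.2/0.2 split above and, say, 300
; structures in the dataset, 180 structures would be used for training, 60 for
; validation and 60 reserved for the (not yet implemented) test set.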
[hyperparameters]
; learning_rate float Initial learning rate
; Adam_betas string Will be parsed as a two-element tuple used as the betas of the Adam optimizer
; scheduler_type int 0 - no scheduler;
; 1 - ReduceLROnPlateau https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.ReduceLROnPlateau.html
; 2 - Slippery slope scheduler: for example, (start=1400, interval=200,
; decay_rate=0.5) will decay LR at step 1400 by 0.5 and then decay by 0.5
; every 200 steps.
; scheduler_params string Will be parsed as a python dict object and passed as keyword arguments
; to ReduceLROnPlateau or SlipSlopLR.
;
; revert_decay_patience int
; revert_decay_rate float
; Sometimes the loss suddenly goes up during training and then decreases only very slowly.
; When the validation loss has been more than 2 times the best loss for more than
; <revert_decay_patience> epochs, the model will be reverted to the best model so far
; and the learning rate will be decayed by a factor of <revert_decay_rate>.
learning_rate = 0.003
Adam_betas = (0.9, 0.999)
scheduler_type = 2
scheduler_params = (start=700, interval=300, decay_rate=0.5)
revert_decay_patience = 20
revert_decay_rate = 0.8
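; Worked example (arithmetic only, assuming the step semantics described above): with
; learning_rate = 0.003 and (start=700, interval=300, decay_rate=0.5), the LR stays at
; 0.003 until step 700, then becomes 1.5e-3 (step 700), 7.5e-4 (step 1000),
; 3.75e-4 (step 1300), 1.875e-4 (step 1600) and 9.375e-5 (step 1900). Since
; 9.375e-5 < min_lr = 1e-4, training stops there at the latest; revert_decay_rate
; decays (factor 0.8) can bring the LR below min_lr earlier.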
[target]
; target string Only hamiltonian is supported now
; target_blocks_type string choices: (all, diag, off-diag, specify)
; all: train all matrix blocks of hopping in one model
; diag: only train diagonal blocks
; off-diag: only train off-diagonal blocks
; specify: specify the matrix blocks to be trained by hand
; target_blocks string This will only take effect when target_blocks_type=specify.
; See explanations at the end of this config
; selected_element_pairs string Train only on hoppings between element pairs specified here.
; Will have no effect if target_blocks_type=specify.
; example: ['42 42', '16 16']
; Under this example, only hoppings between Mo-Mo and S-S will be trained.
; convert_net_out boolean Please set to False. Option True is still under development.
target = hamiltonian
target_blocks_type = all
target_blocks =
selected_element_pairs =
convert_net_out = False
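; Illustration for the SiC system discussed in this thread (atomic numbers Si = 14,
; C = 6): to train only Si-Si and C-C hoppings, one could set
; selected_element_pairs = ['14 14', '6 6']
; with a target_blocks_type other than specify.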
[network]
; cutoff_radius float Cutoff radius of Gaussian basis for edge-length encoding, in Angstrom
; only_ij boolean Please set to False. Option True is still under development.
; spherical_harmonics_lmax int Maximum angular momentum quantum number used in spherical harmonics. Cannot be
; used simultaneously with spherical_basis_irreps.
; spherical_basis_irreps string Irreps used for spherical basis function. Cannot be used simultaneously with
; spherical_harmonics_lmax.
; irreps_embed string Irreps used for node- and edge-embedding, should only contain 0e
; irreps_mid string Irreps of edge and node features in intermediate layers
; num_blocks int Number of message-passing blocks
cutoff_radius = 7.0
only_ij = False
spherical_harmonics_lmax = 5
spherical_basis_irreps =
irreps_embed = 64x0e
irreps_mid = 64x0e+32x1o+16x1e+16x2e+8x3o+8x4e+4x5o
num_blocks = 3
ignore_parity = False
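; A note on the irreps notation above (e3nn-style): each term NxLp denotes N copies of
; features with angular momentum L and parity p ('e' = even, 'o' = odd). For example,
; 64x0e means 64 scalar features, and 32x1o means 32 odd-parity l=1 (vector) features,
; so irreps_mid above carries tensors up to l = 5.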
; Below are more advanced settings of the irreps used in the network.
; Usually these can simply be left blank, we will automatically generate the appropriate settings for you.
irreps_embed_node =
irreps_edge_init =
irreps_mid_node =
irreps_post_node =
irreps_out_node =
irreps_mid_edge =
; The best irreps for below will be automatically generated.
; Adjusting according to your own will might cause errors.
irreps_post_edge =
out_irreps =
; =============================
;
; Explanation of target_blocks
;
; For example, the compound MoS2 has two types of elements: Mo(42) and S(16). The orbital types of Mo and S are [0, 0, 0, 1, 1, 2, 2] and [0, 0, 1, 1, 2] respectively (these can be found in orbital_types.dat in the processed structure folder). This means the number of atomic orbitals of molybdenum is 7: three S orbitals, two P orbitals and two D orbitals. Similarly for the element sulphur.
;
; Suppose we set target_blocks to
; [{"42 42": [3, 5]}]
; This means that when the net sees a hopping matrix between Mo and Mo, it only takes out the hopping between the orbital of Mo which has index 3 (i.e. the first P orbital) and the orbital of Mo which has index 5 (i.e. the first D orbital). The predicted matrix size is thus (2x1+1)x(2x2+1) = 3x5. Other types of hopping (e.g. Mo-S, S-Mo, S-S) are not trained.
;
; If the target is set to be
; [{"42 42": [3, 5], "42 16": [3, 4], "16 42": [2, 5], "16 16": [2, 4]}]
; Then 4 types of hopping are trained together. Specifically, these are: hopping from the 1st P orbital of Mo to the 1st D orbital of Mo, the 1st P orbital of Mo to the 1st D orbital of S, the 1st P orbital of S to the 1st D orbital of Mo, and the 1st P orbital of S to the 1st D orbital of S. These hoppings will be predicted in the same output channel.
;
; If the target is set to be
; [{"42 42": [3, 5], "42 16": [3, 4], "16 42": [2, 5], "16 16": [2, 4]}, {"42 16": [3, 2]}]
; In addition to the hoppings described above, the new dict in the list, {"42 16": [3, 2]}, introduces a new independent channel in the output. This channel predicts the hopping from the 1st P orbital of Mo to the 1st P orbital of S.
;
; Note that the angular momentum quantum numbers should always be the same for orbitals predicted in the same channel, or an error will be thrown.
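As a quick sanity check of the indexing above, here is a minimal Python sketch (not part of DeepH-E3; the orbital types are taken from the MoS2 example) that computes the shape of the block selected by a {"Z1 Z2": [i, j]} entry:

# Minimal sketch: shape of the hopping block selected by {"Z1 Z2": [i, j]}.
# orbital_types as in orbital_types.dat for the MoS2 example above.
orbital_types = {
    42: [0, 0, 0, 1, 1, 2, 2],  # Mo: three S, two P, two D orbitals
    16: [0, 0, 1, 1, 2],        # S:  two S, two P, one D orbital
}

def block_shape(z1, z2, i, j):
    # The block spans (2*l_i + 1) rows and (2*l_j + 1) columns.
    l_i = orbital_types[z1][i]
    l_j = orbital_types[z2][j]
    return (2 * l_i + 1, 2 * l_j + 1)

print(block_shape(42, 42, 3, 5))  # (3, 5): Mo 1st P orbital -> Mo 1st D orbital
print(block_shape(16, 16, 2, 4))  # (3, 5): S  1st P orbital -> S  D orbital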