fitpot -- fit parameters of neural-network potential¶
The validity of MD simulations depends strongly on the accuracy of the interatomic potential used in the simulation. So, before simulating a specific system, you have to prepare an interatomic potential that reproduces the phenomena you are interested in.
Here we introduce how to make a neural-network potential and fit its parameters with the fitpot program included in the nap package.
Note
Currently, the fitpot program is used only for neural-network and uf3 potentials. For other classical potentials, use optzer instead.
What does the fitpot do?¶
In fitpot, the following loss function is minimized by optimizing the potential parameters \(\{ \beta \}\):
$$ \mathcal{L}(\{\beta\}) = \frac{w_E}{\sigma^2_E N_s}\sum_s^{N_s} \Delta E^2 +\frac{w_F}{3\sigma^2_F N_s}\sum_s^{N_s}\frac{1}{N^{(s)}}\sum_i^{N^{(s)}}\left| \Delta \boldsymbol{F}_i\right|^2 +\frac{w_S}{6 \sigma^2_S N_s}\sum_s^{N_s}\left| \Delta \sigma \right|^2 $$
in the case of fitting energies, forces, and stresses. Here, \(s\) is the sample index, \(N_s\) the number of samples, and \(N^{(s)}\) the number of atoms in sample \(s\). \(\Delta\) denotes the difference between the value given by the potential and the reference value, and \(\Delta \sigma\) is the difference in the stress tensor. \(w_X\) are the weights on term \(X\), where \(w_E +w_F +w_S = 1\). \(\sigma_X\) are the standard deviations of the reference data. Dividing by \(\sigma^2_X\) makes each term dimensionless, so that quantities with different units can be compared.
To minimize the above loss function, the following gradient-based methods are available in fitpot:
- Steepest descent (SD)
- Quasi-Newton method (BFGS)
- Stochastic gradient descent (SGD)
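As a rough illustration of the loss function defined above, the following Python sketch evaluates it for a handful of samples. The function name `loss`, the dictionary keys `dE`, `dF`, and `dS`, and the choice to divide the force sum by the atom count of each sample are assumptions made for this example; they are not taken from the fitpot source code.

```python
def loss(samples, weights=(0.5, 0.5, 0.5), sigmas=(1.0, 1.0, 1.0)):
    """Evaluate the energy/force/stress loss for a list of samples.

    Each sample is a dict with hypothetical keys:
      'dE': energy difference (potential minus reference),
      'dF': list of per-atom force differences (dfx, dfy, dfz),
      'dS': six stress-component differences.
    """
    # Normalize the weights so that w_E + w_F + w_S = 1.
    w_sum = sum(weights)
    w_e, w_f, w_s = (w / w_sum for w in weights)
    sgm_e, sgm_f, sgm_s = sigmas
    n_samples = len(samples)

    e_term = f_term = s_term = 0.0
    for smpl in samples:
        e_term += smpl['dE'] ** 2
        n_atoms = len(smpl['dF'])
        # Squared force error, averaged over the atoms of this sample.
        f_term += sum(fx * fx + fy * fy + fz * fz
                      for fx, fy, fz in smpl['dF']) / n_atoms
        s_term += sum(c * c for c in smpl['dS'])

    return (w_e * e_term / (sgm_e ** 2 * n_samples)
            + w_f * f_term / (3.0 * sgm_f ** 2 * n_samples)
            + w_s * s_term / (6.0 * sgm_s ** 2 * n_samples))
```

The standard deviations are passed in explicitly here; in practice they would be computed from the reference data.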
How to compile¶
Since some modules of the pmd program are required to compile fitpot, compile pmd before compiling fitpot:
$ cd /path/to/nap/
$ ./configure --prefix=$(pwd)
$ cd pmd
$ make pmd lib #<-- lib must be made in addition to pmd
$ cd ../fitpot/
$ make fitpot
Quick trial with an example¶
There is an example of fitpot with a minimal dataset to see how it works. Go to the directory examples/fitpot_DNN_SiO/, read README.md, try running fitpot, and look at some output files.
Fitting procedure¶
Hereafter, we assume that the reference data are obtained by using an ab-initio calculation program, VASP.
Potential parameters are fitted by the following procedure:
Prepare reference data¶
We assume that the reference data are stored in the dataset/ directory and that each sample (structure) is written in a single file smpl_XXX in pmd format, which contains energy, force, and stress information.
The pmd format is as follows.
#
# specorder: Li P S
# energy: -437.76069
# stress: 0.35400 0.20100 1.01100 0.16400 0.30000 0.06100
# auxiliary_data: fx fy fz
#
1.000
13.34800 0.00000 0.00000 0.00 0.00 0.00
0.00000 15.72500 0.00000 0.00 0.00 0.00
0.00000 0.00000 12.22300 0.00 0.00 0.00
128
1.10000000000001 1.41272100689242e-01 2.58003179650238e-01 3.54880143990837e-01 0.00 0.00 0.00 -0.0292 -0.1009 -0.1176
1.10000000000002 8.28176505843572e-01 2.57068362480127e-01 1.49660476151518e-01 0.00 0.00 0.00 -0.1132 -0.3853 0.3531
...
- energy, stress, and auxiliary_data (fx,fy,fz) should be written in the option lines, i.e. the lines beginning with # at the top of the file.
- The force components on each atom should be appended after the velocity information on each atom entry line.
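To make the layout above concrete, here is a minimal Python sketch that reads a file of this shape. It is based only on the example shown above; the function name `parse_pmd` and the returned dictionary keys are hypothetical, the meaning of the first numeric line is not interpreted, and the real pmd format may allow variations that this sketch does not handle. nap ships its own Python tools (nappy), which are the better choice in practice.

```python
def parse_pmd(text):
    """Parse a pmd-format string laid out like the example above."""
    options = {}
    body = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith('#'):
            # Option lines look like "# key: value".
            content = stripped.lstrip('#').strip()
            if ':' in content:
                key, value = content.split(':', 1)
                options[key.strip()] = value.strip()
        else:
            body.append(stripped.split())

    first_value = float(body[0][0])  # first numeric line, kept as-is
    # Next three lines: cell vectors (first three columns of each line).
    cell = [[float(x) for x in row[:3]] for row in body[1:4]]
    n_atoms = int(body[4][0])
    atoms = []
    for row in body[5:5 + n_atoms]:
        values = [float(x) for x in row]
        atoms.append({
            'tag': values[0],
            'pos': values[1:4],
            'vel': values[4:7],
            'force': values[7:10],  # fx, fy, fz listed in auxiliary_data
        })
    return options, first_value, cell, atoms


# A two-atom sample shaped like the example (positions shortened).
sample = """#
# specorder: Li P S
# energy: -437.76069
# stress: 0.35400 0.20100 1.01100 0.16400 0.30000 0.06100
# auxiliary_data: fx fy fz
#
1.000
13.34800 0.00000 0.00000 0.00 0.00 0.00
0.00000 15.72500 0.00000 0.00 0.00 0.00
0.00000 0.00000 12.22300 0.00 0.00 0.00
2
1.10000000000001 0.141 0.258 0.354 0.00 0.00 0.00 -0.0292 -0.1009 -0.1176
1.10000000000002 0.828 0.257 0.149 0.00 0.00 0.00 -0.1132 -0.3853 0.3531
"""
options, first_value, cell, atoms = parse_pmd(sample)
```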
If you extract DFT data from ab-initio MD runs with VASP, the positions, energy, forces, and stress of each MD step can be obtained from the vasprun.xml file as follows:
$ cd /path/to/dir/that/includes/vasprun.xml/
$ python path/to/nap/nappy/vasp/vasprun2fp.py
Change the /path/to/dir/that/includes/vasprun.xml/ part according to your situation. Then you get smpl_XXX files in the dataset directory.
If you want to extract several samples from one vasprun.xml, which is the case for MD simulation or structure relaxation, add the --sequence option to vasprun2fp.py. Then you get files with names of the form ####, where each # is a digit, such as 00010.
You may have to specify the species order with an option such as --specorder=Li,P,S.
For more details, see the help shown by vasprun2fp -h.
Prepare input files¶
The following files are needed for fitpot:
- in.fitpot
- in.vars.fitpot -- includes initial values and ranges of parameters to be optimized.
In some cases, some additional files are required:
- in.params.DNN -- see DNN force for details
- in.params.desc -- see DNN force for details
- in.params.Coulomb in each smpl_XXX directory in some special cases
You have to specify num_samples in the in.fitpot file, which is the number of samples in the dataset/ directory. The number of sample files (smpl_) can be counted by the following command:
$ ls /path/to/dataset | grep smpl_ -c
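The same consistency check can be scripted. The sketch below, written under the assumption that in.fitpot consists of whitespace-separated `keyword value` lines as in the example input file, compares the declared num_samples with the number of directory entries whose names contain `smpl_`. All function names are hypothetical helpers for this example and are not part of nap.

```python
import os


def read_num_samples(text):
    """Return the num_samples value found in in.fitpot content, or None."""
    for line in text.splitlines():
        fields = line.split()
        # Comment lines start with '#', so their first field never matches.
        if len(fields) >= 2 and fields[0] == "num_samples":
            return int(fields[1])
    return None


def count_samples(names, pattern="smpl_"):
    """Count names that contain the pattern, like `ls | grep -c smpl_`."""
    return sum(1 for name in names if pattern in name)


def check_dataset(in_fitpot_path, dataset_dir):
    """Compare num_samples in in.fitpot with the entries in dataset_dir."""
    with open(in_fitpot_path) as f:
        declared = read_num_samples(f.read())
    found = count_samples(os.listdir(dataset_dir))
    return declared == found, declared, found
```

Calling `check_dataset("in.fitpot", "dataset")` in the working directory would then report whether the two numbers agree.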
Run fitpot program¶
In the directory where the dataset/ directory and the in.fitpot file exist, you can run the fitpot program as:
$ ~/src/nap/fitpot/fitpot 2>&1 | tee out.fitpot
Or, if you want to run it in parallel:
$ mpirun -np 10 ~/src/nap/fitpot/fitpot 2>&1 | tee out.fitpot
There are some output files:
- out.erg.trn.fin, out.erg.tst.fin -- These files include reference and pmd data of energies. To see whether the fitting went well or not, plot these data by using gnuplot as:
  $ gnuplot
  gnuplot> plot 'out.erg.trn.fin' us 1:2 w p t 'training set'
  gnuplot> rep 'out.erg.tst.fin' us 1:2 w p t 'test set'
- out.frc.trn.fin, out.frc.tst.fin -- These files include reference and pmd data of forces.
- out.strs.trn.fin, out.strs.tst.fin -- These files include reference and pmd data of stresses.
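Besides plotting, a single number such as the root-mean-square error (RMSE) can summarize the fit. The following sketch assumes that the first two columns of these output files hold the reference and pmd values, as the gnuplot command above suggests, and that lines beginning with `#` are comments. The RMSE does not depend on which of the two columns is the reference. The function name is a hypothetical helper.

```python
import math


def rmse_from_lines(lines):
    """RMSE between the first two numeric columns of the given lines."""
    sq_sum = 0.0
    n_rows = 0
    for line in lines:
        fields = line.split()
        # Skip blank lines and comment lines.
        if not fields or fields[0].startswith('#'):
            continue
        first, second = float(fields[0]), float(fields[1])
        sq_sum += (second - first) ** 2
        n_rows += 1
    if n_rows == 0:
        raise ValueError("no data rows found")
    return math.sqrt(sq_sum / n_rows)
```

For example, `rmse_from_lines(open('out.erg.tst.fin'))` would give the test-set energy error in the units used in that file.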
Input file for fitpot¶
The following code shows an example of the input file in.fitpot.
num_samples 14
num_iteration 100
num_iter_eval 1
converge_num 3
test_ratio 0.1
fitting_method bfgs
sample_directory "./dataset/"
param_file in.vars.fitpot
normalize_input none
init_params read
energy_match T
force_match T
stress_match T
potential DNN
# Weights for energy, force, stress
weights 0.5 0.5 0.5
ftol 1.0e-5
xtol 1.0e-4
penalty none
penalty_weight 1d-3
# Species order: 1) Al, 2) Mg, 3) Si
specorder Al Mg Si
num_samples¶
Default: none
Number of reference samples to be used for training and test.
sample_list¶
Default: none
Path to the file that contains a list of samples to be used for training and test. The format of the list file should be like:
smpl_001
smpl_002
smpl_003
...
or, specifying which samples are for training (1) or test (2), like:
smpl_001 1
smpl_002 2
smpl_003 1
...
If the list specifies whether each sample is for training or test, test_ratio is ignored.
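A list file with explicit training/test flags, in the format described above, can be generated with a few lines of Python. This is only a sketch: the function name, the use of Python's `random` module, and the rounding of the number of test samples are choices made for this example and do not reproduce how fitpot itself splits the data.

```python
import random


def make_sample_list(names, test_ratio=0.1, seed=12345):
    """Return list-file lines 'name flag' with flag 1 (training) or 2 (test)."""
    rng = random.Random(seed)
    names = sorted(names)
    # Number of test samples; rounding to the nearest integer is an assumption.
    n_test = int(round(test_ratio * len(names)))
    test_set = set(rng.sample(names, n_test))
    return ["{0} {1}".format(name, 2 if name in test_set else 1)
            for name in names]


# Example with 20 hypothetical sample names: smpl_001 ... smpl_020.
lines = make_sample_list(["smpl_{0:03d}".format(i) for i in range(1, 21)])
```

Writing `"\n".join(lines)` to a file gives a list that can be passed via sample_list.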
test_ratio¶
Default: 0.1
The ratio \(r\) of the test data set to the whole data set of \(N_s\) samples. Thus the number of test samples is \(rN_s\), and the number of training samples is \((1-r)N_s\).
num_iteration¶
Default: 1
Number of iterations of a minimization method.
num_iter_eval¶
Default: 1
Test data set will be evaluated every num_iter_eval iterations.
fitting_method¶
Default: test
The method used to fit parameters to the sample data. Available methods are the following:
- sd/SD -- Steepest descent algorithm, which requires gradient information.
- cg/CG -- Conjugate gradient algorithm, which requires gradient information.
- bfgs/BFGS -- Quasi-Newton method with BFGS. This requires gradient information.
- sgd/SGD -- Stochastic gradient descent method. This computes loss and gradient information using only batchsize_per_node samples at a time and updates the parameters using that information, instead of computing all the samples each time the parameters are updated. It is considered preferable when there are many samples and many parameters to optimize.
- check_grad -- Comparison of the analytical derivative and the numerical derivative. Use this to check the implemented analytical gradient.
- test/TEST -- Just calculate the function L and the gradient of L w.r.t. the fitting parameters.
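The idea behind a gradient check such as check_grad can be sketched in a few lines: compare an analytical gradient with a central finite-difference estimate. The code below is a generic illustration with a toy quadratic function; the function names, the step size `h`, and the use of central differences are assumptions for this example and may differ from what fitpot does internally.

```python
def numerical_gradient(func, x, h=1.0e-6):
    """Central finite-difference estimate of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2.0 * h))
    return grad


def max_gradient_error(func, grad_func, x, h=1.0e-6):
    """Largest absolute difference between analytical and numerical gradients."""
    analytical = grad_func(x)
    numerical = numerical_gradient(func, x, h)
    return max(abs(a - n) for a, n in zip(analytical, numerical))


# Toy example: L(x) = sum of x_i^2, whose gradient is 2 * x_i.
def toy_loss(x):
    return sum(xi * xi for xi in x)


def toy_grad(x):
    return [2.0 * xi for xi in x]
```

A small value of `max_gradient_error(toy_loss, toy_grad, [1.0, -2.0, 0.5])` indicates that the analytical gradient is consistent with the function; a large value points to a bug in one of them.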
sample_directory¶
Default: dataset
The directory that includes sample data. This is the directory called dataset in the instructions above.
If you want to use .. to specify the directory relative to the current working directory, e.g. ../dataset, you need to enclose it in double-quotation marks like "../dataset".
param_file¶
Default: in.vars.fitpot
The name of the file that contains the parameter values. This is passed to the pmd program.
ftol¶
Default: 1.0e-6
The tolerance of difference of the loss function value.
xtol¶
Default: 1.0e-4
The tolerance of the change of the variables being optimized. If either ftol or xtol is achieved, the optimization stops.
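The stopping rule described for ftol and xtol can be sketched as follows. How fitpot measures the change (absolute or relative, which norm) is not stated here, so this example assumes absolute differences and the largest single-variable change; the function name is hypothetical.

```python
def is_converged(f_old, f_new, x_old, x_new, ftol=1.0e-6, xtol=1.0e-4):
    """Return True if either the loss change or the variable change is small.

    Assumptions for this sketch: absolute differences are used, and the
    change of variables is measured by its largest component.
    """
    df = abs(f_new - f_old)
    dx = max(abs(a - b) for a, b in zip(x_new, x_old))
    return df < ftol or dx < xtol
```

With these defaults, a step that changes the loss by 1.0e-7 counts as converged even if the variables still move noticeably, because only one of the two criteria has to be met.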
energy_match, force_match, stress_match¶
Default: True for energy, False for force and stress
Whether or not to match energies, forces, and stresses, respectively (True or False). It is recommended to match not only energies but also forces, since forces are important for molecular dynamics.
potential or force_field¶
Default: DNN
The potential whose parameters you are going to fit. Potentials currently available are:
- DNN -- Neural-network potential
- uf3 -- Ultra-Fast Force-Field
- linreg -- Linear regression potential
weights¶
Default: 0.5 0.5 0.5
Weights for energy, force, and stress terms, \(w_E, w_F, w_S\). They are normalized in the program so that they satisfy \(w_E+w_F+w_S = 1\).
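The normalization mentioned above is a simple division by the sum of the three weights. A minimal sketch, with a hypothetical function name:

```python
def normalize_weights(w_e, w_f, w_s):
    """Scale the three weights so that they sum to 1."""
    total = w_e + w_f + w_s
    if total <= 0.0:
        raise ValueError("the sum of weights must be positive")
    return (w_e / total, w_f / total, w_s / total)
```

With the default input 0.5 0.5 0.5, each normalized weight becomes 1/3. This sketch does not cover how the weights are treated when one of energy_match, force_match, or stress_match is turned off.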
max_num_neighbors¶
Default: 50
The maximum number of neighbors among all the samples. If you encounter an error like the following, increase this value.
[Error] nnl.gt.nnmax
myid,nnl,nnmax = 0 73 72
random_seed¶
Default: 12345d0
Initial random seed for the uniform random numbers used in the fitpot. This is used to change the random choice of training and test sets.
regularize¶
Default: False
Whether or not to regularize the bases obtained in linreg and DNN potentials (True or False).
penalty¶
Default: no
Type of penalty term: lasso for an L1-norm penalty, ridge for an L2-norm penalty, or no for no penalty term.
penalty_weight¶
Default: 1.0
The weight applied to the penalty term. This value also has to be determined through cross-validation scoring...
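The two penalty types can be written down in their textbook form, as in the sketch below. The exact expression used in fitpot (for example, any extra prefactor or which parameters are included) is not given here, so treat this as an illustration of L1 and L2 penalties rather than the program's implementation; the function name is hypothetical.

```python
def penalty_term(params, kind="no", weight=1.0):
    """Textbook L1 (lasso) or L2 (ridge) penalty on a list of parameters."""
    if kind == "lasso":
        # L1 norm: sum of absolute values.
        return weight * sum(abs(p) for p in params)
    if kind == "ridge":
        # Squared L2 norm: sum of squares.
        return weight * sum(p * p for p in params)
    if kind == "no":
        return 0.0
    raise ValueError("unknown penalty type: {0}".format(kind))
```

The penalty is added to the loss function, so a larger penalty_weight pushes the parameters more strongly toward zero.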
sample_weight¶
Default: 0
The number of sample-weight entries to be given.
This keyword must be followed by the same number of entry lines, each of which determines the weight of the specified samples, like the following:
sample_weight 2
Al_fcc 2.0
Al_bcc 0.5
Each entry consists of entry_name and weight.
The weight values are applied to all the samples that contain entry_name in their file names (e.g., XXX in smpl_XXX_YYY_####).
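The name matching described above amounts to a substring test on the sample file name. In the sketch below, the default weight of 1.0 for unmatched samples and the rule that the last matching entry wins are assumptions made for this example; the function name is hypothetical.

```python
def weight_for_sample(file_name, entries, default=1.0):
    """Return the weight of a sample from (entry_name, weight) pairs.

    Assumptions: unmatched samples get `default`, and if several entries
    match the file name, the last one in the list is used.
    """
    weight = default
    for entry_name, entry_weight in entries:
        if entry_name in file_name:
            weight = entry_weight
    return weight


# Entries taken from the sample_weight example above.
entries = [("Al_fcc", 2.0), ("Al_bcc", 0.5)]
```

For instance, a hypothetical file named smpl_Al_fcc_0001 would get the weight 2.0 with these entries.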
force_denom_type¶
Default: relative
Either relative or absolute.
The type of denominator used in the force term of the loss function. If absolute is specified, fitpot uses the force error specified in sample_error as the denominator of the force term. If relative is specified, fitpot uses the magnitude of the force on the atom as the denominator of the force term.
specorder¶
Default: none
The order of species used throughout fitpot. This must be specified before the atom_energy entry and must hold for every sample.
init_params¶
Default: read
Whether the parameters to be optimized are read from the file or initialized.
- read -- Read parameters from the file.
- gaussian -- Parameters are initialized with a Gaussian distribution according to init_params_sgm and init_params_mu.
init_params_sgm¶
Default: 1d0
Variance of Gaussian distribution of the initial values for parameters.
init_params_mu¶
Default: 0d0
Mean value of Gaussian distribution of the initial values for parameters.
init_params_rs¶
Default: 12345.0
Random seed for the initialization of parameters. This random seed is used only for this purpose and does not affect the random seed for the choice of training and test sets, which is controlled by random_seed.
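The gaussian initialization can be pictured with the following sketch, which uses Python's `random` module rather than fitpot's own random number generator. Because init_params_sgm is described above as the variance, the sketch takes its square root to obtain the standard deviation that `random.gauss` expects; if the program actually treats the value as a standard deviation, the square root should be dropped. The function name is hypothetical.

```python
import math
import random


def init_params_gaussian(num_params, mu=0.0, sgm=1.0, seed=12345):
    """Draw initial parameter values from a Gaussian distribution.

    Assumption: `sgm` is a variance, as stated in the description above,
    so the standard deviation passed to random.gauss is sqrt(sgm).
    """
    rng = random.Random(seed)  # separate generator, like init_params_rs
    std = math.sqrt(sgm)
    return [rng.gauss(mu, std) for _ in range(num_params)]
```

Using a dedicated generator seeded independently mirrors the separation between init_params_rs and random_seed: changing one does not change the numbers drawn with the other.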
sgd_rate_ini¶
Default: 0.001
Initial or constant learning rate used in SGD.
sgd_rate_fin¶
Default: -0.001
If this value is positive, the learning rate in SGD changes linearly as the iterations proceed. If it is negative, a constant learning rate is used.
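One way to read the rule above is as a linear interpolation between sgd_rate_ini at the first iteration and sgd_rate_fin at the last one. The sketch below follows that reading; the end points of the interpolation, the zero-based iteration counter, and the treatment of a value of exactly zero as "constant" are assumptions for this example, and the function name is hypothetical.

```python
def sgd_rate(iteration, num_iteration, rate_ini=0.001, rate_fin=-0.001):
    """Learning rate at a given zero-based iteration.

    Assumption: linear interpolation from rate_ini (first iteration)
    to rate_fin (last iteration) when rate_fin is positive.
    """
    if rate_fin <= 0.0 or num_iteration <= 1:
        return rate_ini  # constant learning rate
    frac = iteration / float(num_iteration - 1)
    return rate_ini + (rate_fin - rate_ini) * frac
```

With the default sgd_rate_fin of -0.001, the function returns sgd_rate_ini for every iteration.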
batchsize_per_node¶
Default: 1
Number of samples per node in one batch when evaluating the loss function. Thus, if fitpot is run in parallel, the total number of samples in a batch is this value multiplied by the number of parallel nodes.