Simplest case --- 1D linear regression
The simplest example is examples/01_simple_example
in the GitHub repository.
The README.ipynb
shows how to run optzer in that directory and how to analyze the results.
Alternatively, Running_on_jupyter.ipynb
shows how to use optzer directly in a Jupyter notebook, rather than calling shell commands from it.
In this example, the two parameters a and b of a linear function, f(x) = ax + b, are optimized against reference data generated by g(x) = ax + b + err, where err is Gaussian noise, with a = 1.0 and b = 0.0. Thus, if the optimized a and b are close to 1.0 and 0.0, respectively, the optimization can be considered successful.
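To make the setup concrete, here is a minimal sketch of how such reference data could be generated. The noise amplitude, random seed, and x grid are assumptions for illustration; the actual data.ref.simple ships with the repository.

```python
import numpy as np

# Minimal sketch of how reference data like data.ref.simple could be
# generated. The noise amplitude (0.1), the random seed, and the x grid
# are assumptions for illustration.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 100)
a_true, b_true = 1.0, 0.0
err = rng.normal(scale=0.1, size=x.size)   # Gaussian noise
g = a_true * x + b_true + err              # g(x) = ax + b + err
print(g[:5])
```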
The directory contains the following files:
01_simple_example
├── README.ipynb
├── data.ref.simple
├── in.optzer
├── in.params.simple
├── in.vars.optzer
├── makefile
├── out.optzer.REF
├── simple_func.py
├── subjob.sh
└── test.py
The files required to run optzer are in.optzer, in.vars.optzer, data.ref.simple, in.params.simple, subjob.sh, and simple_func.py.
The in.optzer file in this example reads:
num_iteration 50
print_level 1
target simple
param_files in.params.simple
opt_method cs
cs_num_individuals 4
cs_fraction 0.25
- target – keywords of the target properties; they are used in the file names data.ref.XXX and data.opt.XXX.
- param_files – files containing the parameters to be optimized, which are read by the external program; these files should be written as described below.
- opt_method – optimization method.
The in.vars.optzer file in this example reads:
2
-0.4000 -10.00 10.00 -10.00 10.00 a
2.0000 -10.00 10.00 -10.00 10.00 b
- 1st line – the number of parameters to be optimized
- after 1st line – each line contains the initial guess, lower limit (soft), upper limit (soft), lower limit (hard), upper limit (hard), and the name of the parameter. The soft limits are updated during the optimization, whereas the hard limits are fixed.
- The name of the parameter is important and should correspond to the placeholder in the in.params.XXX files described below.
The in.params.simple file in this example contains:
{a:.3f} {b:.3f}
In this example, simple_func.py
reads the parameters a and b from this file and computes f(x) = ax + b at several x points.
During the optimization, optzer replaces the placeholders in this file with trial parameter values.
The format {a:.3f}
indicates that the parameter a in in.vars.optzer
is inserted with the .3f
format.
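The placeholder mechanism can be illustrated with plain Python string formatting; this is a conceptual sketch, not optzer's internal code:

```python
# Conceptual sketch of the placeholder substitution; this is plain
# Python string formatting, not optzer's internal code.
template = "{a:.3f} {b:.3f}"          # content of in.params.simple

# A trial parameter set proposed during the optimization
# (values are illustrative).
trial = {"a": 1.234567, "b": -0.007}

# Each placeholder is replaced by the trial value, formatted as
# requested (.3f here).
content = template.format(**trial)
print(content)  # 1.235 -0.007
```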
In more complicated cases, users simply put {hoge:.4f}
-style placeholders at the appropriate positions in the parameter files read by the external program.
Then, during the optimization, a parameter file is generated for each individual (trial parameter set) by replacing these placeholders with the trial parameter values.
The subjob.sh script used in this example is:
#!/bin/bash
t0=`date +%s`
export OMP_NUM_THREADS=1
python ../simple_func.py --param-file in.params.simple
t1=`date +%s`
etime=`expr $t1 - $t0`
echo "subjob.sh took" $etime "sec, done at" `date`
This subjob.sh
executes simple_func.py, which loads the parameters from in.params.simple
and generates the data.opt.simple
file that is compared with data.ref.simple.
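As a rough illustration, simple_func.py conceptually does something like the following, writing f(x) at a set of x points in the same "ndata/weight header plus six values per line" format as the data.ref.simple listing below. The x grid, the data count, and the formatting details are assumptions, not the actual script:

```python
import numpy as np

def run_simple_func(param_text, ndata=100, weight=1.0):
    """Hedged sketch of what simple_func.py conceptually does: read the
    filled-in in.params.simple content and emit data.opt.simple text."""
    a, b = (float(t) for t in param_text.split())   # e.g. "1.000 0.000"
    x = np.linspace(-1.0, 1.0, ndata)               # x grid is an assumption
    fx = a * x + b
    lines = [f"{ndata} {weight:.2f}"]               # header: ndata and weight
    for i in range(0, ndata, 6):                    # six values per line
        lines.append(" ".join(f"{v:.3f}" for v in fx[i:i + 6]))
    return "\n".join(lines)

print(run_simple_func("1.000 0.000", ndata=10))
```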
100 1.00
-0.950 -0.994 -0.895 -0.787 -0.943 -0.922
-0.721 -0.782 -0.885 -0.764 -0.844 -0.824
-0.733 -0.929 -0.890 -0.753 -0.778 -0.625
-0.727 -0.757 -0.449 -0.598 -0.549 -0.678
-0.570 -0.484 -0.590 -0.417 -0.494 -0.443
-0.454 -0.189 -0.355 -0.439 -0.231 -0.415
-0.252 -0.448 -0.365 -0.192 -0.118 -0.155
-0.163 -0.161 -0.259 -0.163 -0.117 0.055
0.004 -0.186 0.043 -0.008 -0.017 0.132
0.194 0.204 0.047 0.121 0.205 0.289
0.164 0.214 0.142 0.153 0.374 0.449
0.326 0.454 0.410 0.329 0.450 0.588
0.451 0.631 0.233 0.597 0.544 0.526
0.585 0.397 0.594 0.672 0.804 0.625
0.616 0.667 0.829 0.790 0.725 0.849
0.828 0.935 0.788 0.846 0.860 0.773
0.969 0.986 0.980 0.977
- 1st line – the number of data points and the weight of the data
- after 1st line – the data values (6 entries per line)
This is the basic file format; other data formats are available via the datatype option.
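For illustration, this basic format can be parsed with a few lines of Python (an assumed helper, not part of optzer):

```python
import tempfile

def read_simple_data(path):
    """Parse the basic optzer data format: '<ndata> <weight>' on the
    first line, followed by ndata values (six per line)."""
    with open(path) as f:
        tokens = f.read().split()
    ndata = int(tokens[0])
    weight = float(tokens[1])
    values = [float(t) for t in tokens[2:2 + ndata]]
    return ndata, weight, values

# Demo with a tiny stand-in file; the real data.ref.simple has 100 points.
with tempfile.NamedTemporaryFile("w", suffix=".simple", delete=False) as f:
    f.write("4 1.00\n-0.950 -0.994 -0.895 -0.787\n")
    tmpname = f.name

ndata, weight, values = read_simple_data(tmpname)
print(ndata, weight, values)
```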
The optzer program writes a db.optzer.json
file in addition to the standard output. This file contains all the trial parameters together with their IDs and loss values.
One can read it to obtain the best candidate, or to follow the evolution of the loss values or parameters. Using the pandas package, the following code shows all the individuals computed by optzer.
import pandas as pd
db = pd.read_json('db.optzer.json', orient='records', lines=True)
print(db)
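Building on that, one can pick the best candidate by its loss. Note that the column names used below (an ID column, "loss", and the parameter names) are assumptions for illustration; check db.columns against your actual db.optzer.json.

```python
import io
import pandas as pd

# Synthetic stand-in for db.optzer.json (one JSON record per line); the
# real file is written by optzer, and the column names here ('iid',
# 'loss', 'a', 'b') are assumptions for illustration.
jsonl = io.StringIO(
    '{"iid": 1, "loss": 2.31, "a": -0.40, "b": 2.00}\n'
    '{"iid": 2, "loss": 0.05, "a": 0.98, "b": 0.01}\n'
    '{"iid": 3, "loss": 0.74, "a": 1.30, "b": -0.20}\n'
)
db = pd.read_json(jsonl, orient='records', lines=True)

# The best candidate is the row with the minimum loss.
best = db.loc[db['loss'].idxmin()]
print(best)
```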
Alternatively, you can convert the JSON file to a CSV file with the provided script:
$ python /path/to/optzer/optzer/db2csv.py db.optzer.json out.csv
[NOTE]
Optzer first looks for db.optzer.json
and reads the previously computed trials as prior knowledge for estimating the parameter search range.
If you do not want to use previous search results and prefer to start from scratch, do not forget to remove db.optzer.json
from the working directory.
The figure below shows the evolution of the loss function values as a function of generation. The minimum loss decreases as the generations proceed.
The optimized linear function is shown below; the line represents the reference points well.
The last figure shows that, as the loss decreases, the optimized parameters a and b approach the optimal values 1.0 and 0.0, respectively.