Simplest case --- 1D linear regression
The simplest example is examples/01_simple_example
in the GitHub repository.
The README.ipynb
shows how to run optzer in that directory and how to analyze the results.
Alternatively, Running_on_jupyter.ipynb
shows how to use optzer directly in a Jupyter notebook, rather than calling shell commands from it.
In this example, the two parameters a and b of a linear function, f(x) = ax + b, are optimized against reference data generated by g(x) = ax + b + err, where err is Gaussian noise, with a = 1.0 and b = 0.0. Thus, if the optimized a and b are close to 1.0 and 0.0, respectively, the optimization can be considered successful.
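To make the setup concrete, here is a minimal sketch of how such reference data could be generated. The noise amplitude, random seed, and x grid are assumptions for illustration; the actual data.ref.simple ships with the repository.

```python
import numpy as np

# Minimal sketch of how reference data like data.ref.simple could be
# generated. The noise amplitude (0.1), the random seed, and the x grid
# are assumptions for illustration.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 100)
a_true, b_true = 1.0, 0.0
err = rng.normal(scale=0.1, size=x.size)   # Gaussian noise
g = a_true * x + b_true + err              # g(x) = ax + b + err
print(g[:5])
```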
The directory contains the following files:
01_simple_example
├── README.ipynb
├── data.ref.simple
├── in.optzer
├── in.params.simple
├── in.vars.optzer
├── makefile
├── out.optzer.REF
├── simple_func.py
├── subjob.sh
└── test.py
The files required to run optzer are in.optzer, in.vars.optzer, data.ref.simple, in.params.simple, subjob.sh, and simple_func.py.
The in.optzer file in this example reads:
num_iteration 50
print_level 1
target simple
param_files in.params.simple
opt_method cs
cs_num_individuals 4
cs_fraction 0.25
- target – keywords of the target properties; they are used in the file names data.ref.XXX and data.opt.XXX.
- param_files – files containing the parameters to be optimized, which are read by the external program; these files should be written as described below.
- opt_method – optimization method.
The in.vars.optzer file in this example reads:
2
-0.4000 -10.00 10.00 -10.00 10.00 a
2.0000 -10.00 10.00 -10.00 10.00 b
- 1st line – the number of parameters to be optimized
- after 1st line – each line contains the initial guess, lower limit (soft), upper limit (soft), lower limit (hard), upper limit (hard), and the name of the parameter. The soft limits are updated during the optimization, whereas the hard limits are fixed.
- The name of the parameter is important and should correspond to the placeholder in the in.params.XXX files described below.
The in.params.simple file in this example contains:
{a:.3f} {b:.3f}
In this example, simple_func.py
reads the parameters a and b from this file and computes f(x) = ax + b at several x points.
During the optimization, optzer replaces the placeholders in this file with trial parameter values.
The format {a:.3f}
indicates that the parameter a in in.vars.optzer
is inserted with the .3f
format.
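The placeholder mechanism can be illustrated with plain Python string formatting; this is a conceptual sketch, not optzer's internal code:

```python
# Conceptual sketch of the placeholder substitution; this is plain
# Python string formatting, not optzer's internal code.
template = "{a:.3f} {b:.3f}"          # content of in.params.simple

# A trial parameter set proposed during the optimization
# (values are illustrative).
trial = {"a": 1.234567, "b": -0.007}

# Each placeholder is replaced by the trial value, formatted as
# requested (.3f here).
content = template.format(**trial)
print(content)  # 1.235 -0.007
```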
In more complicated cases, users simply put {hoge:.4f}
-style placeholders at the appropriate positions in the parameter files read by the external program.
Then, during the optimization, a parameter file is generated for each individual (trial parameter set) by replacing these placeholders with the trial parameter values.
The subjob.sh script used in this example is:
#!/bin/bash
t0=`date +%s`
export OMP_NUM_THREADS=1
python ../simple_func.py --param-file in.params.simple
t1=`date +%s`
etime=`expr $t1 - $t0`
echo "subjob.sh took" $etime "sec, done at" `date`
This subjob.sh
executes simple_func.py, which loads the parameters from in.params.simple
and generates the data.opt.simple
file that is compared with data.ref.simple.
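As a rough illustration, simple_func.py conceptually does something like the following, writing f(x) at a set of x points in the same "ndata/weight header plus six values per line" format as the data.ref.simple listing below. The x grid, the data count, and the formatting details are assumptions, not the actual script:

```python
import numpy as np

def run_simple_func(param_text, ndata=100, weight=1.0):
    """Hedged sketch of what simple_func.py conceptually does: read the
    filled-in in.params.simple content and emit data.opt.simple text."""
    a, b = (float(t) for t in param_text.split())   # e.g. "1.000 0.000"
    x = np.linspace(-1.0, 1.0, ndata)               # x grid is an assumption
    fx = a * x + b
    lines = [f"{ndata} {weight:.2f}"]               # header: ndata and weight
    for i in range(0, ndata, 6):                    # six values per line
        lines.append(" ".join(f"{v:.3f}" for v in fx[i:i + 6]))
    return "\n".join(lines)

print(run_simple_func("1.000 0.000", ndata=10))
```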
100 1.00
-0.950 -0.994 -0.895 -0.787 -0.943 -0.922
-0.721 -0.782 -0.885 -0.764 -0.844 -0.824
-0.733 -0.929 -0.890 -0.753 -0.778 -0.625
-0.727 -0.757 -0.449 -0.598 -0.549 -0.678
-0.570 -0.484 -0.590 -0.417 -0.494 -0.443
-0.454 -0.189 -0.355 -0.439 -0.231 -0.415
-0.252 -0.448 -0.365 -0.192 -0.118 -0.155
-0.163 -0.161 -0.259 -0.163 -0.117 0.055
0.004 -0.186 0.043 -0.008 -0.017 0.132
0.194 0.204 0.047 0.121 0.205 0.289
0.164 0.214 0.142 0.153 0.374 0.449
0.326 0.454 0.410 0.329 0.450 0.588
0.451 0.631 0.233 0.597 0.544 0.526
0.585 0.397 0.594 0.672 0.804 0.625
0.616 0.667 0.829 0.790 0.725 0.849
0.828 0.935 0.788 0.846 0.860 0.773
0.969 0.986 0.980 0.977
- 1st line – the number of data points and the weight of the data
- after 1st line – the data values (6 entries per line)
This is the basic file format; other data formats are available via the datatype option.
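For illustration, this basic format can be parsed with a few lines of Python (an assumed helper, not part of optzer):

```python
import tempfile

def read_simple_data(path):
    """Parse the basic optzer data format: '<ndata> <weight>' on the
    first line, followed by ndata values (six per line)."""
    with open(path) as f:
        tokens = f.read().split()
    ndata = int(tokens[0])
    weight = float(tokens[1])
    values = [float(t) for t in tokens[2:2 + ndata]]
    return ndata, weight, values

# Demo with a tiny stand-in file; the real data.ref.simple has 100 points.
with tempfile.NamedTemporaryFile("w", suffix=".simple", delete=False) as f:
    f.write("4 1.00\n-0.950 -0.994 -0.895 -0.787\n")
    tmpname = f.name

ndata, weight, values = read_simple_data(tmpname)
print(ndata, weight, values)
```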
The optzer program writes a db.optzer.json
file in addition to the standard output. This file contains all the trial parameters together with their IDs and loss values.
One can read it to obtain the best candidate, or to follow the evolution of the loss values or parameters. Using the pandas package, the following code shows all the individuals computed by optzer.
import pandas as pd
db = pd.read_json('db.optzer.json', orient='records', lines=True)
print(db)
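Building on that, one can pick the best candidate by its loss. Note that the column names used below (an ID column, "loss", and the parameter names) are assumptions for illustration; check db.columns against your actual db.optzer.json.

```python
import io
import pandas as pd

# Synthetic stand-in for db.optzer.json (one JSON record per line); the
# real file is written by optzer, and the column names here ('iid',
# 'loss', 'a', 'b') are assumptions for illustration.
jsonl = io.StringIO(
    '{"iid": 1, "loss": 2.31, "a": -0.40, "b": 2.00}\n'
    '{"iid": 2, "loss": 0.05, "a": 0.98, "b": 0.01}\n'
    '{"iid": 3, "loss": 0.74, "a": 1.30, "b": -0.20}\n'
)
db = pd.read_json(jsonl, orient='records', lines=True)

# The best candidate is the row with the minimum loss.
best = db.loc[db['loss'].idxmin()]
print(best)
```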
Alternatively, you can convert the JSON file to a CSV file with the provided script:
$ python /path/to/optzer/optzer/db2csv.py db.optzer.json out.csv
[NOTE]
Optzer first looks for db.optzer.json
and reads the previously computed trials as prior knowledge for estimating the parameter search range.
If you do not want to use previous search results and prefer to start from scratch, do not forget to remove db.optzer.json
from the working directory.
The figure below shows the evolution of the loss function values as a function of generation. The minimum loss decreases as the generations proceed.
The optimized linear function is shown below; the line represents the reference points well.
The last figure shows that, as the loss decreases, the optimized parameters a and b approach the optimal values 1.0 and 0.0, respectively.