User's Manual

Patrick Cloutier, Cristian Tibirna, Bernard Grandjean
and Jules Thibault

Département de génie chimique, Université Laval

Sainte-Foy (Québec) CANADA G1K 7P4

Contents

- 1. Introduction and disclaimer
- 2. Equations of the model
- 3. The problem of "overfitting"
- 4. Preparing the data
- 5. Using the program
- 6. Information on the files generated by NNFit
  - 6.1. namexx.ite
  - 6.2. namexx.pra, .prg, .pre or .vyy
  - 6.3. namexx.w
  - 6.4. namexx.syy
  - 6.5. name.cfg
  - 6.6. Hidden files
- 7. Using a model out of NNFit
- 8. Installing NNFit
- 9. Comments
- 10. Acknowledgements and credits

**1. Introduction and disclaimer**

The program NNFit (**N**eural **N**etwork based data **Fit**ting) allows the development of empirical non-linear correlations using an artificial neural network model: the multilayered perceptron. NNFit is a non-linear regression software tool for finding relationships between a set of input variables X_{i} (1 ≤ i ≤ I) and a set of output variables Y_{k} (1 ≤ k ≤ K), given a set of N relevant experimental data points [X_{i}, Y_{k}]_{n} (1 ≤ n ≤ N). As with any other empirical modeling approach, the user of NNFit must keep in mind that the quality of the regression models Y = f(X) obtained will depend on the relevance and the quality of the available experimental data. In addition, it is important to stress that success in fitting a given set of data well is absolutely no guarantee that the model will have generalization capability, that is, that it will correctly predict a new set of data [X_{i}, Y_{k}]_{m} (N+1 ≤ m ≤ P).

**The authors and Université Laval cannot be held responsible for the use of the models developed with the NNFit program.**

The reader is referred to the literature for an exhaustive presentation of neural network models (on the Internet: ftp://ftp.sas.com/pub/neural/FAQ.html).

**2. Equations of the model**

Basically, a user who is not familiar with the neural network paradigm should consider a neural model here simply as **a non-linear regression** model that gives a relationship between a normalized input vector U and a normalized output vector S. The transformation S = f(U) is represented by a multilayered neural network with a single hidden layer, as illustrated below:

Figure 1. Schematic representation of a neural network with layers

The model uses, besides the variables of the problem, a constant input equal to 1, named the bias, which is imposed on both the input and hidden layers. The model equations are given below.

For the connections from the input layer to the hidden layer:

H_{j} = f( Σ_{i=1..I+1} W_{ij} U_{i} ),  j = 1, ..., J

U_{i}: vector of normalized input variables, U_{I+1} = 1

H_{j}: output of neuron j of the hidden layer, H_{J+1} = 1

For the neurons of the output layer:

S_{k} = f( Σ_{j=1..J+1} W_{jk} H_{j} ),  k = 1, ..., K

f: sigmoid function, f(z) = 1/(1+e^{-z})

S_{k}: vector of normalized output variables

The transformation of the actual variables (X, Y) to the normalized variables (U, S) is given by:

U_{i} = (X_{i} - XMIN_{i}) / (XMAX_{i} - XMIN_{i})

Y_{k} = YMIN_{k} + S_{k} (YMAX_{k} - YMIN_{k})

*Remark*: If a variable X_{i} (or Y_{k}) covers many decades, the use of log_{10} X_{i} (or log_{10} Y_{k}) may be preferred, and the normalization is then written:

U_{i} = (log_{10} X_{i} - XMIN_{i}) / (XMAX_{i} - XMIN_{i})

with XMIN_{i} and XMAX_{i} now taken as bounds of log_{10} X_{i} (and similarly for the outputs).
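These transformations can be sketched in Python (an illustration, not NNFit code; in log mode the bounds are understood as bounds of log_{10} of the variable):

```python
import math

def normalize(x, xmin, xmax, log_mode=False):
    """Map an actual variable X to the normalized U (or Y to S).
    In log mode, xmin/xmax are bounds of log10(X)."""
    v = math.log10(x) if log_mode else x
    return (v - xmin) / (xmax - xmin)

def denormalize(s, ymin, ymax, log_mode=False):
    """Inverse mapping, used on the outputs: S back to Y."""
    y = ymin + s * (ymax - ymin)
    return 10.0 ** y if log_mode else y
```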

Like all regression models, the neural model contains a number of fitting parameters:

- the value of J, the number of nodes in the hidden layer;

- the values of the parameters W_{ij} and W_{jk}, known as the weights.

J being varied and chosen empirically, the model then contains (I+1)·J + (J+1)·K fitting parameters, which are determined by regression on a pertinent set of N pairs of experimental data.
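As a quick check (a small Python sketch, not part of NNFit), for the demo problem used later (I = 2 inputs, K = 2 outputs) the parameter count grows linearly with J:

```python
def n_parameters(I, J, K):
    # (I+1)*J input-to-hidden weights plus (J+1)*K hidden-to-output weights
    return (I + 1) * J + (J + 1) * K

for J in range(1, 7):
    print(J, n_parameters(2, J, 2))   # J = 1 gives 7 parameters, J = 6 gives 32
```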

The method consists in minimizing a quadratic criterion (the sum of the squares of the prediction errors, either absolute, Q_{a}, or relative, Q_{r}) using a quasi-Newton minimization algorithm of the BFGS type [Press W.H. et al., *Numerical Recipes: The Art of Scientific Computing*, Cambridge University Press, 1986]. This minimization step, which fits the model to a set of data, is known as *learning*, and the data set used at this step is thus called the *learning file*.

The quadratic criteria Q_{a} and Q_{r} to be minimized refer to absolute and relative errors respectively and are defined as follows:

Q_{a} = Σ_{n=1..N} Σ_{k=1..K} [ S_{k}^{exp}(n) - S_{k}^{cal}(n) ]²

Q_{r} = Σ_{n=1..N} Σ_{k=1..K} [ ( S_{k}^{exp}(n) - S_{k}^{cal}(n) ) / S_{k}^{exp}(n) ]²

**Warning**: If the relative criterion is used, none of the values S_{k}^{exp}(n) may be equal to 0. Since the normalization of the S_{k} variables is done over the range [YMIN_{k}, YMAX_{k}], the user must then choose a YMIN_{k} value slightly smaller than the minimum Y_{k} value observed in the file, so that the normalization yields strictly positive S_{k} values.
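The two criteria can be sketched in Python (an illustration; `s_exp` and `s_cal` are assumed to hold the N rows of experimental and calculated normalized outputs):

```python
def q_absolute(s_exp, s_cal):
    """Sum of squared absolute errors over all data points and outputs."""
    return sum((e - c) ** 2
               for row_e, row_c in zip(s_exp, s_cal)
               for e, c in zip(row_e, row_c))

def q_relative(s_exp, s_cal):
    """Sum of squared relative errors; every experimental value must be non-zero."""
    return sum(((e - c) / e) ** 2
               for row_e, row_c in zip(s_exp, s_cal)
               for e, c in zip(row_e, row_c))
```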

**3. The problem of "overfitting"**

Overfitting of neural models is a well-known problem. Because of their great plasticity, neural models may predict with great accuracy the data set on which the fitting was done, while large prediction errors are observed when the model is tested on a new data set.

To make overfitting easier to understand, consider the variation Y = f(X) presented in the figure, for which only a limited number of experimental measurements is available. Suppose that a phenomenological model based on first principles exists, as presented in the figure. Overfitting corresponds to the situation where the neural model predicts the available experimental data accurately, but is completely in error when applied in the intervals between the learning data. Obviously, such a model is not reliable and its use is not recommended.

One approach for detecting overfitting consists in splitting the initial data file in two parts. The first part is the *learning file*, on which the minimization is performed. The other part (the *generalization file*) is used for testing the generalization capability of the model. Recall that the minimization algorithm works by an iterative process in which different values of the weights (W_{ij} and W_{jk}) are explored in order to minimize the quadratic criterion. It is then possible to verify the predictions of the model on the generalization file at each iteration of the optimization algorithm.

The previous figure schematically represents the variation of the sum of squares of the errors on the learning file and on the generalization file as a function of the number of iterations. A rise of this sum on the generalization file is an indicator that overfitting is occurring. To avoid this problem, it is desirable to stop the minimization routine after only N1 iterations, instead of letting the algorithm converge after N2 iterations.

This procedure of stopping the minimization before convergence is called the early stopping method, and it is the one proposed in NNFit.
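The early stopping idea can be sketched as follows (a generic Python illustration, not NNFit's actual implementation; the helpers `step` and `q_generalization` are hypothetical placeholders for one optimizer iteration and for the criterion evaluated on the generalization file):

```python
def train_with_early_stopping(step, q_generalization, max_iter=200, patience=3):
    """Run the optimizer, but keep the weights that performed best on the
    generalization file, and stop once that error has not improved for
    `patience` consecutive iterations."""
    best_q, best_w, since_best = float("inf"), None, 0
    for it in range(max_iter):
        w = step()                        # one minimization iteration
        q = q_generalization(w)           # criterion on the generalization file
        if q < best_q:
            best_q, best_w, since_best = q, w, 0
        else:
            since_best += 1
            if since_best >= patience:
                break                     # generalization error is rising: stop early
    return best_w, best_q
```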

**4. Preparing the data**

The available **N experimental data** must be in the form of a file containing one line per experimental point, with the values of the input and output variables in columns.

Warning

N.B.

**5. Using the program**

Run NNFit and the following menu bar will be displayed:

**5.1 Start**

**New** - for solving a new problem: generates the configuration file (**name.cfg**) of the problem and allows the optimization to be run.

**Open** - for loading an existing configuration file and running the optimization once again.

**About** - for displaying the names and the addresses of the authors

Click on **New** and choose the data file on which the modeling is to be run
*(example: demo.dat)*.

A working sheet appears and all the information required to develop the model
must be specified.

The upper frame displays the *Information on the data file* *(demo.dat, 441 lines and 4 columns)*.

The second frame, titled *File partition for generalization*, allows the user, if needed, to split the initial data file in two parts in order to detect possible overfitting problems. To perform this partition, the number of data pairs retained for generalization must be specified, by choosing either the percentage or the number of lines to take from the initial file. Two partition methods are proposed:

- random method (with this method, the number of data lines retained in the newly created files will be close, but not necessarily identical, to the requested value)

- continuous segment method (use the sliding cursor or the input boxes to
indicate the size of the segment, its beginning and its end).

Once the required information is entered, click on the **Partition** button to create the two new files. The learning and generalization files are identified by their respective extensions **.axx** and **.gxx** (xx may vary between 01 and 99); the root of their names is that of the initial file (example: demo.dat will be split into *demo.a01* and *demo.g01*).
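The random partition can be sketched as follows (a Python illustration of the idea; unlike NNFit's random method, which only approximates the requested count, this sketch draws an exact number of lines):

```python
import random

def partition(lines, n_generalization, seed=None):
    """Randomly split data lines into a learning set and a generalization set."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(lines)), n_generalization))
    learning = [line for i, line in enumerate(lines) if i not in chosen]
    generalization = [line for i, line in enumerate(lines) if i in chosen]
    return learning, generalization
```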

In the third frame, titled *Network dimensions*, the number of input variables (*without the bias!*), the number of output variables and the number J of hidden nodes (*without the bias!*) must be entered. The number of hidden nodes may vary between chosen minimum and maximum values. Recall that a family of (J_{max} - J_{min} + 1) models will be created, and the best model can be selected later using the **Simulations comparison** facility offered in the **Other** menu. Click on **Validation** to record this information.

*(Example: for the file demo.dat, the problem comprises 2 inputs and 2 outputs, and a variation of J between 1 and 6 is proposed.)*

In the lower frame,

The button

- choice of the quadratic criterion to minimize: sum of squares of the relative or absolute errors

- choice of the maximum number of iterations in the optimization routine and of the convergence criterion

- choice of the initial weights. The option

Once all the information is provided in the configuration sheet, click

**5.2 View**

This section of the program allows the user to view the calculation results.

After selecting

After choosing the

The results display contains a graph and a table; both may be printed directly on paper or to a file (for later printing).

The table having the title

- prediction errors: median, minimum, maximum, standard deviation; the absolute error, the relative error or the absolute value of the relative error may be used:

- correlation and determination coefficients, defined below (it is preferable that their values be close to 1!):
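The manual gives the exact formulas as equations; assuming the standard definition of the coefficient of determination, a minimal Python sketch for one output:

```python
def determination(y_exp, y_cal):
    """Coefficient of determination R^2 between experimental and calculated
    values (standard definition; assumed to match the manual's formula)."""
    mean = sum(y_exp) / len(y_exp)
    ss_res = sum((e - c) ** 2 for e, c in zip(y_exp, y_cal))  # residual sum of squares
    ss_tot = sum((e - mean) ** 2 for e in y_exp)              # total sum of squares
    return 1.0 - ss_res / ss_tot
```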

The frame

The display in the window called

**Predicted vs Experimental**: if a very good fit is obtained, the dots representing the data are aligned along the first diagonal.

**Predicted/Experimental vs n**: this graph overlaps the following two curves:

- calculated value versus line number of the data in the file *(continuous line)*

- experimental value versus line number of the data in the file *(symbol)*

When the model has multiple outputs, use the buttons **Previous** or **Next**
to display the results of the outputs k+1 or k-1.

**5.3 Use**

This menu allows the use of models built in previous simulations. For each of the options listed here, the user is invited to choose the weights file, **namexx.w**, which characterizes the model to test.

- **Validate on file** allows testing the model on a new inputs/outputs data set of the problem.

By default, this new file is assumed to have a structure identical to the one indicated in the configuration file (if available) associated with the weights file. If not, the user must select the inputs/outputs (as described in section 5.1). Click on the **Start** button to launch the calculation; the results will be stored in the file **namexx.vyy** (yy goes from 01 to 99, a new number being assigned automatically for each new test using the same weights file, namexx.w). The *.vyy files may be visualized using the **Validation** option of the **View** menu (see section 5.2).

- **Simulate on file** corresponds to the case where only the inputs are known and the outputs are to be predicted. The results are written in the file **namexx.syy**.

- **Simulate on inputs** allows the use of a model for a single simulation in which the data are entered on screen.

**5.4 Process**

This menu allows the user to identify the simulations currently running and, if necessary, to kill them.

**Remark**: After an abnormal interruption of a simulation, its PID number will still be displayed in this dialog, even if the process no longer exists.

**5.5 Other**

**Correlation matrix** allows the calculation of the correlation coefficients between the columns of a given data file. Between columns p and q, the correlation coefficient a_{pq} is defined as:
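The formula for a_{pq} appears as an equation in the original; it is assumed here to be the usual Pearson correlation coefficient, sketched in Python:

```python
import math

def correlation(col_p, col_q):
    """Pearson correlation coefficient between two data-file columns
    (assumed to be the definition behind a_pq)."""
    n = len(col_p)
    mp = sum(col_p) / n
    mq = sum(col_q) / n
    cov = sum((p - mp) * (q - mq) for p, q in zip(col_p, col_q))
    var_p = sum((p - mp) ** 2 for p in col_p)
    var_q = sum((q - mq) ** 2 for q in col_q)
    return cov / math.sqrt(var_p * var_q)
```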

**Simulations comparison** allows the user to quantify the effect of the number of hidden nodes (J) on the fit of the models over the data files (learning or validation). The user selects the name of the configuration file (***.cfg**) and the following table is displayed.

This window gathers, for all the explored values of J (the number of nodes in the hidden layer), the values of the mean errors, of the standard deviation of the errors and of the correlation coefficient for each of the K outputs, for both the learning and generalization files.

This table should ease the choice of the best model to retain.

**Preferences**:

This option allows the selection of the default values of various modeling parameters.

**6. Information on the files generated by NNFit**

**6.1 namexx.ite** (iterations file)

- column 1: iteration number

- column 2: quadratic criterion being minimized (Qr or Qa) = sum of the squares of the errors over all outputs 1 to K

- for k = 1 to K: column 2+k: sum of the squares of the errors on output k

**6.2 namexx.pra, .prg, .pre or .vyy** (prediction file when experimental outputs are available)

- column 1: number of the data line

- for k = 1 to K: column 2+2(k-1): experimental value of output k; column 3+2(k-1): calculated value of output k

**6.3 namexx.w** (weights file)

- line 1: I, J, K

- for i = 1 to I: line 1+i: UMAX_{i}, UMIN_{i}, reading mode (normal or log) of input i

- for k = 1 to K: line (1+I)+k: SMAX_{k}, SMIN_{k}, reading mode (normal or log) of output k

- for i = 1 to I+1: line (1+I+K)+i: W_{ij} for j = 1 to J

- for j = 1 to J+1: line (2+2I+K)+j: W_{jk} for k = 1 to K
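Following this layout, a namexx.w file can be parsed with a short script (a Python sketch; `read_weights` is an illustrative helper, not part of NNFit):

```python
def read_weights(path):
    """Parse a namexx.w file laid out as in section 6.3.
    Returns (I, J, K, input_norms, output_norms, w_ij, w_jk)."""
    with open(path) as f:
        tokens = f.read().split()
    pos = 0
    def take(n):
        nonlocal pos
        vals = [float(t) for t in tokens[pos:pos + n]]
        pos += n
        return vals
    I, J, K = (int(v) for v in take(3))
    input_norms = [take(3) for _ in range(I)]    # UMAX_i, UMIN_i, reading mode
    output_norms = [take(3) for _ in range(K)]   # SMAX_k, SMIN_k, reading mode
    w_ij = [take(J) for _ in range(I + 1)]       # one line per input node plus bias
    w_jk = [take(K) for _ in range(J + 1)]       # one line per hidden node plus bias
    return I, J, K, input_norms, output_norms, w_ij, w_jk
```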

**6.4 namexx.syy** (prediction file when the outputs are not known in advance)

- column 1: data line number

- for i = 1 to I: column 2+2(i-1): value of the i-th input

- for k = 1 to K: column 2+I+2(k-1): calculated value of the k-th output

**6.5 name.cfg** (configuration file)

```
Data file: demo.dat
Lines number in the data file: 441
Columns number in the data file: 4
Partition: YES
Random partition: YES
Lines in the generalisation file: 125
Learning file: demo.a01
Generalisation file: demo.g01
Maximum column no 1: 1.000000e+01
Minimum column no 1: -1.000000e+01
Maximum column no 2: 1.000000e+01
Minimum column no 2: -1.000000e+01
Maximum column no 3: 4.100000e+02
Minimum column no 3: -4.500000e+02
Maximum column no 4: 7.300000e+02
Minimum column no 4: -8.300000e+02
I: 2
K: 2
Jmin: 1
Jmax: 6
Relative criterium: no
Maximum number of iterations: 200
Convergence: 1.000000e-03
Use of the initial wheights file? no
Maximum of wheights: 1.000000e-01
Minimum of wheights: -1.000000e-01
Distribution type for wheights: Random
Input column no 1: 1
Input column no 2: 2
Output column no 1: 3
Output column no 2: 4
Norm_entree_max( 1): 1.000000e+01
Norm_entree_min( 1): -1.000000e+01
Reading mode( 1): Normal
Norm_entree_max( 2): 1.000000e+01
Norm_entree_min( 2): -1.000000e+01
Reading mode( 2): Normal
Norm_sortie_max( 1): 4.100000e+02
Norm_sortie_min( 1): -4.500000e+02
Reading mode( 1): Normal
Norm_sortie_max( 2): 7.300000e+02
Norm_sortie_min( 2): -8.300000e+02
Reading mode( 2): Normal
```

**6.6 Hidden files**

The program creates the following hidden files:

.nnFit, .nnfit_gnuplot, .nnfit_preferences, .nnfit_impression

**7. Using a model out of NNFit**

Once a model is obtained, the user may employ it outside NNFit by using the generated weights file corresponding to this model, with code similar to the Fortran example given below.

```fortran
      program freso
      real wij(100,100),wjk(100,100),xmin(2,100),xmax(2,100)
      real x(100),xmod(2,100),y(100)
* The file namexx.w contains the parameters of the model:
      open(1,file="namexx.w",status="old")
      read(1,*) ii,jj,kk
      do 10 i=1,ii
      read(1,*) xmax(1,i),xmin(1,i),xmod(1,i)
10    continue
      do 20 i=1,kk
      read(1,*) xmax(2,i),xmin(2,i),xmod(2,i)
20    continue
      do 30 i=1,ii+1
      read(1,*) (wij(i,j),j=1,jj)
30    continue
      do 40 j=1,jj+1
      read(1,*) (wjk(j,k),k=1,kk)
40    continue
* initialize the inputs vector x: x(i)= ? , for i=1 to ii
      call fctreso(ii,jj,kk,wij,wjk,xmax,xmin,xmod,x,y)
* the outputs are in the vector y
      write(6,*) (y(k),k=1,kk)
      end

      subroutine fctreso(ii,jj,kk,wij,wjk,xmax,xmin,xmod,x,y)
      real wij(100,100),wjk(100,100),xmin(2,100),xmax(2,100)
      real x(100),u(100),h(100),s(100),xmod(2,100),y(100)
* the network equations
      do 100 i=1,ii
      if (xmod(1,i).eq.0.) then
      u(i)=(x(i)-xmin(1,i))/(xmax(1,i)-xmin(1,i))
      else
      u(i)=(log10(x(i))-xmin(1,i))/(xmax(1,i)-xmin(1,i))
      endif
100   continue
      u(ii+1)=1.
      do 110 j=1,jj
      h(j)=0.
      do 120 i=1,ii+1
120   h(j)=h(j)+wij(i,j)*u(i)
110   h(j)=1./(1.+exp(-h(j)))
      h(jj+1)=1.
      do 130 k=1,kk
      s(k)=0.
      do 140 j=1,jj+1
140   s(k)=s(k)+wjk(j,k)*h(j)
      s(k)=1./(1.+exp(-s(k)))
      y(k)=xmin(2,k)+s(k)*(xmax(2,k)-xmin(2,k))
      if(xmod(2,k).eq.1.) then
      y(k)=10.**(y(k))
      endif
130   continue
      return
      end
```
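For users who prefer not to use Fortran, the same evaluation can be sketched in Python (an illustrative translation of the `fctreso` routine, assuming the weights and normalization bounds have already been read from namexx.w as described in section 6.3; a reading mode of 1 means log_{10}):

```python
import math

def nn_predict(x, in_norms, out_norms, w_ij, w_jk):
    """Evaluate the fitted model for one input vector x."""
    # normalize the inputs (mode 1 = log10 reading)
    u = []
    for xi, (umax, umin, mode) in zip(x, in_norms):
        v = math.log10(xi) if mode == 1.0 else xi
        u.append((v - umin) / (umax - umin))
    u.append(1.0)                                   # input-layer bias
    J = len(w_ij[0])
    h = [1.0 / (1.0 + math.exp(-sum(w_ij[i][j] * u[i] for i in range(len(u)))))
         for j in range(J)]
    h.append(1.0)                                   # hidden-layer bias
    K = len(w_jk[0])
    y = []
    for k in range(K):
        s = 1.0 / (1.0 + math.exp(-sum(w_jk[j][k] * h[j] for j in range(len(h)))))
        smax, smin, mode = out_norms[k]
        yk = smin + s * (smax - smin)               # denormalize the output
        y.append(10.0 ** yk if mode == 1.0 else yk)
    return y
```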

**8. Installing NNFit**

Compiled versions of the program are available for Unix, under the X environments of HP, IBM AIX, SGI and Sun, and also under Linux.

NNFit uses the X Window System and the GNUPLOT program, both freely available on the Internet.

The graphical interface of NNFit uses the public domain library **XForms version 0.81**, written by Dr. T. C. Zhao and Mark Overmars and improved thanks to a net-wide effort. This library and its documentation are available on the WWW: http://bragg.phys.uwm.edu/xforms

The curves traced on paper are plotted with GNUPLOT, a freely distributed plotting program.

The development of NNFit was carried out on several Intel x86 (IBM-compatible) PCs running Linux, a free, fully featured Unix-like operating system.

**10. Acknowledgements and credits**

Professors Grandjean and Thibault thank the Natural Sciences and Engineering Research Council of Canada (NSERC/CRSNG) for its financial support. Last modified: May 12, 1997, 23:05 EST