OpenCV MLP with sigmoid neurons, output range

I searched here on SO and on Google for an answer to the following question but couldn't find anything, so here is my situation:

I want to implement an MLP that learns some similarity function. I have training and test samples, and the MLP is up and running. My problem is how to present the teacher's outputs to the network, i.e. from which range the values should come.

Here is the relevant part of my code:

CvANN_MLP_TrainParams params(
    cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 1000, 0.000001),
    CvANN_MLP_TrainParams::BACKPROP,
    0.1,    // bp_dw_scale (learning rate)
    0.1);   // bp_moment_scale (momentum)

Mat layers = (Mat_<int>(3,1) << FEAT_SIZE, H_NEURONS, 1);

// SIGMOID_SYM with fparam1 = 1 (alpha) and fparam2 = 1 (beta)
CvANN_MLP net(layers, CvANN_MLP::SIGMOID_SYM, 1, 1);

int iter = net.train(X, Y, Mat(), Mat(), params);

net.predict(X_test, predictions);


The number of input and hidden neurons is set elsewhere, and the network has 1 output neuron. X, Y and X_test are Mats containing the training and test samples; no problem there. The question is which range of values my Y should come from, and which range of values the predictions will come from.

I found the following statements in the documentation:

For training:

If you use the default activation function CvANN_MLP::SIGMOID_SYM, then for optimal results the output should be in the range [-1,1] instead of [0,1].

Since I am NOT using the default sigmoid function (the one with alpha = 0 and beta = 0), I am providing my Y values from [0,1]. Is this correct, or do they mean something else by "the default sigmoid function"? I ask because for prediction they explicitly state alpha and beta:

If you use the default activation function CvANN_MLP::SIGMOID_SYM with the default parameter values fparam1 = 0 and fparam2 = 0, then the function used is y = 1.7159 * tanh(2/3 * x), so the output will range over [-1.7159, 1.7159] instead of [0,1].

Again, since I am not using the default sigmoid function, I am expecting to get predictions from [0,1]. Am I still right?

What scares me is that I found another question regarding the output range of the OpenCV sigmoid function, which says the range should be [-1, 1].

And now the real confusion: when I train the network and ask it for some predictions, I get values a little over 1 (around 1.03), regardless of whether my Y is taken from [0,1] or [-1,1]. That shouldn't happen in either case.

Can someone please enlighten me? Did I miss something?

Thanks in advance.

EDIT:

To make things clear, I came up with a small example that shows the problem:

#include <iostream>
#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>

using namespace cv;
using namespace std;

int main() {

    int POS = 1;
    int NEG = -1;

    int SAMPLES = 100;
    float SPLIT = 0.8;

    float C_X = 0.5;
    float C_Y = 0.5;
    float R = 0.3;

    Mat X(SAMPLES, 2, CV_32FC1);
    Mat Y(SAMPLES, 1, CV_32FC1);

    randu(X, 0, 1);

    for(int i = 0; i < SAMPLES; i++){
        Y.at<float>(i,0) = pow((X.at<float>(i,0) - C_X),2) + pow((X.at<float>(i,1) - C_Y),2) < pow(R,2) ? POS : NEG;
    }

    Mat X_train = X(Range(0, (int)(SAMPLES*SPLIT)), Range::all());
    Mat Y_train = Y(Range(0, (int)(SAMPLES*SPLIT)), Range::all());

    Mat X_test = X(Range((int)(SAMPLES*SPLIT), SAMPLES), Range::all());
    Mat Y_test = Y(Range((int)(SAMPLES*SPLIT), SAMPLES), Range::all());

    CvANN_MLP_TrainParams params(
                 cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 1000, 0.000001),
                 CvANN_MLP_TrainParams::BACKPROP,
                 0.1,
                 0.1);

    Mat layers = (Mat_<int>(3,1) << 2, 4, 1);

    CvANN_MLP net(layers, CvANN_MLP::SIGMOID_SYM, 1, 1);
    net.train(X_train, Y_train, Mat(), Mat(), params);

    Mat predictions(Y_test.size(), CV_32F); 
    net.predict(X_test, predictions);

    cout << predictions << endl;

    Mat error = predictions-Y_test;
    multiply(error, error, error);

    float mse = sum(error)[0]/error.rows;

    cout << "MSE: " << mse << endl;

    return 0;
}


This code generates a set of random points in the unit square and assigns them POS or NEG labels, depending on whether they lie inside the circle specified by C_X, C_Y and R. Training and test sets are then created and the MLP is trained. Now there are two situations:

  1. POS = 1, NEG = -1:

The outputs are presented to the network as they would be for tanh neurons (from [-1,1]), and I expect predictions from that range. But I also get predictions like -1.018 or 1.052. The mean squared error in this case was 0.13071 for me.

  2. POS = 1, NEG = 0:

The outputs are presented the way the documentation (as I understand it) considers optimal. And since I am not using the default sigmoid function, I expect predictions from [0,1]. But I also get values like 1.0263158, and even negative ones. The MSE in this case is better, at 0.0326775.

I know this example is a classification problem, and normally I would just round the values to the nearest label, but I want to learn a similarity function and need to be able to rely on the predictions coming from some fixed range.



2 answers


My answer comes late, so I am writing this for other people with the same question.

If you look at setActivationFunction() and calc_activ_func() in ann_mlp.cpp, the sigmoid will return a value in [-1.7159, 1.7159] when you set fparam1 and fparam2 to 0, 0. You can change the slope and range by adjusting fparam1 and fparam2.
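
For example, with the CvANN_MLP constructor used in the question, the two free parameters are passed right after the activation type. A rough sketch (assuming the pre-3.0 C++ API; fparam1 is alpha, fparam2 is beta of the symmetric sigmoid):

#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>

using namespace cv;

int main() {
    Mat layers = (Mat_<int>(3, 1) << 2, 4, 1);

    // Symmetric sigmoid: f(x) = beta * (1 - exp(-alpha*x)) / (1 + exp(-alpha*x)).
    // fparam1 = alpha controls the slope, fparam2 = beta the saturation level (+/- beta).
    // Passing 0, 0 selects the defaults, i.e. y = 1.7159 * tanh(2/3 * x).
    CvANN_MLP net(layers, CvANN_MLP::SIGMOID_SYM, /*fparam1=*/1.0, /*fparam2=*/1.0);

    return 0;
}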



The functions are called symmetric sigmoids, but they actually compute a scaled tanh. If you want a real sigmoid (logistic) function, I think you need to implement it yourself.
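
If you only need the predictions in [0,1], one option is to rescale the network's output after predict(). A sketch, assuming the targets were given as {-1, +1} and fparam2 = 1 so that the nominal output range is [-1, 1] (use 1.7159 instead of 1.0 when the default parameters are used):

#include <opencv2/core/core.hpp>

// Map predictions from the symmetric sigmoid's nominal range [-b, b] to [0, 1]
// and clamp the occasional overshoot (values like the 1.03 seen in the question).
cv::Mat toUnitRange(const cv::Mat& predictions, double b = 1.0) {
    cv::Mat out = (predictions / b + 1.0) * 0.5;  // [-b, b] -> [0, 1]
    cv::max(out, 0.0, out);                       // clamp values below 0
    cv::min(out, 1.0, out);                       // clamp values above 1
    return out;
}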



It really comes down to the activation function applied in your MLP.

There are several different activation functions that squash the output of an artificial neuron into a certain range (the most common ones I am familiar with are the hyperbolic tangent and the logistic function, but many others exist). Perhaps the one you are using for your neurons is scaled so that it goes outside the 0 to 1 range.



Regarding the documentation quote above: for optimal results it recommends formatting the data across the function's full output range, so that the MLP can learn over the entire range rather than just a subset of it, which would reduce its learning ability.
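
Concretely, with the example from the question this would amount to something like the following (a sketch that reuses X_train, Y_train, X_test, net and params from the code above, and assumes the labels were generated as {0, 1}):

// Map {0, 1} labels onto the symmetric range {-1, +1} that the documentation
// recommends, train on those, and map the predictions back to roughly [0, 1].
Mat Y_sym = Y_train * 2.0 - 1.0;            // {0, 1} -> {-1, +1}
net.train(X_train, Y_sym, Mat(), Mat(), params);

Mat predictions;
net.predict(X_test, predictions);
predictions = (predictions + 1.0) * 0.5;    // back to roughly [0, 1]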







