Adapting to dataset shift with Vowpal Wabbit

I am currently playing with Vowpal Wabbit. I am especially interested in how the tool handles dataset shift. Intuitively this should be straightforward, since VW is an online algorithm. So I created a test dataset with three features. The following dictionary maps each feature name to the probability of its label being positive (rather than negative):

{"feat1" : 0.02, "feat2" : 0.3, "feat3" : 0.15}


Then I generate the toy dataset like this:

  • Select one feature at random
  • Assign class label 1 with that feature's probability, and -1 otherwise
  • Generate about 100,000 examples (rows); a minimal sketch follows this list
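
Since the post doesn't show the generation code, here is a rough sketch of how such a dataset could be produced in VW's input format; the generate() helper and the hard-coded file name are my own assumptions:

import random

def generate(path, probs, n=100000):
    # Write n examples in VW input format ("label | feature"):
    # pick one feature per example, label it 1 with that feature's
    # probability and -1 otherwise.
    with open(path, "w") as f:
        for _ in range(n):
            feat = random.choice(list(probs))
            label = 1 if random.random() < probs[feat] else -1
            f.write(f"{label} | {feat}\n")

generate("data/startdata.vw", {"feat1": 0.02, "feat2": 0.3, "feat3": 0.15})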

I train on this dataset using Vowpal Wabbit with a command like this:

vw data/startdata.vw --bfgs --passes 10 --loss_function logistic  --compressed --cache_file data/toy.cache --final_regressor data/toy.model


My logloss and AUC look something like this:

logloss = 0.0364034559032
auc = 0.624257973639


The learned probabilities are as follows:

feat1 : 0.0221759405605553977
feat2 : 0.3080836682440751833
feat3 : 0.1524675104299118037
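
The post doesn't say how these values were read off the model; one rough sketch, assuming a test file with one example per feature and predictions written via vw -t -i data/toy.model -d test.vw -p preds.txt, is to pass each raw score through a sigmoid:

import math

# Assumes test.vw contained "1 | feat1", "1 | feat2", "1 | feat3" in order.
for feat, line in zip(["feat1", "feat2", "feat3"], open("preds.txt")):
    raw = float(line.split()[0])                    # raw score in logit space
    print(feat, ":", 1.0 / (1.0 + math.exp(-raw)))  # sigmoid -> P(y = 1)

(VW can also emit probabilities directly with --link logistic.)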


The results are pretty accurate. Now I am creating another dataset with the following probabilities:

{"feat1" : 0.70, "feat2" : 0.80, "feat3" : 0.04}


Thus the probability of each feature's label being positive is shifted substantially, i.e. there is a significant shift in the dataset. Since this is exactly what I want to study, I load the old model when training on the new dataset, i.e.:

vw data/nextdata.vw --bfgs --passes 10 --loss_function logistic  --compressed --cache_file data/toy.cache -i data/toy.model --final_regressor data/toy.model


But instead of seeing any adaptation, I now get the following result:

logloss = 3.22754717189
auc = 0.456873489527


and the learned probabilities for the features do not change after the second run:

feat1 : 0.0221759405605553977
feat2 : 0.3080836682440751833
feat3 : 0.1524675104299118037


So my conclusion is that nothing is learned from the second dataset when the old model is loaded. However, I expected Vowpal Wabbit to adjust to the shift over time, and obviously this is not happening.

My question is: how do I tune the Vowpal Wabbit options so that it adapts to such a shift in the dataset?
