Adapting to dataset shift with Vowpal Wabbit

I am currently playing with Vowpal Wabbit. I am especially interested in how the tool handles dataset shift. Intuitively this should be straightforward, since VW is an online algorithm. So I created a test dataset with three features. The following dictionary maps each feature name to the probability of its label being positive (rather than negative):

{"feat1" : 0.02, "feat2" : 0.3, "feat3" : 0.15}


Then I generate the toy dataset like this:

  • Select one feature at random
  • Assign class label 1 with that feature's probability, and -1 otherwise
  • Generate about 100,000 examples (rows); a minimal sketch follows this list
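
Since the post doesn't show the generation code, here is a rough sketch of how such a dataset could be produced in VW's input format; the generate() helper and the hard-coded file name are my own assumptions:

import random

def generate(path, probs, n=100000):
    # Write n examples in VW input format ("label | feature"):
    # pick one feature per example, label it 1 with that feature's
    # probability and -1 otherwise.
    with open(path, "w") as f:
        for _ in range(n):
            feat = random.choice(list(probs))
            label = 1 if random.random() < probs[feat] else -1
            f.write(f"{label} | {feat}\n")

generate("data/startdata.vw", {"feat1": 0.02, "feat2": 0.3, "feat3": 0.15})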

I train on this dataset using Vowpal Wabbit with a command like this:

vw data/startdata.vw --bfgs --passes 10 --loss_function logistic  --compressed --cache_file data/toy.cache --final_regressor data/toy.model


My logloss and AUC look something like this:

logloss = 0.0364034559032
auc = 0.624257973639


The learned probabilities are as follows:

feat1 : 0.0221759405605553977
feat2 : 0.3080836682440751833
feat3 : 0.1524675104299118037
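
The post doesn't say how these values were read off the model; one rough sketch, assuming a test file with one example per feature and predictions written via vw -t -i data/toy.model -d test.vw -p preds.txt, is to pass each raw score through a sigmoid:

import math

# Assumes test.vw contained "1 | feat1", "1 | feat2", "1 | feat3" in order.
for feat, line in zip(["feat1", "feat2", "feat3"], open("preds.txt")):
    raw = float(line.split()[0])                    # raw score in logit space
    print(feat, ":", 1.0 / (1.0 + math.exp(-raw)))  # sigmoid -> P(y = 1)

(VW can also emit probabilities directly with --link logistic.)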


The results are pretty accurate. Now I am creating another dataset with the following probabilities:

{"feat1" : 0.70, "feat2" : 0.80, "feat3" : 0.04}


Thus the probability of each feature's label being positive is shifted substantially, i.e. there is a significant shift in the dataset. Since this is exactly what I want to study, I load the old model when training on the new dataset, i.e.:

vw data/nextdata.vw --bfgs --passes 10 --loss_function logistic  --compressed --cache_file data/toy.cache -i data/toy.model --final_regressor data/toy.model


But instead of seeing any adaptation, I now get the following result:

logloss = 3.22754717189
auc = 0.456873489527


and the learned probabilities for the features do not change after the second run:

feat1 : 0.0221759405605553977
feat2 : 0.3080836682440751833
feat3 : 0.1524675104299118037


So my conclusion is that nothing is learned from the second dataset when the old model is loaded. However, I expected Vowpal Wabbit to adjust to the shift over time, and obviously this is not happening.

My question is: how do I tune the Vowpal Wabbit options so that it adapts to such a shift in the dataset?
