Adapting to dataset shift with Vowpal Wabbit
I am currently playing with Vowpal Wabbit and am especially interested in how the tool handles dataset shift. My intuition is that this should be straightforward, since VW is an online learner. So I created a test dataset with three features. The following dictionary maps each feature name to the probability that a row containing that feature gets the positive label:
{"feat1" : 0.02, "feat2" : 0.3, "feat3" : 0.15}
Then I generate the toy dataset like this:
- Select one feature at random
- Assign class label 1 with that feature's probability, and -1 otherwise
- Repeat to generate about 100,000 rows
I train on this dataset using Vowpal Wabbit with a command like this:
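The generation steps above can be sketched roughly as follows. This is only an illustration under assumptions, not the script actually used: the function name make_rows, the seed argument, and the choice of one indicator feature per row in the default namespace are all hypothetical.

```python
import random

def make_rows(probs, n_rows, seed=0):
    """Return n_rows lines in VW text format, e.g. '1 | feat2'.

    probs maps a feature name to P(label = 1 | feature).
    The name, signature and seed handling are illustrative assumptions.
    """
    rng = random.Random(seed)
    names = sorted(probs)
    rows = []
    for _ in range(n_rows):
        feat = rng.choice(names)                          # pick one feature uniformly
        label = 1 if rng.random() < probs[feat] else -1   # draw the class label
        rows.append(f"{label} | {feat}")                  # label, bar, single feature
    return rows
```

Writing "\n".join(make_rows(probs, 100000)) to a file such as data/startdata.vw would give input of the kind used in the commands in this post.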
vw data/startdata.vw --bfgs --passes 10 --loss_function logistic --compressed --cache_file data/toy.cache --final_regressor data/toy.model
My log loss and AUC look something like this:
logloss = 0.0364034559032
auc = 0.624257973639
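For reference, log loss on labels in {-1, 1} can be computed along these lines. This is a sketch that assumes the predictions have already been converted to probabilities of the positive class; the helper name log_loss and the clipping constant eps are illustrative choices, not something taken from VW.

```python
import math

def log_loss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood for labels in {-1, 1}.

    probs[i] is the predicted probability that labels[i] == 1.
    eps clips probabilities away from 0 and 1 to avoid log(0).
    """
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -math.log(p) if y == 1 else -math.log(1.0 - p)
    return total / len(labels)
```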
The probabilities that have been learned are as follows:
feat1 : 0.0221759405605553977
feat2 : 0.3080836682440751833
feat3 : 0.1524675104299118037
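The post does not say how these per-feature probabilities were read off. One plausible way, assumed here, is to take the raw (logit-scale) score the logistic model produces for an example containing only that feature and map it through the logistic function. The function name sigmoid is an illustrative choice.

```python
import math

def sigmoid(score):
    """Map a raw logistic-regression score to a probability in (0, 1).

    Uses the numerically stable form so large negative scores do not overflow.
    """
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)
```

Under this assumption, a learned probability near 0.3 for feat2 corresponds to a raw score near log(0.3 / 0.7).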
The learned values are pretty accurate. Now I create a second dataset with the following probabilities:
{"feat1" : 0.70, "feat2" : 0.80, "feat3" : 0.04}
So the probability of the positive class for each feature has changed substantially, which amounts to a significant dataset shift. Since this is exactly the case I am interested in, I pass the old model in as the starting point when training on the new dataset, i.e.:
vw data/nextdata.vw --bfgs --passes 10 --loss_function logistic --compressed --cache_file data/toy.cache -i data/toy.model --final_regressor data/toy.model
But instead of seeing some sort of adaptation, I get the following result:
logloss = 3.22754717189
auc = 0.456873489527
and the probabilities for the features do not change after the second run:
feat1 : 0.0221759405605553977
feat2 : 0.3080836682440751833
feat3 : 0.1524675104299118037
So my conclusion is that nothing is learned when the old model is used as the starting point for the second training run. However, I expected Vowpal Wabbit to adjust to the shift over time. Obviously this is not happening.
My question is: how do I set the Vowpal Wabbit options so that it adapts to such a shift in the dataset?