I'm simulating a scenario where there are two options (sports/politics) with two conversion rates (c_0, c_1). In order to decide which option to display to a customer, I'm using a contextual bandit model.
I've generated 100 data points with a fixed context (user=Tom), each in the following format:
`'shared |Context user=Tom\n0:{cost}:0.5 |Action choice=sports \n|Action choice=politics '`
Note that the cost is generated stochastically. In this scenario:
- P(cost=-1|choice=sports) = 0.6
- P(cost=-1|choice=politics) = 0.7
- P(cost=0.2|choice=sports) = 0.4
- P(cost=0.2|choice=politics) = 0.3
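As a sanity check, the expected costs implied by these probabilities can be computed directly. This is only a minimal sketch; the names `cost_dist` and `expected_cost` are illustrative and not part of the simulation code.

```python
# Cost distributions as stated in the list above: {cost: probability}.
cost_dist = {
    "sports": {-1.0: 0.6, 0.2: 0.4},
    "politics": {-1.0: 0.7, 0.2: 0.3},
}


def expected_cost(dist):
    """Probability-weighted mean of the costs."""
    return sum(cost * prob for cost, prob in dist.items())


expected = {action: expected_cost(d) for action, d in cost_dist.items()}
for action, value in expected.items():
    print(f"{action}: {value:.2f}")
```

Under the generating distribution this gives roughly -0.52 for sports and -0.64 for politics, so politics has the lower expected cost.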
In this training data set, the average cost of option "sports" is -0.52 whereas the average cost of "politics" is -0.65. Thus, I'm expecting the model to prefer option B (i.e. politics) over option A (i.e. sports) after being trained on these 100 samples. However, running a prediction after training on 'shared |Context user=Tom \n|Action choice=sports \n|Action choice=politics ', I get the PMF [0.9, 0.1].
This is concerning for multiple reasons:
- I'm expecting the opposite output, where P(B) > P(A).
- Not only does the model think that option A is better, it is also extremely confident about it. I would expect the probabilities to be around 40-60%, but it converges to 90% (!!).
I've tried tuning the model; changing the parameters can make the model better for a specific dataset, but regenerating the dataset easily produces a state where the model behaves like above.
The model is being run as
import vowpalwabbit as vw

model = vw.Workspace(
    "--cb_explore_adf --passes 1000 -l 0.2 --cb_type ips --holdout_off --epsilon 0.2 --cache -k"
)

for sample in total_training:
    x = model.parse(
        sample,
        vw.LabelType.CONTEXTUAL_BANDIT
    )
    model.learn(x)

model.predict('shared |Context user=Tom \n|Action choice=sports \n|Action choice=politics ')
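One thing worth keeping in mind when reading the predicted PMF: with `--cb_explore_adf` and `--epsilon`, the output is an exploration distribution over actions, not a confidence estimate. The sketch below assumes plain epsilon-greedy exploration, where epsilon is spread uniformly over all actions and the remainder goes to the action the policy currently ranks best. The function `epsilon_greedy_pmf` is a hypothetical helper for illustration, not a VW API.

```python
def epsilon_greedy_pmf(greedy_index, num_actions, epsilon):
    """Spread epsilon uniformly over all actions; give the rest to the greedy one."""
    pmf = [epsilon / num_actions] * num_actions
    pmf[greedy_index] += 1.0 - epsilon
    return pmf


# Two actions, epsilon = 0.2, and the first action (sports) ranked best.
pmf = epsilon_greedy_pmf(greedy_index=0, num_actions=2, epsilon=0.2)
print([round(p, 3) for p in pmf])
```

Under that assumption, epsilon = 0.2 with two actions yields 0.9 for whichever action is ranked first and 0.1 for the other, regardless of how close the two estimated costs are.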
The full training set is:
'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics '
From your question I would infer that sports is the better (lowest-cost) option, in which case the model is doing what it is supposed to do. If there is a typo there, it could still be that your generated samples happen to favor the other option. Does the learned policy reflect the "sample average" performance of both actions? That can differ from the optimal policy implied by the underlying parameters.
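To check the sample averages, something along these lines could be used. It is a minimal sketch that assumes every example follows the format shown in the question (one `shared` line, then one line per action, with exactly one action carrying an `action:cost:probability` label). The helper names are made up for illustration, and the four examples are only a small subset in that format.

```python
from collections import defaultdict


def labelled_action_and_cost(example):
    """Return (action_name, cost) for the labelled action line of one example."""
    for line in example.split("\n"):
        label, _, features = line.partition("|")
        label = label.strip()
        if ":" in label:  # label looks like "0:cost:probability"
            _, cost, _ = label.split(":")
            action = features.split("choice=")[1].strip()
            return action, float(cost)
    raise ValueError("no labelled action found")


def sample_average_costs(examples):
    """Average the observed cost separately for each chosen action."""
    costs = defaultdict(list)
    for example in examples:
        action, cost = labelled_action_and_cost(example)
        costs[action].append(cost)
    return {action: sum(c) / len(c) for action, c in costs.items()}


# A few examples in the question's format (illustration only).
subset = [
    'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
    'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
    'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
    'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
]
averages = sample_average_costs(subset)
print(averages)
```

Calling `sample_average_costs(total_training)` on the full training list would give the per-action averages to compare against the action the model ranks first.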