How to limit certain actions from Vowpal Wabbit Contextual Bandit based on context

124 Views Asked by Cris Pineda At 28 July 2025 at 01:02

I'm working on creating a contextual bandit for recommending actions to a user on our website. I want to prevent certain actions from being shown based on the user's context.

For example, if a user has already signed up, I don't want it to recommend "sign up" to them.

model = pyvw.vw(f'--cb_explore_adf -q PA --quiet --epsilon {EPSILON}')

Here is an example of the input data:

shared |Page pageViewCount:6 videoViewCount:3 language:es user_nation:US page_section:news time_on_site:8.632452878867632 is_signed_up:0 is_subscribed:1 has_downloaded_app:1 favorites_last_updated:56.7385141116986 
|Action a=sign_up 
|Action a=subscribe_mktg_comms 
|Action a=recommend_content 
|Action a=favorites 
|Action a=download_app 
|Action a=do_nothing 
|Action a=survey

I keep reading that I should set the probability to 0, but I'm a bit confused: for training, I see that we need to put action:reward:probability on the chosen arm, but I don't see where to put it in the input data.

I have also read that I could remove the actions, but I'm not sure whether this would affect the training data, since the arm indexes would then be different.


There is 1 best solution below

Aviel On 08 May 2023 at 08:34

When you train, you pass the list of candidate actions with each example, so for each row (user/context) you can include only the actions that are valid for that context.

Then, on the inference side, you pass only the relevant actions as well:

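# to_vw_example_format and sample_custom_pmf are helpers that are not shown
# here; they are assumed to be defined elsewhere.
def get_action(vw, context, actions):
    # `actions` should already be restricted to those allowed for `context`.
    vw_text_example = to_vw_example_format(context, actions)
    # One probability per action, in the same order as `actions`.
    pmf = vw.predict(vw_text_example)
    chosen_action_index, prob = sample_custom_pmf(pmf)
    return actions[chosen_action_index], prob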
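
The snippet above calls sample_custom_pmf without showing it. A minimal sketch of what such a helper might look like, assuming it receives a list of probabilities (one per action passed in) and returns the sampled index together with its probability. This is an illustrative version, not necessarily the one the answer had in mind:

```python
import random


def sample_custom_pmf(pmf, rng=random):
    """Sample an index from a list of probabilities (hypothetical helper).

    Returns (index, probability of that index after normalisation).
    """
    total = sum(pmf)
    if total <= 0:
        raise ValueError("pmf must contain at least one positive probability")
    # Normalise in case the probabilities do not sum exactly to 1.
    probs = [p / total for p in pmf]

    draw = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    for index, prob in enumerate(probs):
        cumulative += prob
        if draw < cumulative:
            return index, prob
    # Guard against floating-point rounding in the cumulative sum.
    return len(probs) - 1, probs[-1]
```

The returned probability is the one that would be logged alongside the chosen action, so it can be used later in the training label for that arm.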

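Regarding "I already read to remove the actions but I'm not sure if this would affect the training data since the arm indexes would be different then":

It shouldn't affect it, provided you train again and each example gives the list of potential actions and the action that was chosen. With --cb_explore_adf, each action is described by its own features (for example a=sign_up) rather than by a fixed arm index, so the position of an action in the list can differ from one example to the next.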
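
To make this concrete, here is a rough sketch of filtering the action list by context and building the multi-line text example from the filtered list. The eligibility rules, the ALL_ACTIONS list, and this version of to_vw_example_format are assumptions made up for illustration; they are not from the question or from Vowpal Wabbit itself. It also assumes the usual ADF text format, where the label on the chosen action's line is written as 0:cost:probability (cost being the negative of the reward) and string-valued features are written as name=value.

```python
ALL_ACTIONS = [
    "sign_up", "subscribe_mktg_comms", "recommend_content",
    "favorites", "download_app", "do_nothing", "survey",
]

# Hypothetical rules: if the context flag is set, the action is not offered.
BLOCKED_WHEN_SET = {
    "is_signed_up": "sign_up",
    "is_subscribed": "subscribe_mktg_comms",
    "has_downloaded_app": "download_app",
}


def eligible_actions(context):
    """Return the actions that are allowed for this context."""
    blocked = {
        action for flag, action in BLOCKED_WHEN_SET.items() if context.get(flag)
    }
    return [a for a in ALL_ACTIONS if a not in blocked]


def to_vw_example_format(context, actions, cb_label=None):
    """Build a cb_explore_adf text example (illustrative helper).

    cb_label, if given, is (chosen_action_name, cost, probability); the
    label is attached to the line of the chosen action, wherever it sits
    in the filtered list.
    """
    features = []
    for name, value in context.items():
        if isinstance(value, str):
            features.append(f"{name}={value}")   # categorical feature
        else:
            features.append(f"{name}:{value}")   # numeric feature
    lines = ["shared |Page " + " ".join(features)]
    for action in actions:
        prefix = ""
        if cb_label is not None and action == cb_label[0]:
            _, cost, prob = cb_label
            prefix = f"0:{cost}:{prob} "
        lines.append(f"{prefix}|Action a={action}")
    return "\n".join(lines)


context = {"pageViewCount": 6, "language": "es", "is_signed_up": 1,
           "is_subscribed": 0, "has_downloaded_app": 0}
actions = eligible_actions(context)          # "sign_up" is left out
train_example = to_vw_example_format(
    context, actions, cb_label=("favorites", 0.0, 0.5)
)
print(train_example)
```

The same filtered list would be passed at prediction time (without cb_label), and the resulting text could then be given to the model's predict or learn call.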
