I am currently using the Vowpal Wabbit package in order to implement a Contextual Bandit use case.
My goal is to show each user a personalized ranking of categories (L1/L2/L3/L4/L5), which are the actions here, based on context such as:
recent_searched_categories
clicked_categories
type_of_user = daily/monthly/weekly
age={agebracket} 1,2,3
gender=male/female
tier=tier1/2/3/4
I have simulated a cost function and am learning online from the chosen action and its cost, using the --cb_explore_adf -q UA options.
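To make the simulation step concrete, here is a minimal sketch of the sample-then-cost loop. It rests on assumptions: `sample_action` and `simulated_cost` are illustrative names, not part of Vowpal Wabbit, the cost values are invented, and the pmf is hard-coded where a real setup would take it from the model's prediction.

```python
import random

CATEGORIES = ["L1", "L2", "L3", "L4", "L5"]

def sample_action(pmf, rng):
    """Draw an action index from a pmf; return (index, probability of that action)."""
    total = sum(pmf)
    draw = rng.random() * total
    cumulative = 0.0
    for index, prob in enumerate(pmf):
        cumulative += prob
        if draw < cumulative:
            return index, prob / total
    # Guard against floating-point round-off on the last bucket.
    return len(pmf) - 1, pmf[-1] / total

def simulated_cost(context, category):
    """Hypothetical cost: negative (a reward) if the user clicked this category before."""
    return -1.0 if category in context["clicked_cats"] else 0.0

rng = random.Random(0)  # fixed seed so the simulation is repeatable
context = {"user": "Anna", "clicked_cats": ["L1", "L4"]}
pmf = [0.2, 0.2, 0.2, 0.2, 0.2]  # stand-in for the model's predicted pmf

index, prob = sample_action(pmf, rng)
cost = simulated_cost(context, CATEGORIES[index])
print(CATEGORIES[index], cost, prob)
```

The sampled action, its cost, and its probability are the three values that go into the label of the chosen action's line.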
Sample Dataset:
shared |User user=Anna time_of_day=monthly gender=female age=3 |clicked_cats clicked_cats_1=L1 clicked_cats_2=L4 |recent_cats recent_cats_1=L2 recent_cats_2=L4
|Action category=L1
0:-0.3:0.19765689674531078 |Action category=L2
|Action category=L2
|Action category=L3
|Action category=L4
|Action category=L5
shared |User user=Tom time_of_day=weekly gender=male age=2 |clicked_cats clicked_cats_1=L2 clicked_cats_2=L3 |recent_cats recent_cats_1=L1 recent_cats_2=L4
|Action category=L1
|Action category=L2
0:-0.7:0.21600767970085144 |Action category=L3
|Action category=L3
|Action category=L4
|Action category=L5
shared |User user=Rohan time_of_day=daily gender=male age=1 |clicked_cats clicked_cats_1=L4 clicked_cats_2=L5 |recent_cats recent_cats_1=L1 recent_cats_2=L2
|Action category=L1
|Action category=L2
|Action category=L3
0:-0.7:0.20174514633095228 |Action category=L4
|Action category=L4
|Action category=L5
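For comparison, here is a minimal sketch of how one multi-line example could be built as text from a context and a logged action. The function name `to_vw_example` and the dict layout are hypothetical. Unlike the sample above, it emits each action exactly once and puts the `0:cost:probability` label on the chosen action's line, which is my understanding of the ADF layout; consecutive examples would then be separated by a blank line.

```python
def to_vw_example(shared, actions, chosen=None, cost=None, prob=None):
    """Build one multi-line cb_explore_adf example as text.

    shared:  dict mapping namespace -> dict of feature name -> value
    actions: list of category names, one action line each
    chosen, cost, prob: index of the logged action and its label values,
                        or None to build an unlabeled example for prediction
    """
    shared_parts = []
    for namespace, features in shared.items():
        feats = " ".join(f"{name}={value}" for name, value in features.items())
        shared_parts.append(f"|{namespace} {feats}")
    lines = ["shared " + " ".join(shared_parts)]
    for index, category in enumerate(actions):
        # Only the chosen action carries a label; the others stay unlabeled.
        label = f"0:{cost}:{prob} " if index == chosen else ""
        lines.append(f"{label}|Action category={category}")
    return "\n".join(lines)

example = to_vw_example(
    shared={
        "User": {"user": "Anna", "time_of_day": "monthly", "gender": "female", "age": 3},
        "clicked_cats": {"clicked_cats_1": "L1", "clicked_cats_2": "L4"},
        "recent_cats": {"recent_cats_1": "L2", "recent_cats_2": "L4"},
    },
    actions=["L1", "L2", "L3", "L4", "L5"],
    chosen=1,
    cost=-0.3,
    prob=0.2,
)
print(example)
```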
My questions are:
Is the data format shown above correct? If not, how should I create the input training dataset for the model?
Which algorithms can I use for this use case to both explore and optimise the probabilities for all categories given the context, and how can I validate the algorithm's performance?