Is a multi-armed bandit a good choice when the reward rate is very low?

120 Views Asked by Aravind Chamakura At 11 December 2018 at 07:58

Is any version of the multi-armed bandit (Epsilon-Greedy, Thompson Sampling, UCB) any good when the reward/click rate is very low relative to the pull rate? I have 600 pieces of content with approximately 3,000 clicks per day (total across all content) for a volume of approximately one million requests. Given these numbers, would it be useful to implement a MAB? Is this click rate high enough to give the algorithm a statistically meaningful signal?


There is 1 solution below

Sanit On 10 February 2020 at 07:32

Do the 600 pieces of content change every day or do they stay the same? If they stay the same, then an asymptotically optimal algorithm would start performing extremely well soon enough.
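To put rough numbers on "soon enough", here is a back-of-the-envelope sketch using the figures from the question. It assumes traffic is spread uniformly over the 600 pieces and that every piece has the same average click rate, neither of which is stated in the question; `per_arm_stats` is a hypothetical helper name.

```python
import math

REQUESTS_PER_DAY = 1_000_000  # from the question
CLICKS_PER_DAY = 3_000        # from the question
N_ARMS = 600                  # pieces of content, from the question


def per_arm_stats(days):
    """Per-arm impressions, expected clicks, and CTR standard error.

    Assumes uniform traffic across arms and one shared average CTR
    (an illustrative simplification, not something the question states).
    """
    ctr = CLICKS_PER_DAY / REQUESTS_PER_DAY
    impressions = days * REQUESTS_PER_DAY / N_ARMS
    expected_clicks = impressions * ctr
    # Standard error of a binomial proportion estimate.
    std_err = math.sqrt(ctr * (1 - ctr) / impressions)
    return impressions, expected_clicks, std_err


for days in (1, 7, 30):
    n, clicks, se = per_arm_stats(days)
    print(f"{days:>2} days: {n:,.0f} impressions/arm, "
          f"{clicks:.0f} expected clicks/arm, CTR std err {se:.5f}")
```

Under these assumptions the overall click rate is 0.3%, and each piece collects only about 5 clicks per day, so one day of data gives a very noisy per-piece estimate. If the content stays the same, the standard error shrinks with the square root of the accumulated impressions, which is why a fixed set of arms is the easier case.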

Even if the pieces of content change, Thompson Sampling should still work and give you something that is significantly better than random. I have run various experiments with Thompson Sampling for my research, and it seems to start doing well very quickly on most of them.
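For reference, a minimal sketch of Beta-Bernoulli Thompson Sampling on a simulated low-click-rate problem might look like the following. This is not the setup from my experiments: the click rates in `true_ctrs`, the uniform Beta(1, 1) prior, the number of rounds, and the function name `thompson_sampling` are all illustrative assumptions, and the simulation is kept far smaller than 600 arms and a million requests so that it runs quickly.

```python
import random


def thompson_sampling(true_ctrs, rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling.

    true_ctrs: hypothetical per-arm click probabilities (unknown to the policy).
    Returns (total_clicks, pulls_per_arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_ctrs)
    clicks = [0] * n_arms    # observed successes per arm
    misses = [0] * n_arms    # observed failures per arm
    pulls = [0] * n_arms
    total_clicks = 0
    for _ in range(rounds):
        # Draw one plausible CTR per arm from its Beta posterior
        # (Beta(1, 1) prior, assumed here), then play the best draw.
        samples = [rng.betavariate(clicks[a] + 1, misses[a] + 1)
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        pulls[arm] += 1
        if rng.random() < true_ctrs[arm]:
            clicks[arm] += 1
            total_clicks += 1
        else:
            misses[arm] += 1
    return total_clicks, pulls


# Hypothetical scenario: nine arms at 0.2% CTR and one arm at 1% CTR.
true_ctrs = [0.002] * 9 + [0.01]
total, pulls = thompson_sampling(true_ctrs, rounds=20_000)
print("total clicks:", total)
print("share of pulls on the best arm:", pulls[-1] / sum(pulls))
```

Comparing `total` against the roughly `rounds * mean(true_ctrs)` clicks a uniformly random policy would be expected to collect gives a quick sense of whether the bandit is helping at a given click rate.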
