I'm currently working on off-policy evaluation for contextual bandits: evaluating a target contextual bandit policy using a dataset that was generated by a different (logging) policy.

The problem I'm tackling has two or more continuous action dimensions, so I need a real-world dataset that satisfies this condition. A medical dataset that meets it would suit me best.

Is there a dataset in which two or more doctors (corresponding to two or more policies) prescribe two or more medicine doses as continuous values (corresponding to two or more continuous action dimensions), and which also records each patient's condition after taking the medicine (corresponding to the rewards)?
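To make the requirements concrete, here is a minimal sketch of the record such a dataset would need to supply and how it could feed an off-policy estimate. It assumes the logging policy's density for each logged dose vector is known or has been estimated, which real clinical data usually does not provide directly. It uses kernel-smoothed inverse propensity scoring, one common approach for continuous actions and not necessarily the estimator I use. The names `LoggedStep`, `gaussian_kernel`, and `kernel_ips` and the Gaussian kernel choice are hypothetical, chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class LoggedStep:
    """One logged decision (hypothetical field names)."""
    context: List[float]     # patient features observed before prescribing
    action: List[float]      # prescribed doses, one entry per continuous action dimension
    reward: float            # patient outcome after taking the medicine
    logging_density: float   # assumed: density of `action` under the logging policy given `context`


def gaussian_kernel(u: Sequence[float], bandwidth: float) -> float:
    """Product Gaussian kernel K_h(u) = prod_j N(u_j; 0, h^2) over all action dimensions."""
    d = len(u)
    squared = sum(x * x for x in u) / (bandwidth ** 2)
    normaliser = (math.sqrt(2.0 * math.pi) * bandwidth) ** d
    return math.exp(-0.5 * squared) / normaliser


def kernel_ips(
    data: List[LoggedStep],
    target_policy: Callable[[List[float]], List[float]],
    bandwidth: float,
) -> float:
    """Kernel-smoothed IPS estimate of a deterministic target policy's value.

    Each logged reward is weighted by how close the target policy's dose vector is
    to the logged dose vector (via the kernel), divided by the logging density.
    """
    total = 0.0
    for step in data:
        target_action = target_policy(step.context)
        diff = [t - a for t, a in zip(target_action, step.action)]
        weight = gaussian_kernel(diff, bandwidth) / step.logging_density
        total += weight * step.reward
    return total / len(data)


# Usage with made-up numbers: two logged patients, two dose dimensions.
logged = [
    LoggedStep(context=[0.3], action=[1.0, 2.0], reward=0.8, logging_density=0.5),
    LoggedStep(context=[0.7], action=[1.5, 1.0], reward=0.4, logging_density=0.4),
]
estimate = kernel_ips(logged, target_policy=lambda x: [1.2, 1.5], bandwidth=0.5)
```

The bandwidth trades bias for variance: a smaller value only credits logged doses very close to the target policy's doses, which becomes harder to satisfy as the number of action dimensions grows. This is part of why a dataset with several continuous dose dimensions and good coverage matters.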
