I have a question about the parametrization of C, H, and lambda on page 5 of the paper "A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes" (anyone with general knowledge of reinforcement learning, and of lambda in particular, may also be able to answer).
More precisely, I do not see any indication of whether the parameters H, C, or lambda depend on factors such as the sparsity of rewards or their distance from the current state, given that the environment might place rewards any number of steps in the future.
For example, suppose one environment requires a string of 7 actions to reach a reward from an average starting state, and another requires only 2. When planning with trees, given the usual exponential branching of the state space, it seems obvious that C (the sample width) and H (the horizon length) should depend on how far removed these rewards are from the current state. For the environment with rewards 2 steps away, H = 2 might suffice, for example. Similarly, C should depend on the sparsity of rewards: if there are 1000 possible states and only one of them carries a reward, C should be higher than if a reward were found in every 5th state (compare a setting where many states give the same reward vs. a goal-oriented problem).
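To make my intuition concrete, here is a minimal sketch of the kind of depth-H, width-C lookahead I have in mind. This is my own simplified reading, not the paper's pseudocode verbatim: it assumes a generative model `sample(s, a)` returning `(next_state, reward)`, a fixed action list, and a discount factor `gamma`.

```python
def estimate_V(sample, actions, s, H, C, gamma):
    """Estimate V*(s) by sampling C next states per action, recursing to depth H.

    Sketch of sparse-sampling lookahead; `sample(s, a) -> (s', r)` is an
    assumed generative-model interface, not an API from the paper.
    """
    if H == 0:
        return 0.0
    q_values = []
    for a in actions:
        total = 0.0
        # C samples per action -- note C is fixed, independent of reward sparsity
        for _ in range(C):
            s_next, r = sample(s, a)
            total += r + gamma * estimate_V(sample, actions, s_next, H - 1, C, gamma)
        q_values.append(total / C)
    return max(q_values)
```

The tree has on the order of (|A| * C)^H leaves, and a reward 7 steps away is simply invisible to the estimate unless H >= 7, which is exactly why I would expect H (and, for sparse rewards, C) to be tied to the environment.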
So the question is: are my assumptions correct, or what have I missed about sampling? The definitions on page 5 of the linked PDF make no mention of any dependency on the branching factor or the sparsity of rewards.
Thank you for your time.