Rollout Pass-Rate Control
Binary-reward learning is most informative when the rollout pass rate is 50%. For N = 8, exactly four successful rollouts produces the maximum number of success-failure contrastive pairs. Bernoulli reward entropy is maximized at a pass rat…
1 sources - 5 claims
Binary-reward learning is most informative when the rollout pass rate is 50%. For N = 8, exactly four successful rollouts produces the maximum number of success-failure contrastive pairs. Bernoulli reward entropy is maximized at a pass rate of 0.5. Rollout Pass-Rate Control defines the pass rate as the fraction of successful rollouts in a fixed task and rollout group. The probability that a rollout group survives all-fail or all-pass filtering is maximized at a pass rate of 0.5.