Rollout Pass-Rate Control

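Rollout Pass-Rate Control defines the pass rate as the fraction of successful rollouts within a fixed task and rollout group. Binary-reward learning is most informative when this pass rate is 50%, for three related reasons. First, a group of N rollouts with k successes contains k(N - k) success-failure contrastive pairs, so for N = 8 exactly four successful rollouts produce the maximum of 16 pairs. Second, the entropy of a Bernoulli reward is maximized at a pass rate of 0.5, where it equals 1 bit. Third, the probability that a rollout group survives all-fail or all-pass filtering is maximized at a pass rate of 0.5; if rollouts are treated as independent with success probability p, that probability is 1 - p^N - (1 - p)^N.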
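
The three quantities named above can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names are hypothetical, and the filtering formula assumes that rollouts in a group are independent and share one success probability p, which the entry itself does not state.

```python
import math


def contrastive_pairs(successes: int, group_size: int) -> int:
    """Number of (success, failure) pairs in a group with `successes` passes."""
    return successes * (group_size - successes)


def bernoulli_entropy_bits(p: float) -> float:
    """Entropy in bits of a binary reward that equals 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a deterministic reward carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def survival_probability(p: float, group_size: int) -> float:
    """Probability that a group is neither all-pass nor all-fail.

    Assumption: rollouts are independent, each passing with probability p.
    """
    return 1.0 - p ** group_size - (1.0 - p) ** group_size


N = 8  # group size used in the text

# Success count that maximizes the number of contrastive pairs.
best_k = max(range(N + 1), key=lambda k: contrastive_pairs(k, N))
print(best_k, contrastive_pairs(best_k, N))  # 4 16

# Scan pass rates on a coarse grid; both curves peak at 0.5.
grid = [i / 100 for i in range(101)]
print(max(grid, key=bernoulli_entropy_bits))                # 0.5
print(max(grid, key=lambda p: survival_probability(p, N)))  # 0.5
print(survival_probability(0.5, N))                         # 0.9921875
```

Under these assumptions, a group of eight rollouts at a 50% pass rate is discarded by all-fail or all-pass filtering less than 1% of the time, whereas at a 10% pass rate roughly 43% of groups are all-fail.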