Rollout Informativeness under a Fixed Budget
InfoTree with ABA sustained a higher RIFB over 300 steps, while the RIFB of flat GRPO declined. RIFB (Rollout Informativeness under a Fixed Budget) is defined as the expected squared norm of the GRPO gradient mass contributed by a rollout set. The tree objective F correlated strongly with measured RIFB across 500 prompts. The paper argues that rollout selection should optimize gradient informativeness rather than be treated as a budget-agnostic sampling detail.
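
To make the RIFB definition concrete, the following is a minimal Monte Carlo sketch in Python, not the paper's implementation. It assumes that the "gradient mass" of a rollout set is the mean of per-rollout score gradients weighted by standard group-normalized GRPO advantages, and that the expectation is approximated by averaging over sampled rollout sets. The function names (grpo_advantages, gradient_mass_sq_norm, estimate_rifb) and the toy gradient vectors are hypothetical and chosen only for illustration.

```python
from math import sqrt

def grpo_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r - mean) / (std + eps)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]

def gradient_mass_sq_norm(rewards, score_grads):
    """Squared norm of the advantage-weighted mean gradient for one rollout set.

    ASSUMPTION: this stands in for the 'GRPO gradient mass'; the paper's exact
    definition may differ (e.g. clipping or token-level weighting).
    """
    adv = grpo_advantages(rewards)
    n, dim = len(rewards), len(score_grads[0])
    g = [sum(a * grad[d] for a, grad in zip(adv, score_grads)) / n
         for d in range(dim)]
    return sum(x * x for x in g)

def estimate_rifb(rollout_sets):
    """Average the squared gradient-mass norm over sampled rollout sets."""
    vals = [gradient_mass_sq_norm(r, g) for r, g in rollout_sets]
    return sum(vals) / len(vals)

# Toy data: two rollouts per set, two-dimensional stand-in gradients.
toy_grads = [[1.0, 0.0], [0.0, 1.0]]
uniform_set = ([1.0, 1.0], toy_grads)   # identical rewards
mixed_set = ([1.0, 0.0], toy_grads)     # one success, one failure

rifb_uniform = estimate_rifb([uniform_set])
rifb_mixed = estimate_rifb([mixed_set])
```

Under these assumptions, a rollout set whose rewards are all equal has zero advantages and therefore contributes no gradient mass, while a set with mixed outcomes contributes a nonzero amount. This is one way to read the claim that rollout selection under a fixed budget affects how informative the gradient is.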