InfoTree

InfoTree is presented as a training-time, budget-aware tree-search framework. It initializes from a root prompt, samples initial trajectories, computes entropy statistics, and expands frontier nodes using UUCB under a leaf budget. The main configuration uses a 16-leaf training budget per prompt and a 32-leaf validation budget. Across nine benchmarks, InfoTree improved over flat GRPO by 2.5 to 11.2 points. Combining InfoTree with DPS and prefix sharing gives better results than InfoTree alone.
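
The expansion loop described above (root prompt, initial trajectories, entropy statistics, frontier expansion under a leaf budget) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not InfoTree's actual implementation. The summary does not define UUCB, so `ucb_score` here is a generic UCB-style stand-in that uses a node's entropy as its value term. `toy_rollout` is a hypothetical placeholder for sampling from the policy. Each sampled continuation is assumed to count as one completed trajectory, that is, one leaf. Only the 16-leaf and 32-leaf budgets come from the text.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    tokens: list                 # prompt plus all tokens generated up to this node
    entropy: float               # toy entropy statistic for this node's segment
    visits: int = 0              # how many times this node has been forked
    children: list = field(default_factory=list)


def toy_rollout(prefix, rng, length=4):
    """Hypothetical stand-in for policy sampling: fake tokens and a fake mean entropy."""
    new_tokens = [rng.randrange(100) for _ in range(length)]
    return prefix + new_tokens, rng.uniform(0.0, 2.0)


def ucb_score(node, step, c=1.0):
    """Generic UCB-style score (an assumption, NOT the UUCB rule named in the text)."""
    bonus = c * math.sqrt(math.log(step + 1) / (node.visits + 1))
    return node.entropy + bonus


def sample_child(parent, rng):
    """Sample one continuation from `parent` and attach it as a child node."""
    tokens, entropy = toy_rollout(parent.tokens, rng)
    child = Node(tokens=tokens, entropy=entropy)
    parent.children.append(child)
    return child


def build_tree(prompt_tokens, leaf_budget=16, n_init=4, seed=0):
    rng = random.Random(seed)
    root = Node(tokens=list(prompt_tokens), entropy=0.0)

    # Initial trajectories sampled directly from the root prompt.
    frontier = [sample_child(root, rng) for _ in range(min(n_init, leaf_budget))]

    # Fork the highest-scoring frontier node until the leaf budget is spent.
    step = 0
    while len(frontier) < leaf_budget:
        step += 1
        best = max(frontier, key=lambda n: ucb_score(n, step))
        best.visits += 1
        frontier.append(sample_child(best, rng))

    return root, frontier


if __name__ == "__main__":
    root, leaves = build_tree([1, 2, 3], leaf_budget=16)
    print(len(leaves), "leaves sampled;", len(root.children), "branch directly from the root")
```

Calling `build_tree(prompt, leaf_budget=32)` would mimic the larger validation budget. In this sketch a forked node stays in the frontier, so the visit count in the exploration bonus gradually steers sampling away from nodes that have already been expanded many times.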