SWE-bench Verified

SWE-bench-style agentic reinforcement learning relies on rollouts that are long, stateful, interactive, and expensive. The SWE-bench experiments trained Qwen3-14B and Qwen3-32B in thinking mode on R2E-Gym-Subset and evaluated them on SWE-bench Verified. Compared with the same-step baseline, Prefix Sampling improved peak Pass@1 by 5.4 percentage points for the 32B model and by 4.7 percentage points for the 14B model. In the agentic setting, Prefix Sampling can also reduce wall-clock cost by replaying prefix execution rather than regenerating prefix text.
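
The replay idea can be illustrated with a toy sketch. This is not the authors' implementation: `ToyEnv`, `rollout_from_prefix`, and `fixed_policy` are hypothetical names, and the environment is a stand-in for a real sandbox. It assumes that the prefix is stored as a list of recorded actions and that re-executing those actions restores the environment state, so the model is only called for steps after the prefix.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ToyEnv:
    """Hypothetical stateful environment; its state is the log of executed actions."""
    state: List[str] = field(default_factory=list)

    def step(self, action: str) -> str:
        # Executing an action mutates the environment and returns an observation.
        self.state.append(action)
        return f"obs after {action}"


def rollout_from_prefix(
    prefix_actions: List[str],
    policy: Callable[[str], str],
    max_steps: int,
) -> Tuple[List[str], int]:
    """Replay recorded prefix actions, then continue the rollout with the policy.

    Returns the full action trajectory and the number of policy (model) calls.
    """
    env = ToyEnv()
    obs = ""
    # Replay phase: re-execute the recorded actions to rebuild state.
    # No policy calls are made here, which is the assumed source of savings.
    for action in prefix_actions:
        obs = env.step(action)

    actions = list(prefix_actions)
    model_calls = 0
    # Continuation phase: only these steps require generating new text.
    while len(actions) < max_steps:
        action = policy(obs)
        model_calls += 1
        obs = env.step(action)
        actions.append(action)
    return actions, model_calls


def fixed_policy(observation: str) -> str:
    """Stand-in for a model call; always proposes the same action."""
    return "edit"


trajectory, calls = rollout_from_prefix(["ls", "cat file.py"], fixed_policy, max_steps=5)
```

In this sketch, a five-step rollout that starts from a two-action prefix needs three policy calls instead of five. Whether replay saves wall-clock time in a real system depends on how the cost of re-executing the prefix compares with the cost of generating it, which this toy example does not model.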