DAPO
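DAPO removes zero-variance groups (groups whose rollouts all received the same reward) only after the rollouts are complete. It therefore saves training cost but not rollout cost: every rollout in a discarded group has already been generated. Selective rollout differs from DAPO because it uses information revealed during an agent rollout, rather than waiting until the rollout has finished. The article recommends combining selective rollout with DAPO in production.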
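The post-rollout filtering described above can be illustrated with a minimal Python sketch. It is not DAPO's actual implementation: the function name `filter_zero_variance_groups`, the `eps` threshold, and the reward values are all hypothetical, chosen only to show why the filter reduces training work while leaving rollout work unchanged.

```python
from statistics import pvariance


def filter_zero_variance_groups(groups, eps=1e-8):
    """Keep only groups whose rewards are not all (nearly) identical.

    `groups` maps a prompt id to the rewards of its completed rollouts.
    Hypothetical helper for illustration, not DAPO's real API.
    """
    return {
        prompt_id: rewards
        for prompt_id, rewards in groups.items()
        if pvariance(rewards) > eps
    }


# Hypothetical rewards: 3 prompts, 4 completed rollouts each.
groups = {
    "prompt_a": [1.0, 1.0, 1.0, 1.0],  # all succeed: zero variance
    "prompt_b": [0.0, 1.0, 0.0, 1.0],  # mixed outcomes: kept
    "prompt_c": [0.0, 0.0, 0.0, 0.0],  # all fail: zero variance
}

kept = filter_zero_variance_groups(groups)

# Every rollout was generated before the filter ran ...
rollouts_paid = sum(len(r) for r in groups.values())
# ... but only the surviving groups reach the training step.
rollouts_trained = sum(len(r) for r in kept.values())

print(f"rollouts generated: {rollouts_paid}, rollouts trained on: {rollouts_trained}")
```

In this toy example all twelve rollouts are generated, but only the four from `prompt_b` are trained on. A filter that runs after generation cannot recover the cost of the other eight, which is the gap that selective rollout is meant to address.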