GRESO
GRESO predicts which prompts will be uninformative before rollout begins, using the consistency of each prompt's rewards across training epochs on math tasks. It was developed for math-style reasoning settings, which have no multi-step environment trajectory to inspect. The article identifies evaluating a combined pipeline of GRESO and DAPO as future work.
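The pre-rollout idea can be illustrated with a minimal sketch. It is not GRESO's published implementation, and several details are assumptions. The class `ConsistencyFilter` and its `explore_prob` parameter are hypothetical. A prompt counts as uninformative when every sampled response in its group receives the same reward, because group-normalized advantages are then zero. The skip schedule `1 - explore_prob**k` is also assumed, with `k` being the number of consecutive epochs in which the prompt was uninformative.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def is_uninformative(rewards: Sequence[float]) -> bool:
    # A group whose responses all share one reward has zero variance,
    # so a group-normalized advantage would be zero for every response.
    return len(set(rewards)) <= 1


class ConsistencyFilter:
    """Hypothetical pre-rollout prompt filter (illustrative, not GRESO's code).

    Tracks, per prompt, how many consecutive epochs its rollout group was
    uninformative, and skips the prompt before rollout with a probability
    that grows with that streak. The schedule used here is an assumption.
    """

    def __init__(self, explore_prob: float = 0.5, seed: int = 0) -> None:
        # Assumed knob: chance of still rolling out, compounded per streak step.
        self.explore_prob = explore_prob
        self.streak: Dict[Hashable, int] = defaultdict(int)
        self.rng = random.Random(seed)

    def skip_probability(self, prompt_id: Hashable) -> float:
        # Streak 0 -> never skip; streak k -> skip with prob 1 - explore_prob**k.
        return 1.0 - self.explore_prob ** self.streak[prompt_id]

    def select(self, prompt_ids: Sequence[Hashable]) -> List[Hashable]:
        # Decide, before any rollout is generated, which prompts to sample.
        return [p for p in prompt_ids
                if self.rng.random() >= self.skip_probability(p)]

    def update(self, prompt_id: Hashable, rewards: Sequence[float]) -> None:
        # After rollout, extend or reset the prompt's uninformative streak.
        if is_uninformative(rewards):
            self.streak[prompt_id] += 1
        else:
            self.streak[prompt_id] = 0


if __name__ == "__main__":
    f = ConsistencyFilter(explore_prob=0.5, seed=0)
    f.update("q1", [1, 1, 1, 1])  # all correct: uninformative this epoch
    f.update("q2", [0, 1, 0, 1])  # mixed rewards: informative
    print(f.skip_probability("q1"), f.skip_probability("q2"))  # 0.5 0.0
```

Keeping a nonzero chance of re-sampling a consistently uninformative prompt lets the filter notice when that prompt becomes informative again as the policy changes. The filtering happens before rollout, so skipped prompts cost no generation compute.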