GRESO

1 sources - 4 claims

GRESO was developed for math-style reasoning settings, where there is no multi-step environment trajectory to inspect. It predicts which prompts will be uninformative before rollout begins, using the consistency of each prompt's rewards across epochs on math tasks. The article identifies evaluating a combined pipeline of GRESO and DAPO as future work.
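
The pre-rollout prediction described above can be illustrated with a minimal sketch. It is not the GRESO implementation: the class name `ConsistencyFilter`, the `window` and `explore_prob` parameters, and the reading of "uninformative" as "every rollout for the prompt received the same reward" are all assumptions made for illustration. The sketch records, per epoch, whether a prompt's rewards were identical, and skips the prompt once that has held for the last `window` epochs, while keeping a small chance of rolling it out anyway.

```python
import random
from collections import defaultdict


class ConsistencyFilter:
    """Hypothetical pre-rollout filter; an illustration, not GRESO's code."""

    def __init__(self, window=2, explore_prob=0.1, seed=0):
        self.window = window              # epochs of history to consult (assumed)
        self.explore_prob = explore_prob  # chance to roll out a skipped prompt anyway (assumed)
        self.rng = random.Random(seed)
        self.history = defaultdict(list)  # prompt_id -> [was_flat per epoch]

    def record(self, prompt_id, rewards):
        # A prompt is "flat" in an epoch if all of its rollouts got the same reward.
        self.history[prompt_id].append(len(set(rewards)) <= 1)

    def should_rollout(self, prompt_id):
        recent = self.history[prompt_id][-self.window:]
        consistently_flat = len(recent) == self.window and all(recent)
        if not consistently_flat:
            return True
        # Occasionally re-check a skipped prompt in case it has become informative.
        return self.rng.random() < self.explore_prob


# Usage: exploration is disabled here so the decisions are deterministic.
flt = ConsistencyFilter(window=2, explore_prob=0.0)
for _ in range(2):
    flt.record("easy", [1, 1, 1, 1])   # always solved: same reward every epoch
flt.record("mixed", [1, 0, 1, 0])
flt.record("mixed", [0, 0, 1, 1])

decisions = {p: flt.should_rollout(p) for p in ("easy", "mixed", "unseen")}
print(decisions)
```

In this sketch, prompts with fewer than `window` recorded epochs are always rolled out, so a prompt is only skipped after its history is long enough to judge.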