DUET

1 source - 5 claims

In the reported experiments, DUET improved both accuracy and wall-clock efficiency. It is presented as improving efficiency and the quality of the learning signal together, rather than simply trading accuracy for speed. DUET frames efficiency in RLVR (reinforcement learning with verifiable rewards) as one joint decision: how many rollouts to generate and where each rollout should stop, with both choices drawn from a single shared token budget. It therefore coordinates rollout count and rollout length simultaneously, whereas the compared baselines each address only part of the cost problem. DUET is implemented as a three-phase layer over GRPO (Group Relative Policy Optimization), covering allocation, generation, and update.
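The source does not describe DUET's allocation rule or its interfaces, so the following is only a toy sketch of how the three phases could fit together under one token budget. It assumes a hypothetical per-prompt uncertainty score in [0, 1]; the names `allocate`, `generate`, `group_advantages`, and `fake_sample`, and the allocation rule itself, are illustrative assumptions, not DUET's method. Allocation sets rollout counts first and then derives a shared length cap so that planned tokens fit the budget, generation truncates each rollout at that cap, and the update normalizes rewards within each prompt's group in the GRPO style.

```python
import statistics
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Rollout = Tuple[int, float]  # (tokens used, verifiable reward)


@dataclass
class Plan:
    prompt_id: int
    num_rollouts: int  # rollout count for this prompt
    max_tokens: int    # stopping point: per-rollout length cap


def allocate(uncertainty: List[float], budget: int, min_n: int = 2,
             max_n: int = 8, max_len: int = 1024, min_len: int = 64) -> List[Plan]:
    """Phase 1 (HYPOTHETICAL rule): choose counts and length caps under one budget."""
    # More rollouts for prompts whose outcome is less certain (u clamped to [0, 1]).
    counts = [min_n + round((max_n - min_n) * max(0.0, min(1.0, u)))
              for u in uncertainty]
    # If even the minimum length cap does not fit, trim the largest groups first.
    while sum(counts) * min_len > budget and max(counts) > min_n:
        counts[counts.index(max(counts))] -= 1
    # Spend the rest of the budget on length: more rollouts means shorter caps.
    cap = min(max_len, budget // sum(counts))
    if cap < min_len:
        raise ValueError("budget too small for the minimum plan")
    return [Plan(i, n, cap) for i, n in enumerate(counts)]


def generate(plans: List[Plan],
             sample: Callable[[int, int, int], Rollout]) -> Dict[int, List[Rollout]]:
    """Phase 2: draw each prompt's rollouts, each truncated at its length cap."""
    return {p.prompt_id: [sample(p.prompt_id, k, p.max_tokens)
                          for k in range(p.num_rollouts)]
            for p in plans}


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Phase 3: GRPO-style advantage, each reward normalized within its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations differ
    return [(r - mean) / (std + eps) for r in rewards]


def fake_sample(prompt_id: int, k: int, max_tokens: int) -> Rollout:
    # HYPOTHETICAL stand-in for the policy: deterministic length and pass/fail reward.
    length = min(max_tokens, 100 + 50 * k)
    reward = 1.0 if (prompt_id + k) % 2 == 0 else 0.0
    return length, reward


plans = allocate([0.0, 0.5, 1.0], budget=4096)
groups = generate(plans, fake_sample)
used = sum(tokens for group in groups.values() for tokens, _ in group)
adv = {pid: group_advantages([r for _, r in group]) for pid, group in groups.items()}
```

Because the length cap is computed from the budget after the counts are chosen, giving prompts more rollouts automatically shortens every rollout's cap. That coupling is the count-versus-length trade-off described in the paragraph above; DUET itself may resolve it with a different rule.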