Gradient Computation Cost

1 source - 4 claims

In simulator-based GRPO training of a 7B VLA model, gradient computation accounts for approximately 78% of wall-clock time per training step, while rollout collection accounts for only about 21%. This direct measurement inverts the assumption, implicit in most prior VLA RL efficiency work, that rollout collection is the dominant cost, and it motivates treating gradient allocation as an explicit design axis. An earlier approach, branching at decision-critical timesteps, was abandoned because it added exponential rollout overhead rather than reducing gradient computation.
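To make the consequence of the 78% / 21% split concrete, the sketch below applies a simple Amdahl-style estimate to the reported fractions. It is illustrative arithmetic, not the paper's analysis: the function name `step_speedup`, the "other" bucket (the roughly 1% remainder), and the assumption that phases run sequentially and speed up independently are all introduced here.

```python
def step_speedup(fractions, speedups):
    """Amdahl-style estimate of the overall per-step speedup.

    fractions: share of wall-clock time per phase (should sum to ~1).
    speedups:  per-phase speedup factors; phases not listed stay at 1x.
    Assumes phases run sequentially and do not overlap (an assumption
    made for this sketch, not a claim from the source).
    """
    total = sum(fractions.values())
    new_time = sum(frac / speedups.get(name, 1.0)
                   for name, frac in fractions.items())
    return total / new_time


# Reported shares; "other" is the assumed remainder, not a measured value.
fractions = {"gradient": 0.78, "rollout": 0.21, "other": 0.01}

rollout_2x = step_speedup(fractions, {"rollout": 2.0})
rollout_free = step_speedup(fractions, {"rollout": float("inf")})
gradient_2x = step_speedup(fractions, {"gradient": 2.0})

print(f"2x faster rollouts:   {rollout_2x:.2f}x per step")    # about 1.12x
print(f"free rollouts:        {rollout_free:.2f}x per step")  # about 1.27x
print(f"2x cheaper gradients: {gradient_2x:.2f}x per step")   # about 1.64x
```

Under these assumptions, even eliminating rollout cost entirely yields at most about a 1.27x speedup per step, whereas halving gradient cost alone yields about 1.64x, which is the arithmetic behind treating gradient allocation as the primary lever.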