Convergence-Based Early Exit

Convergence-based early exit terminates inference once intermediate representations are sufficiently stable. The approach is viable when layer-to-layer representation changes diminish consistently. In the evaluated inference procedure, the model exits once it has passed a minimum layer and the current pooled representation is sufficiently similar to an earlier pooled representation. The paper recommends an inference threshold of 0.95 to balance quality and latency. Batch inference limits the realized savings, because total batch compute depends on the latest-exiting sample.
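
The exit rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the summary does not say which similarity measure, pooling operation, or minimum layer is used, so cosine similarity, mean pooling, comparison against the immediately preceding layer, and `min_layer=2` are all hypothetical choices, as are the function names. Only the 0.95 threshold comes from the text.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Tokens = List[Vector]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pool(tokens: Sequence[Sequence[float]]) -> Vector:
    """Average the token vectors into one pooled vector (assumed pooling choice)."""
    count = len(tokens)
    return [sum(column) / count for column in zip(*tokens)]


def early_exit_forward(
    tokens: Tokens,
    layers: Sequence[Callable[[Tokens], Tokens]],
    threshold: float = 0.95,  # value recommended in the text
    min_layer: int = 2,       # hypothetical; the text only says "a minimum layer"
) -> Tuple[Tokens, int]:
    """Apply layers in order and stop once the pooled representation stabilizes.

    Returns the final token states and the number of layers actually executed.
    """
    previous_pooled = None
    executed = 0
    for index, layer in enumerate(layers, start=1):
        tokens = layer(tokens)
        executed = index
        pooled = mean_pool(tokens)
        # Assumption: compare against the immediately preceding layer's pooled vector.
        if (
            index >= min_layer
            and previous_pooled is not None
            and cosine_similarity(pooled, previous_pooled) >= threshold
        ):
            break
        previous_pooled = pooled
    return tokens, executed


def batch_layers_executed(exit_layers: Sequence[int]) -> int:
    """Layers a lockstep batch must run: set by the latest-exiting sample."""
    return max(exit_layers)
```

The last helper illustrates the batching limitation: if the samples in a batch would individually exit at layers 3, 5, and 12, a batch processed in lockstep still runs 12 layers, so the per-sample savings are only partly realized.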