Comparative Methods
Compared with CALM* and LayerSkip, N-vium occupied a Pareto region with higher speed and no worse perplexity. Existing early-exit methods generally treat intermediate predictions as approximations of the final-layer distribution, which creates calibration and cache issues. N-vium resembles the parallel verification stage of speculative decoding but requires neither a separate draft model nor accept-reject correction. CALM* was reported to be faster, but with degraded perplexity.
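To make the Pareto language concrete, the following is a minimal sketch of a dominance check over (speedup, perplexity) pairs. The names `Point`, `dominates`, and `pareto_front` are hypothetical helpers, and the numbers are placeholders chosen only for illustration; they are not results reported for N-vium, CALM*, or LayerSkip.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    name: str
    speedup: float     # higher is better
    perplexity: float  # lower is better


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.speedup >= b.speedup and a.perplexity <= b.perplexity
    strictly_better = a.speedup > b.speedup or a.perplexity < b.perplexity
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Placeholder values only, NOT measurements from the source.
points = [
    Point("method_a", speedup=1.5, perplexity=10.0),
    Point("method_b", speedup=1.3, perplexity=10.0),
    Point("method_c", speedup=1.8, perplexity=11.5),
]

front = pareto_front(points)
print([p.name for p in front])
```

In this toy data, `method_a` dominates `method_b` because it is faster at equal perplexity. `method_c` stays on the front: it is the fastest, but it pays for that speed with worse perplexity, which is the kind of trade-off the text describes for a method that is faster but degrades perplexity.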