Exact Inference
Inference can proceed in either of two ways: build the full mixture distribution explicitly, or sample in two steps, first a routing decision and then a token from the chosen exit distribution. The two-step sampler is exact because its exit probabilities match the mixture weights, so the sampled token follows the same distribution as the explicit mixture. N-vium does not permanently skip upper-layer computation for early-exited tokens: their deferred upper-layer states are processed together with later tokens through piggybacking, so every token eventually traverses all layers.
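The equivalence of the two inference routes can be checked numerically. The sketch below is a toy illustration, not N-vium's implementation: the function names, the two-exit setup, and the weights and distributions are all hypothetical values chosen for the example. It assumes the mixture has the form p(token) = sum over exits k of w_k * p_k(token), and compares that explicit distribution against empirical frequencies from the two-step sampler.

```python
import random
from collections import Counter


def mixture_distribution(weights, exit_dists):
    """Explicit route: p(token) = sum_k weights[k] * exit_dists[k][token]."""
    vocab_size = len(exit_dists[0])
    return [
        sum(w * dist[t] for w, dist in zip(weights, exit_dists))
        for t in range(vocab_size)
    ]


def two_step_sample(weights, exit_dists, rng):
    """Two-step route: draw an exit by its mixture weight, then a token from it."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    token = rng.choices(range(len(exit_dists[k])), weights=exit_dists[k])[0]
    return k, token


# Hypothetical toy setup: two exits over a three-token vocabulary.
weights = [0.3, 0.7]                 # assumed mixture (routing) weights
exit_dists = [
    [0.5, 0.5, 0.0],                 # assumed distribution at the early exit
    [0.1, 0.2, 0.7],                 # assumed distribution at the final exit
]

explicit = mixture_distribution(weights, exit_dists)

rng = random.Random(0)               # fixed seed for repeatability
n = 200_000
counts = Counter(two_step_sample(weights, exit_dists, rng)[1] for _ in range(n))
empirical = [counts[t] / n for t in range(len(explicit))]

print("explicit mixture:", explicit)
print("two-step sampler:", empirical)
```

With enough samples the two printed vectors should agree to within sampling noise. The two-step route never materializes the full mixture, so only the chosen exit's distribution is needed at sampling time.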