Shallow Verification
CATS improves verification capacity by keeping the drafter shallow and loading the shallow verifier only once per decoding cycle: a shallow verification pass loads intermediate layers once and verifies the draft tokens in parallel. When a token produced by the shallow verifier differs from the corresponding draft token, it becomes a correction candidate. Draft tokens and correction tokens are then assembled into a verification tree for target verification. The final target pass verifies the main branch and the correction branches in a single batched forward pass using tree-masked attention.
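
The tree assembly and tree-masked attention described above can be illustrated with a minimal sketch. It is not the CATS implementation: the function names `build_verification_tree` and `tree_attention_mask` are hypothetical, and the tree layout is an assumption (each correction token is attached as a sibling of the draft token it disagrees with, so it shares that token's prefix). The mask lets every node attend only to itself and its ancestors, which is what allows all branches to be checked in one batched pass.

```python
from typing import List, Optional, Tuple


def build_verification_tree(
    draft: List[int], shallow: List[int]
) -> Tuple[List[int], List[Optional[int]]]:
    """Flatten draft and correction tokens into (tokens, parents).

    Assumed layout (illustrative, not taken from the source):
    - the main branch is the chain of draft tokens;
    - wherever the shallow verifier's token differs from the draft token,
      the verifier's token is added as a correction leaf that shares the
      draft token's parent.
    A parent of None marks a node attached directly to the prompt prefix.
    """
    if len(draft) != len(shallow):
        raise ValueError("draft and shallow outputs must have equal length")

    tokens: List[int] = []
    parents: List[Optional[int]] = []

    # Main branch: draft token i hangs off draft token i - 1.
    for i, tok in enumerate(draft):
        tokens.append(tok)
        parents.append(i - 1 if i > 0 else None)

    # Correction branches: one leaf per position of disagreement.
    for i, (d, s) in enumerate(zip(draft, shallow)):
        if d != s:
            tokens.append(s)
            parents.append(i - 1 if i > 0 else None)

    return tokens, parents


def tree_attention_mask(parents: List[Optional[int]]) -> List[List[bool]]:
    """mask[q][k] is True when node q may attend to node k.

    A node sees itself and its ancestors only, so sibling branches
    cannot influence each other inside the same forward pass.
    """
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for node in range(n):
        cur: Optional[int] = node
        while cur is not None:
            mask[node][cur] = True
            cur = parents[cur]
    return mask
```

For example, with draft tokens `[11, 22, 33]` and shallow-verifier tokens `[11, 25, 33]`, the sketch yields the flattened tokens `[11, 22, 33, 25]` with parents `[None, 0, 1, 0]`: the correction token `25` attends to itself and to token `11`, but not to the draft tokens `22` or `33`.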