Block-ChaCAL

1 source - 6 claims

Block-ChaCAL partitions the sequence into contiguous blocks and decomposes attention into a block-diagonal component and an off-block residual component. It preserves exact within-block masked attention semantics by applying the resolvent exactly to each causal diagonal tile. It handles off-block interactions by solving a down-sampled block-level system and then lifting the result back. Balancing the local and residual costs yields approximately O(n^(4/3) d) sequence complexity when the hidden width d is independent of the sequence length n. On BOXES, large-block Block-ChaCAL retained exact match at the level of dense ChaCAL while reducing evaluation time: at about 97% exact match, it reduced runtime by a factor of roughly 2.4 relative to a 5-layer dense Transformer.
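
The block decomposition and the exact per-tile resolvent can be illustrated with a minimal pure-Python sketch. It rests on assumptions the source does not state: the resolvent is taken to be (I - A)^(-1), each causal diagonal tile is taken to be strictly lower triangular, and the function names `split_blocks`, `tile_resolvent_apply`, and `block_diagonal_resolvent` are hypothetical. The down-sampled block-level system and the lifting step are not shown, because the source gives no detail on them.

```python
def split_blocks(a, block):
    """Split a square matrix into a block-diagonal part and an off-block residual.

    Entry (i, j) belongs to the diagonal part when i and j fall in the same
    contiguous block of size `block`; otherwise it goes to the residual.
    """
    n = len(a)
    diag = [[0.0] * n for _ in range(n)]
    resid = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i // block == j // block:
                diag[i][j] = a[i][j]
            else:
                resid[i][j] = a[i][j]
    return diag, resid


def tile_resolvent_apply(tile, v):
    """Solve (I - tile) x = v by forward substitution.

    Assumes `tile` is strictly lower triangular (an assumed form of a causal
    tile), so row i depends only on the already computed x[0..i-1].
    """
    x = []
    for i, row in enumerate(tile):
        x.append(v[i] + sum(row[j] * x[j] for j in range(i)))
    return x


def block_diagonal_resolvent(a, v, block):
    """Apply the assumed resolvent independently to each diagonal tile of `a`."""
    n = len(a)
    out = []
    for start in range(0, n, block):
        stop = min(start + block, n)
        tile = [row[start:stop] for row in a[start:stop]]
        out.extend(tile_resolvent_apply(tile, v[start:stop]))
    return out


# Toy strictly causal matrix: 6 positions, blocks of size 3.
n, block = 6, 3
a = [[0.1 * (j + 1) if j < i else 0.0 for j in range(n)] for i in range(n)]
diag, resid = split_blocks(a, block)
x = block_diagonal_resolvent(a, [1.0] * n, block)
```

In this toy setup the two parts sum back to the original matrix, and `x` solves (I - D) x = v for the block-diagonal part D alone. The off-block residual is deliberately left unused here.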