Structured Sparsity

The method assumes attention has block-structured sparsity: most of the attention mass falls in local tiles, with a lighter off-block residue. The exact local tiles carry most of the tracking signal, while the residual branch repairs cross-block inconsistencies. Efficiency depends on choosing a block size near the empirically determined balance point. Cross-block compression alone cannot reliably reconstruct state propagation without exact local routing. Tasks with diffuse global dependencies may not achieve the same speedups or accuracy as block-structured entity-tracking tasks.
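The summary does not say how the residual branch compresses off-block context, so the following is only a hypothetical sketch of the idea, not the method itself. It does two things. First, it measures how much attention mass falls inside the diagonal block tiles, which is the structural assumption. Second, it implements a hybrid layer that computes exact attention within each query's own block and adds one mean-pooled key/value summary token for every other block, standing in for the compressed cross-block branch. Mean pooling, the non-causal setting, and all function names (`block_local_mass`, `hybrid_block_attention`, `full_attention`, and so on) are assumptions made for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_vec(vs: Matrix) -> List[float]:
    # Element-wise mean of a list of equal-length vectors.
    return [sum(col) / len(vs) for col in zip(*vs)]


def attention_weights(q: Matrix, k: Matrix) -> Matrix:
    # Dense (non-causal) scaled dot-product attention weights; each row sums to 1.
    scale = 1.0 / math.sqrt(len(q[0]))
    return [softmax([dot(qi, kj) * scale for kj in k]) for qi in q]


def full_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # Reference output: every query attends exactly to every key.
    dv = len(v[0])
    return [[sum(w * vj[c] for w, vj in zip(row, v)) for c in range(dv)]
            for row in attention_weights(q, k)]


def block_local_mass(attn: Matrix, block: int) -> float:
    """Fraction of total attention mass that lies inside the diagonal
    block x block tiles (each query's own block)."""
    n = len(attn)
    inside = 0.0
    for i, row in enumerate(attn):
        start = (i // block) * block
        inside += sum(row[start:min(start + block, n)])
    total = sum(sum(row) for row in attn)
    return inside / total


def hybrid_block_attention(q: Matrix, k: Matrix, v: Matrix, block: int) -> Matrix:
    """Exact attention inside each query's own block, plus one mean-pooled
    (key, value) summary token per other block as a hypothetical residual branch."""
    n, dv = len(q), len(v[0])
    scale = 1.0 / math.sqrt(len(q[0]))
    starts = list(range(0, n, block))
    # Assumed compression: summarize each block by its mean key and mean value.
    summaries = [(mean_vec(k[s:s + block]), mean_vec(v[s:s + block])) for s in starts]
    out = []
    for i in range(n):
        own = (i // block) * block
        keys = k[own:own + block]  # exact local tile
        vals = v[own:own + block]
        for s, (mk, mv) in zip(starts, summaries):
            if s != own:  # one compressed token for each off-block
                keys = keys + [mk]
                vals = vals + [mv]
        weights = softmax([dot(q[i], kj) * scale for kj in keys])
        out.append([sum(w * vj[c] for w, vj in zip(weights, vals)) for c in range(dv)])
    return out


if __name__ == "__main__":
    # Toy data: tokens in the same block of 2 share a direction, so attention is mostly local.
    q = k = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, 4.0]]
    v = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(block_local_mass(attention_weights(q, k), 2))  # close to 1.0
    print(full_attention(q, k, v))
    print(hybrid_block_attention(q, k, v, 2))
```

When `block` equals the sequence length, the hybrid layer reduces to full attention. In this sketch each query attends to roughly `block + n / block` keys, which is smallest near `block ≈ sqrt(n)`. Because the source describes the balance point as empirical, a sweep over block sizes that also tracks `block_local_mass` on real attention maps is likely a more faithful way to choose it than this cost count alone.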