Top-k Pruning

In the tested setting, raw one-hop attention can remain accurate even with very small top-k masks. Inverse-based operators, by contrast, degraded sharply when top-k masking was applied at both train and test time. Pruning before the inverse can remove low-weight bridge edges that matter later, once powers of the attention matrix densify. The blockwise method avoids this naive row-pruning failure by keeping the local inverse computation exact and handling off-block interactions through a reduced system.
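The bridge-edge effect can be illustrated with a toy example. The sketch below is not the tested setup: the 6-node attention matrix, the damping factor `alpha`, and the use of a truncated Neumann series as a stand-in for an inverse-based operator of the form (I - alpha*A)^-1 are all assumptions chosen for illustration. Two 3-node clusters are joined by one low-weight edge, and row-wise top-k masking with k=3 removes exactly that edge.

```python
# Toy illustration only: matrix values, alpha, and the resolvent form
# (I - alpha*A)^-1 are assumptions, not taken from the source.

def matmul(X, Y):
    """Plain list-of-lists matrix product."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def topk_rows(A, k):
    """Keep the k largest entries of each row, zero the rest (no renormalization)."""
    out = []
    for row in A:
        keep = set(sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k])
        out.append([v if j in keep else 0.0 for j, v in enumerate(row)])
    return out

def neumann_resolvent(A, alpha=0.9, terms=50):
    """Approximate (I - alpha*A)^-1 by the truncated sum of alpha^t * A^t."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in total]  # holds alpha^t * A^t, starting at identity
    for _ in range(terms):
        power = [[alpha * v for v in row] for row in matmul(power, A)]
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

# Two clusters {0,1,2} and {3,4,5}; the only link is the 0.05 edge between 2 and 3.
A = [
    [0.40, 0.30, 0.30, 0.00, 0.00, 0.00],
    [0.30, 0.40, 0.30, 0.00, 0.00, 0.00],
    [0.30, 0.30, 0.35, 0.05, 0.00, 0.00],
    [0.00, 0.00, 0.05, 0.35, 0.30, 0.30],
    [0.00, 0.00, 0.00, 0.30, 0.40, 0.30],
    [0.00, 0.00, 0.00, 0.30, 0.30, 0.40],
]

A_pruned = topk_rows(A, k=3)

# One-hop view: the largest entrywise change is just the bridge weight.
one_hop_gap = max(abs(a - b) for ra, rb in zip(A, A_pruned) for a, b in zip(ra, rb))

# Multi-hop view: influence of node 5 on node 0 through powers of A.
dense_cross = neumann_resolvent(A)[0][5]
pruned_cross = neumann_resolvent(A_pruned)[0][5]

print("one-hop max change:", one_hop_gap)
print("cross-cluster entry, dense: ", dense_cross)
print("cross-cluster entry, pruned:", pruned_cross)  # exactly 0.0
```

In this toy case the mask changes the one-hop matrix by at most the bridge weight, yet the pruned matrix is block diagonal, so every power of it is block diagonal too and the cross-cluster entries of the inverse-like operator vanish entirely. The dense operator keeps a nonzero cross-cluster entry because paths such as 0 → 2 → 3 → 5 accumulate across powers.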