Fisher Information

1 source - 5 claims

Fed-FSTQ treats token-level sensitivity as a rate-distortion signal that determines what to transmit and at what precision. Its theoretical compression objective is Fisher-weighted rate-distortion, and it uses squared embedding-gradient norms as token-level proxies for the empirical Fisher information. Token sensitivity scores are smoothed with an exponential moving average, with rho = 0.9 by default. The method relies on diagonal Fisher approximations rather than full Fisher matrices, which keeps the computation tractable but ignores off-diagonal curvature.
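
The two mechanical steps described above, the squared-gradient-norm proxy and the exponential moving average, can be sketched in a few lines of plain Python. This is an illustrative sketch, not the Fed-FSTQ implementation: the function names `fisher_proxy` and `ema_update` are hypothetical, and so are the dictionary keyed by token id and the rule that a first-seen token starts at its raw score. The sketch also assumes that rho weights the previous smoothed value (s = rho * s_prev + (1 - rho) * s_new), which the source does not spell out.

```python
from typing import Dict, List, Sequence

RHO = 0.9  # default smoothing factor stated in the text


def fisher_proxy(embedding_grads: Sequence[Sequence[float]]) -> List[float]:
    """Return the squared L2 norm of each token's embedding gradient.

    Each row is the loss gradient with respect to one token's embedding.
    Summing squared entries uses only diagonal (per-coordinate) terms,
    so off-diagonal curvature is ignored.
    """
    return [sum(g * g for g in row) for row in embedding_grads]


def ema_update(
    prev: Dict[int, float],
    token_ids: Sequence[int],
    scores: Sequence[float],
    rho: float = RHO,
) -> Dict[int, float]:
    """Return a new score table smoothed with an exponential moving average.

    Assumption: rho weights the previous smoothed score, and a token seen
    for the first time is initialised to its raw score.
    """
    updated = dict(prev)
    for tok, score in zip(token_ids, scores):
        if tok in updated:
            updated[tok] = rho * updated[tok] + (1.0 - rho) * score
        else:
            updated[tok] = score
    return updated


# Example with made-up gradients for two tokens (ids 7 and 42).
raw = fisher_proxy([[3.0, 4.0], [0.0, 1.0]])  # [25.0, 1.0]
state = ema_update({}, [7, 42], raw)          # first sighting: raw scores
# Token 7 then receives a zero gradient; its score decays towards 0.
state = ema_update(state, [7], fisher_proxy([[0.0, 0.0]]))
```

With rho = 0.9, a single zero-gradient round lowers token 7's score only from 25.0 to about 22.5, which shows how the smoothing damps round-to-round noise in the sensitivity signal. How these scores map to transmission and precision decisions is not covered by this sketch.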