Memorization

Increasing model capacity accelerates memorization. The memorization time τmem is defined as the first training step at which sample memorization reliably exceeds 0.1. Around memorization onset, training loss separates from held-out validation loss. Memorization corresponds to the expansion of basins around the training samples into nearby held-out validation regions. Memorization time is approximately invariant to parity group size and scales nearly linearly with dataset size. In the reported setup, memorization onset depends on the number of gradient updates per example.
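
The definition of τmem can be made concrete with a small sketch. The summary does not say what "reliably" means, so the code below assumes one possible reading: the memorization score must stay above the threshold for `patience` consecutive evaluations. The function names, the `patience` parameter, and the `updates_per_example` helper (which assumes uniform mini-batch sampling) are all hypothetical and are not taken from the source.

```python
from typing import Optional, Sequence


def memorization_onset(
    steps: Sequence[int],
    memorization: Sequence[float],
    threshold: float = 0.1,
    patience: int = 3,
) -> Optional[int]:
    """Return the first step of a run of `patience` consecutive
    evaluations whose memorization score exceeds `threshold`.

    ASSUMPTION: "reliably exceeds" is interpreted as "exceeds for
    `patience` evaluations in a row"; the source does not define it.
    Returns None if no such run exists.
    """
    if len(steps) != len(memorization):
        raise ValueError("steps and memorization must have equal length")
    if patience < 1:
        raise ValueError("patience must be at least 1")

    run_start = 0   # index where the current above-threshold run began
    run_length = 0  # number of consecutive above-threshold evaluations
    for i, value in enumerate(memorization):
        if value > threshold:
            if run_length == 0:
                run_start = i
            run_length += 1
            if run_length >= patience:
                return steps[run_start]
        else:
            run_length = 0  # a dip below the threshold resets the run
    return None


def updates_per_example(step: int, batch_size: int, dataset_size: int) -> float:
    """Average number of gradient updates each example has received.

    ASSUMPTION: examples are drawn uniformly, so each step touches
    batch_size / dataset_size of the data on average.
    """
    return step * batch_size / dataset_size


# Illustrative (made-up) curve: the spike at step 100 is ignored
# because the score drops back below the threshold at step 200.
steps = [0, 100, 200, 300, 400, 500, 600]
scores = [0.0, 0.12, 0.05, 0.11, 0.2, 0.4, 0.7]
tau_mem = memorization_onset(steps, scores)  # 300
```

Under the uniform-sampling assumption, a τmem that grows linearly with dataset size corresponds to a roughly constant value of `updates_per_example(tau_mem, batch_size, dataset_size)`, which is one way to read the last two claims together.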