Learning-Zone Energy (LZE)

LZE is a dual-stage data selection framework: a per-step backward selector concentrates the gradient budget, and an epoch-level forward pruner eliminates rollout generation for persistently solved prompts.

The Energy Score can be interpreted as a sample-level attention mechanism in which the difficulty anchor is the key, the EMA is the query, and 4p(1-p)(1+αm) is the normalized attention score. The difficulty anchor is computed from a single forward pass at initialization and is never updated afterwards, so it encodes intrinsic prompt hardness as a fixed prior. The outcome uncertainty term 4p(1-p) is symmetric: it down-weights both the all-correct and the all-incorrect regimes, unlike Focal Loss, which is asymmetric. The pass-rate momentum factor (1+αm) amplifies prompts on which the policy is actively improving and is neutral for stagnating prompts. Gumbel perturbations are added to the Energy Scores before ranking, which provides stochastic exploration in prompt selection.

Each epoch, the forward pruner re-evaluates a fraction of the pruned prompts and restores those that no longer achieve full correctness. This serves as a safety mechanism against catastrophic forgetting.
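
The scoring, ranking, and pruning steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It covers only the stated term 4p(1-p)(1+αm) and leaves out the difficulty anchor, because the text does not give the formula that combines the anchor with the EMA. Several details are assumptions made for illustration: the momentum m is taken to be the positive part of the gap between the current pass rate and its EMA, the noise scale tau and the re-check fraction are hypothetical parameters, and all function names (energy_term, select_top_k, recheck_pruned, pass_rate_fn) are invented here.

```python
import math
import random


def uncertainty(p):
    """Outcome uncertainty 4p(1-p): zero at p=0 and p=1, maximal (1.0) at p=0.5."""
    return 4.0 * p * (1.0 - p)


def momentum(p_now, p_ema):
    """ASSUMPTION: m is the positive part of the pass-rate gain over its EMA.
    m = 0 for stagnating prompts, which makes the factor (1 + alpha*m) neutral."""
    return max(0.0, p_now - p_ema)


def energy_term(p_now, p_ema, alpha=1.0):
    """The term 4p(1-p)(1+alpha*m) from the text (difficulty anchor omitted)."""
    return uncertainty(p_now) * (1.0 + alpha * momentum(p_now, p_ema))


def gumbel_noise(rng):
    """Draw one standard Gumbel(0, 1) sample by inverse-transform sampling."""
    u = max(rng.random(), 1e-12)  # rng.random() is in [0, 1); avoid log(0)
    return -math.log(-math.log(u))


def select_top_k(scores, k, tau=0.1, rng=None):
    """Add tau-scaled Gumbel noise to each score, then return the indices of
    the k highest perturbed scores. tau is a hypothetical noise scale;
    tau=0 reduces to a deterministic top-k."""
    rng = rng or random.Random()
    noisy = [s + tau * gumbel_noise(rng) for s in scores]
    order = sorted(range(len(scores)), key=lambda i: noisy[i], reverse=True)
    return order[:k]


def recheck_pruned(pruned, pass_rate_fn, fraction=0.1, rng=None):
    """Re-evaluate a random fraction of pruned prompts and restore those whose
    pass rate has dropped below 1.0. pass_rate_fn is a hypothetical callback
    that runs rollouts for one prompt and returns its pass rate in [0, 1]."""
    rng = rng or random.Random()
    pruned = list(pruned)
    if not pruned:
        return [], []
    n = min(len(pruned), max(1, round(fraction * len(pruned))))
    sample = rng.sample(pruned, n)
    restored = [q for q in sample if pass_rate_fn(q) < 1.0]
    still_pruned = [q for q in pruned if q not in restored]
    return restored, still_pruned


if __name__ == "__main__":
    # (current pass rate, EMA of pass rate) for four illustrative prompts
    stats = [(0.0, 0.0), (0.5, 0.25), (0.8, 0.8), (1.0, 1.0)]
    scores = [energy_term(p, ema, alpha=2.0) for p, ema in stats]
    print(select_top_k(scores, k=2, tau=0.1, rng=random.Random(0)))
```

With tau set to 0 the selection is a plain top-k over the scores, and larger values of tau let lower-scoring prompts be picked occasionally. Passing a seeded random.Random makes a run reproducible.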