Optimizer Benchmarking

1 source - 5 claims

The survey treats optimizer evaluation as a multi-objective problem spanning efficiency, memory, stability, scalability, and implementation complexity. Two evaluation settings serve different purposes. Fixed-model evaluation isolates optimizer behavior by holding the model, data, token budget, batch size, precision, hardware, and training recipe constant. Fixed-resource evaluation captures practical value in cases where an optimizer's memory savings enable configurations that would otherwise be infeasible.

Credible LLM optimizer comparisons should report loss, memory, throughput, stability, tuning budget, and implementation details, not validation loss alone. Common benchmark pitfalls include under-tuned baselines, unequal tuning budgets, overclaiming from early training curves, ignored wall-clock costs, incomplete memory reporting, extrapolation from small-scale runs, and implementation confounding.
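
The fixed-model setting and the reporting checklist above can be sketched as a small record type plus a comparability check. This is a minimal illustration, not a format defined by the survey: the class names `RunConfig` and `RunReport`, every field name, and the helper functions are hypothetical and chosen only to mirror the items listed in the text.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


# Hypothetical record of everything a fixed-model comparison holds constant.
# Field names are illustrative assumptions, not taken from the survey.
@dataclass(frozen=True)
class RunConfig:
    model: str
    dataset: str
    token_budget: int
    batch_size: int
    precision: str
    hardware: str
    recipe: str


# Hypothetical per-optimizer report covering more than validation loss.
@dataclass(frozen=True)
class RunReport:
    optimizer: str
    config: RunConfig
    final_val_loss: float
    peak_memory_gb: float
    tokens_per_second: float
    diverged_runs: int      # simple stability indicator
    tuning_trials: int      # tuning budget spent on this optimizer
    implementation: str     # e.g. a library name or commit, free-form


def config_mismatches(a: RunConfig, b: RunConfig) -> list[str]:
    """Return the names of config fields that differ between two runs."""
    da, db = asdict(a), asdict(b)
    return [name for name in da if da[name] != db[name]]


def is_fixed_model_comparison(a: RunReport, b: RunReport) -> bool:
    """True only if every held-constant setting matches across both runs."""
    return not config_mismatches(a.config, b.config)


def tuning_budgets_match(a: RunReport, b: RunReport) -> bool:
    """Flag the 'unequal tuning budgets' pitfall."""
    return a.tuning_trials == b.tuning_trials
```

Calling `config_mismatches` on two reports before comparing their losses makes any confound explicit: a non-empty result (for example, a differing `batch_size`) means the comparison is not a fixed-model one and should be labeled accordingly.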