Application MFU
In production validation, OFU and application MFU had moderate correlation across all 608 jobs and stronger correlation after excluding jobs affected by identified FLOPs miscalculations. Framework-level MFU estimation is fragmented and brittle because FLOPs formulas must be updated for new training modalities. A DeepSeek-style MoE job reported application MFU far above OFU because the framework FLOPs counter assumed experts operated at the full hidden dimension. A hybrid Mamba-Transformer MoE job inflated reported FLOPs because its Megatron-LM branch lacked per-layer-type FLOPs accounting.
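The MoE overcount described above can be illustrated with a minimal sketch. It assumes a gated FFN with three projections per expert, counts forward-pass matmul FLOPs only, and uses hypothetical dimensions; the function names and numbers are illustrative and are not taken from any framework's actual FLOPs counter.

```python
def ffn_flops_per_token(hidden: int, intermediate: int, n_proj: int = 3) -> int:
    """Forward matmul FLOPs per token for one FFN.

    Each projection is a (hidden x intermediate) matmul costing
    2 * hidden * intermediate FLOPs per token. n_proj=3 assumes a
    gated FFN (gate, up, down); this is an assumption of the sketch.
    """
    return 2 * hidden * intermediate * n_proj


def moe_flops_per_token(hidden: int, expert_intermediate: int, top_k: int) -> int:
    """FLOPs per token for an MoE layer that routes to top_k experts."""
    return top_k * ffn_flops_per_token(hidden, expert_intermediate)


def mfu(flops_per_token: float, tokens_per_second: float, peak_flops: float) -> float:
    """Model FLOPs utilization: claimed FLOPs per second over hardware peak."""
    return flops_per_token * tokens_per_second / peak_flops


# Hypothetical configuration, chosen only to make the arithmetic visible.
HIDDEN = 4096
DENSE_INTERMEDIATE = 16384   # width a dense FFN would use
EXPERT_INTERMEDIATE = 2048   # narrower width actually used by each expert
TOP_K = 8

# Correct count: each routed expert runs at its own (narrow) width.
correct = moe_flops_per_token(HIDDEN, EXPERT_INTERMEDIATE, TOP_K)

# Faulty count: the counter assumes every expert runs at the full dense width.
inflated = moe_flops_per_token(HIDDEN, DENSE_INTERMEDIATE, TOP_K)

# At a fixed measured throughput, MFU scales linearly with the claimed FLOPs,
# so the reported MFU is inflated by the same factor as the FLOPs count.
inflation_factor = inflated / correct
```

In this sketch the inflation factor reduces to `DENSE_INTERMEDIATE / EXPERT_INTERMEDIATE`, because throughput and peak FLOPs cancel when the two MFU values are compared. A hardware-derived metric does not depend on the formula, which is why a gap of this kind shows up as application MFU sitting far above OFU.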