MoE Expert Pruning
RCO outperformed EvoESAP on Qwen3-30B-A3B at both 25% and 50% sparsity, and did so in less time. On OLMoE-1B-7B, RCO surpassed EvoESAP after 50 steps and had improved further by step 300. The choice of calibration data strongly determines which capabilities survive expert pruning. RCO searches a larger pruning space than evolutionary layer-count allocation because it can choose both how many experts to prune and which ones.
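
To give a rough sense of the search-space claim, the sketch below counts the candidates in each space for a toy model. It is an illustration under stated assumptions, not a description of either method's actual implementation: it assumes that layer-count allocation only decides how many experts each layer loses (with the identity of the pruned experts fixed by some ranking), while the joint space may pick any set of experts across layers. The function names and the toy sizes (2 layers, 4 experts per layer) are hypothetical, and for simplicity a layer is allowed to lose all of its experts.

```python
from math import comb


def count_layer_allocations(num_layers: int, experts_per_layer: int, budget: int) -> int:
    """Count per-layer prune-count vectors (k_1, ..., k_L) that sum to `budget`.

    Assumption: each k_i may range from 0 to experts_per_layer, and the
    choice of WHICH experts to drop inside a layer is fixed elsewhere.
    """
    # ways[b] = number of ways to have pruned exactly b experts so far
    ways = [1] + [0] * budget
    for _ in range(num_layers):
        updated = [0] * (budget + 1)
        for pruned_so_far, count in enumerate(ways):
            if count == 0:
                continue
            max_here = min(experts_per_layer, budget - pruned_so_far)
            for k in range(max_here + 1):
                updated[pruned_so_far + k] += count
        ways = updated
    return ways[budget]


def count_expert_subsets(num_layers: int, experts_per_layer: int, budget: int) -> int:
    """Count ways to pick exactly `budget` experts to prune from all layers,
    choosing both how many per layer and which ones."""
    return comb(num_layers * experts_per_layer, budget)


if __name__ == "__main__":
    # Hypothetical toy model: 2 layers x 4 experts, prune half (50% sparsity).
    layers, experts, budget = 2, 4, 4
    print("count-only allocations:", count_layer_allocations(layers, experts, budget))
    print("count-and-identity subsets:", count_expert_subsets(layers, experts, budget))
```

Even in this toy setting the joint space is over an order of magnitude larger, and the gap widens quickly as the number of layers and experts grows. A larger space is harder to search, so these counts only illustrate the size difference and say nothing about which method finds better solutions.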