Downstream Evaluation

Knowledge and commonsense tasks broadly benefited from larger expert pools and from progressive expansion. The fixed E=128 baseline retained a larger advantage on GSM8K than on most other tasks. EMO Stage 5 was much stronger than fixed E=16 and generally stronger than fixed E=32 on downstream tasks. The paper leaves open whether reasoning-heavy benchmarks need earlier exposure to the full expert pool than knowledge or commonsense tasks do; one possibility is that reasoning-heavy behavior requires more training time with the full expert pool.