FlashEvolve
1 source - 5 claims
FlashEvolve treats agent evolution as an asynchronous streaming workflow: it decomposes the evolution loop into LLM-heavy stages, each with input queues and a worker pool, and connects the stages through those queues. Under both local vLLM and API serving, FlashEvolve improved LLM throughput and proposal throughput over GEPA and Combee. It achieved the highest 30-minute validation scores on IFBench, HoVer, and AIME, but not on HotpotQA. In longer 180-minute GEPA runs, it also reached useful validation scores earlier.
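The staged, queue-connected design described above can be illustrated with a minimal sketch. This is not FlashEvolve's implementation: the names `run_pipeline`, `worker`, `propose`, and `evaluate` are hypothetical, the stage bodies are placeholders standing in for LLM calls, and the sketch assumes Python's standard `asyncio` library. It only shows the general pattern of stages that each own an input queue and a worker pool, so a later stage can start on early items while an earlier stage is still working.

```python
import asyncio

STOP = object()  # sentinel telling a worker that its input stream has ended


async def worker(fn, inbox, outbox):
    """Pull items from this stage's input queue, process, push downstream."""
    while True:
        item = await inbox.get()
        if item is STOP:
            return
        await outbox.put(await fn(item))


async def run_pipeline(items, stages):
    """stages: list of (async function, worker count), run as a streaming pipeline."""
    # queues[i] is the input queue of stage i; queues[-1] collects final results.
    queues = [asyncio.Queue() for _ in range(len(stages) + 1)]

    async def run_stage(i):
        fn, n_workers = stages[i]
        pool = (worker(fn, queues[i], queues[i + 1]) for _ in range(n_workers))
        await asyncio.gather(*pool)
        # This stage has drained: send one STOP per worker of the next stage.
        if i + 1 < len(stages):
            for _ in range(stages[i + 1][1]):
                await queues[i + 1].put(STOP)

    for item in items:
        queues[0].put_nowait(item)
    for _ in range(stages[0][1]):
        queues[0].put_nowait(STOP)

    # All stages run concurrently; none waits for the previous one to finish.
    await asyncio.gather(*(run_stage(i) for i in range(len(stages))))

    results = []
    while not queues[-1].empty():
        results.append(queues[-1].get_nowait())
    return results


# Hypothetical placeholder stages; a real system would call an LLM here.
async def propose(candidate):
    await asyncio.sleep(0)  # stands in for LLM latency
    return candidate + "+mut"


async def evaluate(candidate):
    await asyncio.sleep(0)  # stands in for a validation run
    return (candidate, len(candidate))


results = asyncio.run(run_pipeline(["a", "bb", "ccc"], [(propose, 2), (evaluate, 3)]))
print(sorted(results))
```

Because results arrive in completion order rather than input order, the usage line sorts them before printing. Sizing each worker pool independently is the design choice the sketch is meant to highlight: a slow stage can be given more workers without changing the others.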