Generalization

FlashEvolve generalized beyond GEPA to ACE and Meta-Harness workloads. On Meta-Harness, it increased proposal and validation throughput from 0.3 to 1.4 proposals per minute, roughly a 4.7× improvement. Progress on Meta-Harness was nonetheless constrained by the weak code-generation ability of the open-source model used in the experiments.

The evaluation covers prompt evolution, context evolution, and harness-code evolution, but not the full space of possible artifacts. Broader testing on memory evolution, tool-use policies, generated programs, and additional algorithms remains future work.
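To make the Meta-Harness throughput figures concrete, the short calculation below converts the two reported rates (0.3 and 1.4 proposals per minute) into a speedup and an average time per proposal. It is illustrative arithmetic only, not part of FlashEvolve. It assumes throughput is steady, so the mean time per proposal is simply the reciprocal of the rate. The helper name `seconds_per_proposal` is hypothetical.

```python
# Reported Meta-Harness throughput, in proposals per minute.
baseline_ppm = 0.3
flashevolve_ppm = 1.4


def seconds_per_proposal(proposals_per_minute: float) -> float:
    """Hypothetical helper: mean seconds per proposal, assuming a steady rate."""
    return 60.0 / proposals_per_minute


# Speedup is the ratio of the two rates (1.4 / 0.3 = 14/3).
speedup = flashevolve_ppm / baseline_ppm

print(f"speedup: {speedup:.2f}x")
print(f"baseline: {seconds_per_proposal(baseline_ppm):.0f} s per proposal")
print(f"FlashEvolve: {seconds_per_proposal(flashevolve_ppm):.1f} s per proposal")
```

Under that steady-rate assumption, the average time per proposal falls from about 200 s to about 43 s.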