GameGen-Verifier
GameGen-Verifier verifies LLM-generated games by converting broad specifications into localized behavioral checks. It uses white-box source access to construct target runtime states directly instead of reaching them through gameplay. It substantially outperformed Agent-as-a-Verifier baselines in alignment with human judgment, and under Codex it achieved 0.922 Acc@5 and 0.954 F1@5. It also reduced wall-clock verification time compared with AaaV-CE across all tested backends.
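The two ideas in the summary, localized behavioral checks and direct construction of the target runtime state, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `ToyGame`, its fields, and the two check functions are hypothetical and are not taken from GameGen-Verifier itself. The sketch only shows why writing a state directly can be cheaper than reaching it through gameplay.

```python
from dataclasses import dataclass


@dataclass
class ToyGame:
    """Hypothetical stand-in for an LLM-generated game (not from the source)."""
    player_x: int = 0
    coins: int = 0
    coin_x: int = 50  # the only coin sits far from the start position

    def step(self, action: str) -> None:
        # Move one cell, then collect the coin if the player stands on it.
        if action == "right":
            self.player_x += 1
        elif action == "left":
            self.player_x -= 1
        if self.player_x == self.coin_x:
            self.coins += 1
            self.coin_x = -1  # mark the coin as removed from the level


def check_coin_pickup() -> bool:
    """Localized check: stepping onto the coin adds exactly one to the count."""
    game = ToyGame()
    # White-box setup (assumed): write the target state directly instead of
    # playing 49 moves to walk next to the coin.
    game.player_x = game.coin_x - 1
    before = game.coins
    game.step("right")
    return game.coins == before + 1


def check_coin_collected_once() -> bool:
    """Localized check: revisiting the coin's cell does not award it again."""
    game = ToyGame()
    game.player_x = game.coin_x - 1
    game.step("right")  # collects the coin
    game.step("left")
    game.step("right")  # back on the now-empty cell
    return game.coins == 1


# One broad requirement ("coins work correctly") split into two small checks.
results = {
    "coin_pickup": check_coin_pickup(),
    "coin_collected_once": check_coin_collected_once(),
}
```

Each check sets up only the state it needs and asserts one behavior, so a failure points at a specific rule rather than at the game as a whole.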