Agent-as-a-Verifier
AaaV-CE improved coverage by forcing explicit verdicts but still struggled with difficult gameplay-dependent specification elements. AaaV works better for web applications than for games because web application states are more discrete and…
1 sources - 5 claims
AaaV-CE improved coverage by forcing explicit verdicts but still struggled with difficult gameplay-dependent specification elements. AaaV works better for web applications than for games because web application states are more discrete and reachable through short UI action sequences. Agent-as-a-Verifier verifies generated games through open-ended gameplay. AaaV-Direct had extremely low accuracy because it often checked only a small and easy subset of the specification. AaaV verdicts are limited by whether the agent can reach states that exercise target mechanics.