Large Language Models

1 source - 4 claims

The study evaluated GPT-5.2, Gemini 3 Pro, Claude Sonnet 4.6, and Grok 4.1 on a simulated FRCS(Urol) Part A examination. Three of the four models exceeded the indicative 74% pass threshold. These current frontier models showed substantial improvement over the previously reported ChatGPT-3.5 score on an FRCS Urology examination. The study concludes that frontier LLMs may help urology trainees revise but should not be treated as sole or authoritative sources.