Output Stability



Model consistency varied by topic rather than being uniform across domains. GPT-5.2 was markedly unstable in female urology. Some models produced fully reproducible results within specific domains, as indicated by a standard deviation of zero. Anatomy yielded unstable outputs for both GPT-5.2 and Grok.
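The stability comparison rests on a simple statistic: the spread of a model's scores across repeated runs within one domain, where a standard deviation of zero means every run produced the same score. The sketch below illustrates that calculation under stated assumptions: each model answers the same question set several times per domain, each run is scored as a fraction correct, and the population standard deviation (`statistics.pstdev`) is used. The source does not specify the scoring method, the number of runs, or whether sample or population standard deviation was applied. All model names, domain labels, and scores here are hypothetical placeholders, not results from the study.

```python
from statistics import pstdev

# Hypothetical scores (fraction correct) from repeated runs of the same
# question set. Labels and values are placeholders, NOT data from the study.
runs = {
    ("model_A", "domain_1"): [0.80, 0.80, 0.80],
    ("model_A", "domain_2"): [0.60, 0.75, 0.90],
    ("model_B", "domain_1"): [0.70, 0.65, 0.70],
}


def stability_by_domain(runs):
    """Return the population standard deviation of run scores per (model, domain)."""
    return {key: pstdev(scores) for key, scores in runs.items()}


def fully_reproducible(sd, tol=1e-12):
    """Treat a (numerically) zero standard deviation as fully reproducible output."""
    return sd <= tol


for (model, domain), sd in stability_by_domain(runs).items():
    label = "reproducible" if fully_reproducible(sd) else "variable"
    print(f"{model} / {domain}: SD = {sd:.3f} ({label})")
```

Identical scores on every run give an SD of exactly zero, which is what "fully reproducible" means in this setting; any disagreement between runs pushes the SD above zero, and larger values indicate less stable output within that domain.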