Output Stability
Model consistency varied by topic rather than being uniform across domains. GPT-5.2 showed marked instability in female urology, and anatomy was unstable for both GPT-5.2 and Grok. By contrast, some models achieved fully reproducible results in specific domains, as shown by a standard deviation of zero.
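To make the zero-standard-deviation criterion concrete, here is a minimal Python sketch of how per-domain stability might be computed from repeated runs. The function name, the domain labels, and the scores are all hypothetical and are not taken from the study; the use of the sample standard deviation is also an assumption, since the summary does not say which estimator was used.

```python
from statistics import stdev


def stability_by_domain(scores_by_domain):
    """Map each domain to the sample standard deviation of its repeated-run scores.

    A value of 0 means every run produced the same score, i.e. the output
    was fully reproducible in that domain. Each domain needs at least two runs.
    """
    return {domain: stdev(scores) for domain, scores in scores_by_domain.items()}


# Illustrative, made-up scores for one hypothetical model (not study data).
example_runs = {
    "domain_a": [80, 80, 80],  # identical across runs
    "domain_b": [70, 80, 90],  # varies across runs
}

result = stability_by_domain(example_runs)
for domain, sd in result.items():
    label = "fully reproducible" if sd == 0 else "unstable"
    print(f"{domain}: sd={sd:.2f} ({label})")
```

Computing the spread per domain rather than once per model is what allows a single model to appear stable in one topic and unstable in another.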