Human Evaluation Results
Multimodal AMIE outperformed PCPs in top-k diagnostic accuracy for every k from 1 to 10. Specialists rated AMIE higher overall across diagnosis, management, history-taking, multimodal interpretation, artifact-grounded reasoning, and artifact-related questions. On the individual evaluation axes (history-taking, diagnostic accuracy, management reasoning, communication, empathy, and handling of multimodal information), multimodal AMIE was rated similar to or higher than PCPs. Patient-actors likewise rated AMIE similar to or higher than PCPs on patient-centric consultation metrics. Low artifact quality reduced top-3 diagnostic accuracy for both AMIE and PCPs, but AMIE's accuracy dropped less.
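Top-k diagnostic accuracy is the fraction of cases whose correct diagnosis appears among the first k entries of a ranked differential diagnosis (DDx). The minimal Python sketch below shows how such a curve for k = 1 to 10, and a quality-stratified accuracy drop, could be computed. It rests on several assumptions. A "match" is taken to be case-insensitive exact string equality, whereas the study may judge matches differently (for example, with specialist raters). The helper names `top_k_accuracy`, `top_k_curve`, and `accuracy_drop` are hypothetical. The example diagnoses are toy inputs, not study data.

```python
from typing import List, Sequence


def top_k_accuracy(
    ranked_ddx: Sequence[Sequence[str]],
    ground_truth: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose ground-truth diagnosis is among the first k
    entries of that case's ranked DDx.

    Assumption (hypothetical): a match is case-insensitive exact string
    equality; real evaluations may use rater judgment instead.
    """
    if len(ranked_ddx) != len(ground_truth):
        raise ValueError("need one ranked DDx per ground-truth label")
    if not ground_truth or k < 1:
        raise ValueError("need at least one case and k >= 1")
    hits = 0
    for ddx, truth in zip(ranked_ddx, ground_truth):
        # Only the top-k candidates count toward a hit.
        if truth.lower() in (d.lower() for d in ddx[:k]):
            hits += 1
    return hits / len(ground_truth)


def top_k_curve(
    ranked_ddx: Sequence[Sequence[str]],
    ground_truth: Sequence[str],
    max_k: int = 10,
) -> List[float]:
    """Top-k accuracy for k = 1..max_k; non-decreasing in k by construction."""
    return [top_k_accuracy(ranked_ddx, ground_truth, k) for k in range(1, max_k + 1)]


def accuracy_drop(high_quality_acc: float, low_quality_acc: float) -> float:
    """Accuracy lost on low-quality artifacts (positive means worse on low quality)."""
    return high_quality_acc - low_quality_acc


# Toy, hypothetical cases for illustration only.
ddx = [["eczema", "psoriasis", "tinea"], ["melanoma", "nevus"], ["gout", "cellulitis"]]
truth = ["psoriasis", "nevus", "pseudogout"]
print(top_k_curve(ddx, truth, max_k=3))  # [0.0, 0.6666666666666666, 0.6666666666666666]
```

Because a larger k only adds candidates, each curve can only rise or stay flat. Saying one system outperformed another "for every k from 1 to 10" therefore means its curve lies above the other's at each point. Under the same assumptions, a "smaller drop" amounts to comparing `accuracy_drop` values computed separately for each system from top-3 accuracy on high- and low-quality artifact subsets.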