Model Performance

7 sources - 32 claims

- Combining expert-selected and data-driven variables produced the best overall discrimination.
- The endemic model had the highest overall performance on the 2023 validation set.
- Adding treatment variables improved AUC only marginally.
- Gemini 3 Pro was the highest-scoring model overall.
- Grok 4.1 fell short of the 74% threshold, with its entire confidence interval below it.
- The patient-level accuracy of T1Gd IA-QCNN exceeded that of the DNN and VGG16 comparison models reported in the article.
- Prevalent models performed better than index models, especially for 1-year mortality.
- The tested models scored between 70.4% and 82.4%, a spread taken to indicate major progress across model generations.
- Compared with T1Gd, mpMRI increased training accuracy but reduced validation accuracy and patient-level discrimination.
- Clinical utility will be assessed with decision curve analysis, comparing the model against treat-all and treat-none strategies.
- On 2023 data, the pre-pandemic and endemic models reached much higher post-test probabilities in the top-risk group than the pandemic model.
- Transfer learning and contemporary deep models generally underperformed T1Gd IA-QCNN at the patient level.
- The baseline T1Gd IA-QCNN achieved a patient-level test accuracy of 0.67…
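Decision curve analysis, named in the claims above as the planned test of clinical utility, compares a model's net benefit with the treat-all and treat-none strategies across threshold probabilities pt. The sketch below uses the standard net benefit formula, NB = TP/n − (FP/n) · pt/(1 − pt), where treat-none always has NB = 0. It is a minimal illustration, not the method of any cited study: the function names and toy data are hypothetical, and it assumes a patient is "treated" when predicted risk is at least pt.

```python
from typing import List, Sequence, Tuple


def _check_threshold(pt: float) -> None:
    # Net benefit is undefined at pt = 0 or pt = 1 (the odds term breaks down).
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability pt must lie strictly between 0 and 1")


def net_benefit(y_true: Sequence[int], risk: Sequence[float], pt: float) -> float:
    """Net benefit of treating every patient whose predicted risk is >= pt (assumed rule)."""
    _check_threshold(pt)
    n = len(y_true)
    treated = [y for y, r in zip(y_true, risk) if r >= pt]
    tp = sum(1 for y in treated if y == 1)  # treated patients who have the outcome
    fp = len(treated) - tp                  # treated patients who do not
    # False positives are penalised by the odds at the threshold, pt / (1 - pt).
    return tp / n - (fp / n) * (pt / (1.0 - pt))


def net_benefit_treat_all(y_true: Sequence[int], pt: float) -> float:
    """Net benefit of treating everyone, regardless of predicted risk."""
    _check_threshold(pt)
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (pt / (1.0 - pt))


def decision_curve(
    y_true: Sequence[int], risk: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float, float, float]]:
    """Rows of (pt, model NB, treat-all NB, treat-none NB); treat-none is always 0."""
    return [
        (pt, net_benefit(y_true, risk, pt), net_benefit_treat_all(y_true, pt), 0.0)
        for pt in thresholds
    ]


if __name__ == "__main__":
    # Hypothetical toy data: 10 patients, 3 of whom have the outcome.
    y = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    p = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.2, 0.5, 0.1]
    for pt, nb_model, nb_all, nb_none in decision_curve(y, p, [0.1, 0.2, 0.3, 0.5]):
        print(f"pt={pt:.2f}  model={nb_model:+.3f}  treat-all={nb_all:+.3f}  treat-none={nb_none:+.3f}")
```

A model is clinically useful at a given threshold only where its net benefit exceeds both reference strategies; in practice the curve is read over the range of thresholds clinicians would plausibly use.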