Model Evaluation

7 sources - 34 claims

The reported RelAge-GNN test regression-fit coefficient was 0.962, with a test MAE of about 3.4 to 3.43 and an MSE of 29.46.

Using full-band EEG, DeepTokenEEG achieved the best or near-best results on ADFTD, BrainLat, and the combined dataset, with a model of 0.29 million parameters. Removing G1 or G3 caused the largest increases in error.

In the expert-count ablation, LoRA-MoE performed best with 5 experts. In the LoRA-rank ablation, rank 2 performed best, and higher ranks did not improve performance. In the hidden-dimension ablation, LoRA-MoE generally outperformed MoE and MLP. Among voting ensembles, MLP-25 performed best; LoRA-MoE LM-25 ranked second while maintaining high specificity.

WxFlow broadly reproduced the WRF power spectra, whereas the bicubic baseline was oversmoothed and had too little small-scale power. WxFlow also outperformed the lapse-rate-corrected bicubic baseline in CRPS across the evaluated tiles, especially in the high-relief terrain of southeast Alaska.

For predicting mortality, NEWS and PBR perform similarly across the full range of threshold probabilities at which each tool provides a positive net benefit. The marginal performance gap between PBR and NEWS for critical illness p…
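The NEWS and PBR comparison is framed in terms of net benefit at a threshold probability, the quantity used in decision curve analysis. As a minimal sketch, assuming the standard definition NB(p_t) = TP/n - FP/n * p_t / (1 - p_t), where a case is flagged when its predicted risk is at or above p_t, the code below computes net benefit and lists the thresholds at which a score gives a positive net benefit. The function names and the toy labels and risks are hypothetical illustrations, not data or code from the evaluated studies.

```python
from typing import List, Sequence


def net_benefit(y_true: Sequence[int], risk: Sequence[float], threshold: float) -> float:
    """Net benefit of acting on every case whose predicted risk >= threshold.

    NB(p_t) = TP/n - FP/n * p_t / (1 - p_t)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if len(y_true) != len(risk) or len(y_true) == 0:
        raise ValueError("y_true and risk must be non-empty and of equal length")
    n = len(y_true)
    # Count true and false positives among the cases flagged at this threshold.
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    harm_weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * harm_weight


def positive_benefit_thresholds(
    y_true: Sequence[int], risk: Sequence[float], thresholds: Sequence[float]
) -> List[float]:
    """Return the thresholds at which the score yields strictly positive net benefit."""
    return [t for t in thresholds if net_benefit(y_true, risk, t) > 0.0]


if __name__ == "__main__":
    # Hypothetical toy data: 1 = outcome occurred (e.g. death), 0 = it did not.
    y = [1, 0, 1, 0, 0, 1, 0, 0]
    risk = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.3, 0.05]
    for t in (0.1, 0.25, 0.5, 0.8, 0.95):
        print(f"p_t={t:.2f}  NB={net_benefit(y, risk, t):.3f}")
    print(positive_benefit_thresholds(y, risk, [0.1, 0.25, 0.5, 0.8, 0.95]))
```

Comparing two scores this way means evaluating both over the same grid of thresholds; a full decision curve would usually also plot the treat-all and treat-none strategies as reference lines.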