Low-Rank Mixture of Experts

LoRA-MoE differs from standard MoE by representing each expert's weights as a shared base matrix plus a low-rank update. The proposed architecture combines a shared dense base network, a shallow gating network, and expert-specific low-rank LoRA adapters. The low-rank constraint limits each expert-specific update to a low-dimensional subspace. Most experiments used Top-1 routing, so only the highest-scoring expert was activated for each input. For one example configuration, LoRA-MoE used approximately 77.3% fewer parameters than standard experts. The article interprets LoRA-MoE as improving the trade-off between diagnostic accuracy, robustness, and computational efficiency.
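
The structure described above can be illustrated with a minimal NumPy sketch of a single layer. It is an assumption-laden illustration, not the article's implementation: the function names (`init_lora_moe`, `forward_top1`, `param_counts`), the single linear layer, the linear gate, the zero initialization of the up-projection, and all dimensions are hypothetical. Each expert's effective weight is taken to be W0 + B_k A_k, where W0 is the shared base matrix and B_k A_k is a rank-limited update, and Top-1 routing selects the expert with the highest gate score.

```python
import numpy as np


def init_lora_moe(d_in, d_out, n_experts, rank, seed=0):
    """Create parameters for one hypothetical LoRA-MoE linear layer."""
    rng = np.random.default_rng(seed)
    return {
        # Shared dense base matrix, used by every expert.
        "W0": rng.normal(0.0, 0.02, (d_out, d_in)),
        # Expert-specific low-rank factors: update_k = B[k] @ A[k].
        "A": rng.normal(0.0, 0.02, (n_experts, rank, d_in)),
        # Zero init (an assumption, common for LoRA) so every expert
        # starts out identical to the shared base.
        "B": np.zeros((n_experts, d_out, rank)),
        # Shallow gate, assumed here to be a single linear map.
        "Wg": rng.normal(0.0, 0.02, (n_experts, d_in)),
    }


def forward_top1(params, x):
    """Route one input vector x to its highest-scoring expert."""
    scores = params["Wg"] @ x                      # shape: (n_experts,)
    k = int(np.argmax(scores))                     # Top-1: one active expert
    base = params["W0"] @ x                        # shared computation
    delta = params["B"][k] @ (params["A"][k] @ x)  # low-rank expert update
    return base + delta, k


def param_counts(d_in, d_out, n_experts, rank):
    """Expert parameter counts; the gate is excluded from both totals."""
    standard = n_experts * d_in * d_out            # one full matrix per expert
    lora = d_in * d_out + n_experts * rank * (d_in + d_out)
    return standard, lora


# Hypothetical sizes, chosen only for illustration.
params = init_lora_moe(d_in=256, d_out=256, n_experts=4, rank=8)
y, chosen = forward_top1(params, np.ones(256))
standard, lora = param_counts(256, 256, 4, 8)
print(f"expert {chosen} active, output shape {y.shape}")
print(f"parameter reduction: {1 - lora / standard:.2%}")
```

With these illustrative sizes the count drops from 262,144 to 81,920 expert parameters, a reduction of 68.75%. That figure comes from the hypothetical dimensions above and is not the article's 77.3% configuration, whose dimensions are not given here. The saving grows as the rank shrinks relative to the layer width or as the number of experts increases, because each added expert costs only rank × (d_in + d_out) parameters instead of d_in × d_out.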