N-vium

1 source - 4 claims

N-vium formulates a decoder-only transformer as a trained mixture over depth exits, rather than treating intermediate exits as approximations to the final layer. It augments the model with multiple exits positioned at the junctions between equal-sized blocks of layers. A standard transformer is the special case of N-vium in which all probability mass is placed on the final head. In the reported experiments, the routers and adapters add only modest parameter overhead.
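
The mixture view and the special case can be made concrete with a small sketch. It is illustrative only: the source does not give the exact formulation, so the code assumes the router yields one probability per exit and that the output is a convex combination of the per-exit next-token distributions. The function names `exit_layers` and `mix_exits`, the placement rule, and the toy numbers are all hypothetical.

```python
from typing import List


def exit_layers(num_layers: int, num_exits: int) -> List[int]:
    """Assumed placement: one exit at the end of each equal-sized block of layers."""
    if num_layers <= 0 or num_exits <= 0 or num_layers % num_exits != 0:
        raise ValueError("num_layers must be a positive multiple of num_exits")
    block = num_layers // num_exits
    return [block * (k + 1) for k in range(num_exits)]


def mix_exits(router_probs: List[float], exit_dists: List[List[float]]) -> List[float]:
    """Assumed mixture: weight each exit's token distribution by its router probability."""
    if len(router_probs) != len(exit_dists):
        raise ValueError("need exactly one router probability per exit")
    if abs(sum(router_probs) - 1.0) > 1e-9:
        raise ValueError("router probabilities must sum to 1")
    mixed = [0.0] * len(exit_dists[0])
    for weight, dist in zip(router_probs, exit_dists):
        for token, prob in enumerate(dist):
            mixed[token] += weight * prob
    return mixed


# Toy per-exit distributions over a 3-token vocabulary (made-up numbers).
exit_dists = [
    [0.7, 0.2, 0.1],  # shallowest exit
    [0.5, 0.3, 0.2],  # middle exit
    [0.1, 0.1, 0.8],  # final head
]

# All mass on the final head: the mixture collapses to the final head's output.
final_only = mix_exits([0.0, 0.0, 1.0], exit_dists)

# Mass spread over depth: every exit contributes to the prediction.
blended = mix_exits([0.2, 0.3, 0.5], exit_dists)
```

Under these assumptions, `exit_layers(12, 3)` places exits after layers 4, 8, and 12, and `final_only` equals the final head's distribution, which is the sense in which a standard transformer is a special case.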