Retrofitting
Applying the mixture objective post hoc to a pretrained Llama-3.2-1B checkpoint failed in the reported experiment. The article attributes retrofit failure to dense pretrained backbones lacking useful intermediate representations for early…
1 sources - 3 claims
Applying the mixture objective post hoc to a pretrained Llama-3.2-1B checkpoint failed in the reported experiment. The article attributes retrofit failure to dense pretrained backbones lacking useful intermediate representations for early exits. In the retrofit experiment, early exit probabilities collapsed toward zero, making the model effectively dense.