Temporal Masked Autoencoder Pretraining

1 source - 5 claims

T-MAE pretraining is used to improve generalization when labeled data are scarce. It applies 75% time-frequency masking over random 16x16 patches of the log-Mel spectrogram. The T-MAE decoder reconstructs the masked patches during pretraining and is discarded before fine-tuning. In the component ablation, T-MAE was the largest contributor: controlled pretraining experiments found that MP-IB improved from rho = 0.034 without T-MAE to rho = 0.117 with T-MAE.
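
The masking step described above can be illustrated with a minimal NumPy sketch. It is not the source's implementation. The function names (random_patch_mask, masked_mse), the 128 x 256 spectrogram size, the fixed seed, and the choice to score reconstruction with mean squared error over masked bins only are all assumptions made for illustration; the source states only the 75% ratio, the 16x16 patch size, and that masked log-Mel patches are reconstructed.

```python
import numpy as np


def random_patch_mask(n_mels, n_frames, patch=16, mask_ratio=0.75, seed=0):
    """Return a boolean (n_mels, n_frames) array; True marks a masked bin.

    Hypothetical helper: masks a random subset of non-overlapping
    patch x patch tiles of a spectrogram.
    """
    if n_mels % patch or n_frames % patch:
        raise ValueError("spectrogram dimensions must be multiples of the patch size")
    rows, cols = n_mels // patch, n_frames // patch
    n_patches = rows * cols
    n_masked = int(round(mask_ratio * n_patches))

    # Choose which patches to hide, without replacement.
    rng = np.random.default_rng(seed)
    masked_ids = rng.choice(n_patches, size=n_masked, replace=False)
    patch_mask = np.zeros(n_patches, dtype=bool)
    patch_mask[masked_ids] = True
    patch_mask = patch_mask.reshape(rows, cols)

    # Expand each patch-level flag into a patch x patch block of bins.
    return np.repeat(np.repeat(patch_mask, patch, axis=0), patch, axis=1)


def masked_mse(target, prediction, mask):
    """Mean squared error over masked bins only (an assumed loss)."""
    diff = (prediction - target)[mask]
    return float(np.mean(diff ** 2))


# Usage with an assumed 128-mel x 256-frame log-Mel spectrogram.
log_mel = np.random.default_rng(1).normal(size=(128, 256))
mask = random_patch_mask(*log_mel.shape)
visible_input = np.where(mask, 0.0, log_mel)  # masked bins zeroed out
print(mask.mean())  # fraction of masked bins
```

With these sizes the grid has 8 x 16 = 128 patches, of which 96 are masked, so the masked fraction is exactly 0.75. Zero-filling the masked bins is only one way to present the visible input; masked-autoencoder encoders commonly drop masked patches entirely, and the source does not say which approach T-MAE takes.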