Diffusion Transformers

1 source - 6 claims

The primary diffusion model used EDM continuous-time Gaussian diffusion with a Diffusion Transformer (DiT). DiT sampling used a deterministic second-order Heun sampler with 35 steps. Diffusion noise is interpreted as a form of data augmentation that slows both learning and memorization compared with GPT. Per-sample diffusion transitions from rule-violating to rule-valid states were sharp and synchronized across seeds. DiT rule learning at G=6 was sensitive to the learning rate.
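
To make the sampler concrete, the following is a minimal sketch of a deterministic second-order Heun sampler over an EDM-style noise schedule with 35 steps. It is illustrative only and not the source's implementation: the function names (edm_sigma_schedule, heun_sample, toy_denoise) are hypothetical, the schedule parameters sigma_min=0.002, sigma_max=80 and rho=7 are commonly used EDM defaults assumed here rather than taken from the source, and a toy denoiser that always returns a fixed target stands in for the trained DiT.

```python
import numpy as np

def edm_sigma_schedule(num_steps=35, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, followed by a final 0.
    The parameter values are assumed EDM defaults, not values from the source."""
    i = np.arange(num_steps)
    inv = 1.0 / rho
    sigmas = (sigma_max**inv
              + i / (num_steps - 1) * (sigma_min**inv - sigma_max**inv)) ** rho
    return np.append(sigmas, 0.0)

def heun_sample(denoise, x_init, sigmas):
    """Integrate dx/dsigma = (x - D(x, sigma)) / sigma with Heun's method.
    No noise is injected during sampling, so the result is deterministic
    given x_init."""
    x = x_init
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        d_cur = (x - denoise(x, s_cur)) / s_cur
        x_euler = x + (s_next - s_cur) * d_cur          # first-order (Euler) step
        if s_next > 0:
            # Second-order correction: average the slopes at both endpoints.
            d_next = (x_euler - denoise(x_euler, s_next)) / s_next
            x = x + (s_next - s_cur) * 0.5 * (d_cur + d_next)
        else:
            # The last step lands on sigma = 0, where only Euler is defined.
            x = x_euler
    return x

# Toy stand-in for the trained network: it always predicts the same clean sample.
target = np.array([1.0, -2.0, 0.5])

def toy_denoise(x, sigma):
    return np.broadcast_to(target, x.shape)

rng = np.random.default_rng(0)
sigmas = edm_sigma_schedule()                            # 35 steps, 36 noise levels
x_init = rng.standard_normal((4, 3)) * sigmas[0]         # start from pure noise
samples = heun_sample(toy_denoise, x_init, sigmas)
```

With this toy denoiser every trajectory ends at the target, which gives a quick sanity check of the integrator. In an actual setup, denoise would wrap the trained model's forward pass, and each step except the last would cost two network evaluations.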