Redundancy-Constrained Information Maximization
The full training loss combines a reconstruction term and a diversity term, with the weight lambda set to 1e-4 in the experiments. The paper argues that minimizing this composite loss implements redundancy-constrained information maximization: the theoretical objective maximizes the mutual information between input signals and tokens while penalizing the total correlation among tokens. The two loss terms play complementary roles. The reconstruction loss is used to maximize a variational lower bound on the input-token mutual information, and the diversity loss uses the Total Coding Rate to expand the latent volume and encourage orthogonality among tokens.
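
As a rough illustration of how such a composite loss could be assembled, the NumPy sketch below pairs a mean-squared reconstruction error with a negated Total Coding Rate term. Several details are assumptions rather than facts from the source: that lambda multiplies the diversity term, that reconstruction is measured by mean squared error, that the coding rate takes the common form 1/2 logdet(I + d/(n eps^2) Z^T Z), and the distortion value eps = 0.5. The function names `total_coding_rate` and `composite_loss` are hypothetical.

```python
import numpy as np


def total_coding_rate(z, eps=0.5):
    """Coding rate of token embeddings z with shape (n, d).

    Assumed form: 0.5 * logdet(I_d + d / (n * eps**2) * z.T @ z).
    Larger values mean the tokens span a larger latent volume.
    """
    n, d = z.shape
    scale = d / (n * eps ** 2)
    # slogdet is numerically safer than log(det(...)); the matrix is
    # positive definite, so the sign is always +1 and can be ignored.
    _, logdet = np.linalg.slogdet(np.eye(d) + scale * (z.T @ z))
    return 0.5 * logdet


def composite_loss(x, x_hat, z, lam=1e-4, eps=0.5):
    """Reconstruction loss plus lam times diversity loss (assumed weighting)."""
    rec = np.mean((x - x_hat) ** 2)      # assumed: MSE reconstruction term
    div = -total_coding_rate(z, eps)     # minimizing this maximizes the rate
    return rec + lam * div, rec, div


if __name__ == "__main__":
    diverse = np.eye(4)                          # four mutually orthogonal tokens
    collapsed = np.tile([1.0, 0.0, 0.0, 0.0], (4, 1))  # four identical tokens
    print("rate (orthogonal):", total_coding_rate(diverse))
    print("rate (collapsed): ", total_coding_rate(collapsed))
```

Under these assumptions, orthogonal tokens receive a higher coding rate than collapsed ones, so the negated rate penalizes redundant tokens, which matches the stated role of the diversity loss.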