Pretraining

2 sources - 12 claims

Protein symbols were standardized to canonical UniProt identifiers, and canonical amino-acid sequences were retrieved from UniProt. Raw expression matrices were transformed with log(1 + x) or arcsinh and then min-max normalized to the continuous range from 0 to 10.

Pretraining simulated targeted proteomics panel expansion by splitting each cell's proteins into a visible context set and a target imputation set. A context-target attention mask forced conditional imputation from the observed proteins: each target protein could attend to itself but not to any other target protein. Training jointly optimized the expression self-decoder, global expression decoder, and joint decoder losses.

Unlike standard masked autoencoders, the design does not let visible-patch information bypass the bottleneck. The decoder reconstructs masked input patches only from Fingerprint Tokens and positional mask tokens. The experiments used frequency masking with a masking ratio of 0.6. Excessive masking harmed performance for some fingerprint settings by removing local diagnostic cues. Ablations showed that both pretraining and the diversity term improved performance.

The pretraining corpus contained 391,280,332 human…
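The expression preprocessing described above (log(1 + x) or arcsinh, followed by min-max normalization to the range 0 to 10) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The function name `preprocess_expression` and its parameters are hypothetical. Taking a single minimum and maximum over the whole matrix, and applying arcsinh without a cofactor, are both assumptions that the summary does not confirm.

```python
import numpy as np


def preprocess_expression(raw, transform="log1p", upper=10.0):
    """Transform a cells x proteins matrix, then min-max scale it to [0, upper].

    Hypothetical helper for illustration only.
    """
    raw = np.asarray(raw, dtype=float)
    if transform == "log1p":
        x = np.log1p(raw)        # log(1 + x)
    elif transform == "arcsinh":
        x = np.arcsinh(raw)      # assumption: no cofactor is applied
    else:
        raise ValueError(f"unknown transform: {transform!r}")

    # Assumption: one global min and max per matrix (not per protein).
    lo, hi = x.min(), x.max()
    if hi == lo:
        # A constant matrix carries no range to scale; map it to zeros.
        return np.zeros_like(x)
    return upper * (x - lo) / (hi - lo)


# Toy matrix: 2 cells x 3 proteins of raw, non-negative intensities.
toy = np.array([[0.0, 5.0, 120.0],
                [2.0, 0.0, 40.0]])
scaled = preprocess_expression(toy, transform="arcsinh")
```

After scaling, the smallest transformed value maps to 0 and the largest to 10, so matrices measured on different raw scales share one continuous range.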
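The context-target split and the attention mask described above can be illustrated with a small NumPy sketch. It is written under stated assumptions rather than taken from the source: the helper names, the random split, and the `target_fraction` value are hypothetical, and rows for context proteins are left unrestricted because the summary only states the constraint on target proteins.

```python
import numpy as np


def split_context_target(n_proteins, target_fraction, rng):
    """Mark each protein as target (True) or visible context (False).

    Hypothetical helper; a uniform random split is an assumption.
    """
    n_target = int(round(target_fraction * n_proteins))
    is_target = np.zeros(n_proteins, dtype=bool)
    chosen = rng.choice(n_proteins, size=n_target, replace=False)
    is_target[chosen] = True
    return is_target


def context_target_attention_mask(is_target):
    """Return allowed[i, j]: True when query protein i may attend to key j."""
    is_target = np.asarray(is_target, dtype=bool)
    n = is_target.shape[0]
    allowed = np.ones((n, n), dtype=bool)
    # Block every target-to-target pair ...
    allowed[np.outer(is_target, is_target)] = False
    # ... then re-enable the diagonal so each protein can attend to itself.
    allowed[np.diag_indices(n)] = True
    return allowed


rng = np.random.default_rng(0)
# target_fraction=0.5 is an illustrative value, not one from the source.
is_target = split_context_target(n_proteins=8, target_fraction=0.5, rng=rng)
mask = context_target_attention_mask(is_target)
```

With this mask, a target protein's prediction can draw on the context proteins and on its own token only, which is what makes the imputation conditional on the observed panel.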