scpFormer
scpFormer is introduced as a foundation model for single-cell proteomics, a field where no comparable foundation model had been established. Instead of index-based protein tokens, it represents each protein as a token that combines a continuous, sequence-anchored semantic identity embedding derived from ESM with a continuous expression-value embedding. A learnable classification token and a transformer encoder then produce contextualized protein embeddings together with a global cell embedding. Because the ESM-derived embeddings position proteins by structural and functional similarity, unseen proteins can be incorporated without retraining a discrete vocabulary.
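The token construction described above can be illustrated with a minimal sketch. This is not the scpFormer implementation: the dimensions, the random placeholder weights, the choice to combine the identity and expression embeddings by element-wise sum, and the single attention step with identity projections standing in for the transformer encoder are all assumptions made for illustration. The ESM vectors are random stand-ins, not real ESM outputs, and every function name is hypothetical.

```python
import math
import random

random.seed(0)

D_ESM = 8    # assumed size of a precomputed ESM protein embedding
D_MODEL = 4  # assumed model width


def rand_vector(n, scale=0.1):
    return [random.gauss(0.0, scale) for _ in range(n)]


def rand_matrix(rows, cols):
    return [rand_vector(cols) for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


# Parameters that would be learned in a real model; random placeholders here.
W_ID = rand_matrix(D_MODEL, D_ESM)  # projects an ESM embedding to model width
W_VAL = rand_vector(D_MODEL)        # embeds a scalar expression value
B_VAL = rand_vector(D_MODEL)
CLS = rand_vector(D_MODEL)          # classification token


def protein_token(esm_vec, expression):
    """Identity embedding plus expression-value embedding (sum is an assumption)."""
    identity = matvec(W_ID, esm_vec)
    value = [w * expression + b for w, b in zip(W_VAL, B_VAL)]
    return [i + v for i, v in zip(identity, value)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """One single-head attention step with identity Q/K/V projections."""
    scale = math.sqrt(len(tokens[0]))
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(len(q))])
    return out


def encode_cell(esm_vecs, expressions):
    """Return (cell embedding, contextualized protein embeddings)."""
    tokens = [CLS] + [protein_token(e, x) for e, x in zip(esm_vecs, expressions)]
    contextual = self_attention(tokens)
    return contextual[0], contextual[1:]


# Three proteins with random stand-in ESM vectors and expression values.
esm_vecs = [rand_vector(D_ESM, scale=1.0) for _ in range(3)]
cell, proteins = encode_cell(esm_vecs, [0.5, 2.0, 0.0])
```

In this sketch no parameter is indexed by protein, so a protein absent from the original panel only needs its own embedding vector: appending a fourth vector to `esm_vecs` and a fourth value to the expression list works without resizing anything, which mirrors the point about avoiding a discrete vocabulary.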