Graduate Student, Scripps Research Skaggs Graduate School of Chemical and Biological Sciences, United States
Disclosure(s):
Karenna Ng: No financial relationships to disclose
Introduction/Rationale: Somatic hypermutation (SHM) is a two-step process: activation-induced cytidine deaminase (AID) targets DNA, and error-prone repair then introduces mutations. Existing computational models do not reflect this two-step mechanism, which limits their ability to model affinity maturation in silico. Here, we develop a biologically grounded, transformer-based framework that mirrors SHM. Our framework employs two independent antibody language models (AbLMs): an AID-like targeting model that selects mutation sites and a DNA repair-like substitution model that predicts the resulting amino acids.
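The decoupling of targeting and substitution can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-site sampling scheme, the amino-acid-level representation, and the uniform toy models standing in for the two trained AbLMs are all assumptions made for illustration.

```python
import random
from typing import Callable, Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical interfaces for the two models (assumptions, not from the abstract):
#   targeting model:    sequence -> per-position mutation probability
#   substitution model: (sequence, position) -> probability of each amino acid
TargetingModel = Callable[[str], List[float]]
SubstitutionModel = Callable[[str, int], Dict[str, float]]


def two_stage_mutate(
    seq: str,
    targeting: TargetingModel,
    substitution: SubstitutionModel,
    rng: random.Random,
) -> str:
    """Stage 1 picks sites to mutate; stage 2 picks the replacement residue."""
    site_probs = targeting(seq)
    mutated = list(seq)
    for pos, p in enumerate(site_probs):
        # Stage 1 (AID-like targeting): decide whether this site is hit.
        if rng.random() < p:
            # Stage 2 (repair-like substitution): sample a residue that
            # differs from the parent residue at this position.
            probs = substitution(seq, pos)
            choices = [aa for aa in AMINO_ACIDS if aa != seq[pos]]
            weights = [probs[aa] for aa in choices]
            mutated[pos] = rng.choices(choices, weights=weights, k=1)[0]
    return "".join(mutated)


def toy_targeting(seq: str, rate: float = 0.1) -> List[float]:
    """Toy stand-in: every position has the same mutation probability."""
    return [rate] * len(seq)


def toy_substitution(seq: str, pos: int) -> Dict[str, float]:
    """Toy stand-in: every amino acid is equally likely."""
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


rng = random.Random(0)
parent = "EVQLVESGGGLVQPGGSLRLSCAAS"  # illustrative heavy-chain-like fragment
child = two_stage_mutate(parent, toy_targeting, toy_substitution, rng)
print(parent)
print(child)
```

Because the two stages are separate callables, either one could be swapped out or evaluated on its own, which is the property the framework relies on when it assesses hotspot localization separately from substitution preferences.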
Methods: We sequenced memory B cell repertoires from eight healthy donors using high-accuracy bulk NGS of unpaired VH/VL chains. These data were used for pretraining the two AbLMs to perform in silico SHM. Using AntiBERTa2-derived metrics as a correlate for native-like antibodies, we compared model-generated sequences to the natural human immune repertoire.
Results: Bulk NGS produced ~84 million high-quality, productive memory B cell receptor sequences for AbLM training. Model-predicted mutations matched the spatial distribution and overall load of true SHM. AntiBERTa2 embedding projections and log-likelihood scoring revealed that model-generated sequences closely resemble native antibodies and strongly mimic SHM mutational patterns. Model-generated sequences also exhibited significantly increased expression in HEK293F cells, indicating that the models learned beneficial mutations that enhance protein fitness.
Conclusion: Our framework models SHM to generate native-like human antibodies, providing a biologically grounded tool for guiding design within the natural sequence space. Unlike motif-based or single-stage SHM simulators, our framework explicitly decouples targeting and substitution to enable faithful reproduction of both hotspot localization and amino acid substitution. This approach advances in silico modeling of humoral immunity and may reveal new mechanistic insights into antibody clonal dynamics.