Associate Professor, Yale University, New Haven, Connecticut, United States
Introduction/Rationale: Predicting which viral strains will become predominant is challenging for vaccine design and therapeutic development. Viral evolution is driven by interactions between sequence variation, molecular structure, receptor binding, and immune escape, yet most computational methods consider these factors in isolation. A unified representation integrating them is needed to better anticipate viral evolutionary dynamics. We introduce ViSENet (Viral Sequence Evolution Network), a multimodal framework that jointly embeds viral spike protein sequence and structure in a chronologically organized latent space.
Methods: ViSENet integrates complementary neural architectures. Spike protein sequences are encoded using a transformer-based encoder that learns temporally organized embeddings. Structural information is captured using a geometric scattering encoder applied to AlphaFold-predicted structures, extracting multiscale features of key spike domains. Sequence and structure embeddings are fused into a shared latent space and trained with supervision from sequence reconstruction, emergence time, receptor binding affinity, and immune escape. Evolutionary dynamics are modeled using a neural ODE to enable continuous-time forecasting.
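The fusion and forecasting steps described in Methods can be illustrated with a minimal NumPy sketch. It is not the ViSENet implementation: the embedding dimensions, the random untrained weights, the concatenate-and-project fusion, and the fixed-step RK4 integrator are all assumptions chosen for illustration, and the function names (`fuse`, `latent_velocity`, `forecast`) are hypothetical. The encoders and the multi-task training objective are omitted; random vectors stand in for the sequence and structure embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and untrained random weights; none of these come from ViSENet.
SEQ_DIM, STRUCT_DIM, LATENT_DIM, HIDDEN_DIM = 16, 8, 4, 32
W_fuse = rng.normal(scale=0.1, size=(SEQ_DIM + STRUCT_DIM, LATENT_DIM))
W1 = rng.normal(scale=0.1, size=(LATENT_DIM, HIDDEN_DIM))
W2 = rng.normal(scale=0.1, size=(HIDDEN_DIM, LATENT_DIM))


def fuse(seq_emb, struct_emb):
    """Concatenate both embeddings and project them into a shared latent space."""
    joint = np.concatenate([seq_emb, struct_emb], axis=-1)
    return np.tanh(joint @ W_fuse)


def latent_velocity(z):
    """Stand-in for the learned dynamics dz/dt = f(z): a small untrained MLP."""
    return np.tanh(z @ W1) @ W2


def forecast(z0, horizon, n_steps=100):
    """Integrate dz/dt = f(z) from t = 0 to t = horizon with fixed-step RK4."""
    z = np.asarray(z0, dtype=float)
    h = horizon / n_steps
    for _ in range(n_steps):
        k1 = latent_velocity(z)
        k2 = latent_velocity(z + 0.5 * h * k1)
        k3 = latent_velocity(z + 0.5 * h * k2)
        k4 = latent_velocity(z + h * k3)
        z = z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return z


# Random placeholders for five variants' sequence and structure embeddings.
seq_emb = rng.normal(size=(5, SEQ_DIM))
struct_emb = rng.normal(size=(5, STRUCT_DIM))

z0 = fuse(seq_emb, struct_emb)        # fused latent state at the current time
z_future = forecast(z0, horizon=3.0)  # latent state projected 3 time units ahead
print(z0.shape, z_future.shape)
```

Because time enters only through the integration horizon, the same dynamics function can be queried at any future time point, which is what makes the forecast continuous in time. A trained model would replace the random weights with learned parameters and would likely use an adaptive ODE solver.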
Results: Applied to COVID-19 and influenza datasets, ViSENet learns latent representations that organize viral variants by temporal emergence and lineage, with related strains clustering by sequence similarity. The model accurately predicts binding affinity and outperforms unimodal baselines. Time-split evaluations show that latent trajectories capture meaningful evolutionary trends, and the neural ODE enables projection of viral evolution several weeks into the future.
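A time-split evaluation of the kind mentioned in Results can be sketched as follows. This is a generic illustration, not the authors' protocol: the emergence days, the cutoff, and the helper name `time_split` are hypothetical. Variants that emerged before the cutoff form the training set, and later variants are held out, so the model is always scored on strains from its own future.

```python
import numpy as np


def time_split(emergence_days, cutoff_day):
    """Return (train_idx, test_idx): variants before the cutoff vs. at or after it."""
    emergence_days = np.asarray(emergence_days)
    train_idx = np.flatnonzero(emergence_days < cutoff_day)
    test_idx = np.flatnonzero(emergence_days >= cutoff_day)
    return train_idx, test_idx


# Hypothetical emergence times (days since a reference date) for six variants.
days = np.array([10, 250, 40, 400, 120, 310])
train_idx, test_idx = time_split(days, cutoff_day=200)
print(train_idx, test_idx)  # [0 2 4] [1 3 5]
```

Unlike a random split, this prevents information from later variants leaking into training, which matters when the goal is forecasting.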
Conclusion: ViSENet provides a unified framework for modeling viral evolution by integrating sequence, structure, and functional properties within a temporally organized latent space, enabling interpretation of past trends and forecasting of emergent viral variants for anticipatory vaccine development.