Postdoctoral Fellow, La Jolla Institute for Immunology, La Jolla, California, United States
Introduction/Rationale: The Immune Epitope Database (IEDB, iedb.org) is a freely available resource that catalogs experimentally defined immune epitopes. In addition, the IEDB records ~190,000 T cell receptors and ~5,000 antibodies with experimentally verified epitope specificity. Because these receptors have been manually curated from 3,300 references spanning decades, the reported data and nomenclature can be inconsistent, which poses challenges for computational analyses. To support interoperability and integration with community resources such as the Adaptive Immune Receptor Repertoire Knowledge Commons (AKC), we are revising all immune receptor records to produce resolved, standardized, and analysis-ready receptor data.
Methods: We developed a computational pipeline that employs IgBLAST for V/D/J gene assignment, ANARCII for identification of complementarity-determining regions (CDRs), and tidytcells to standardize author-reported gene names. We further extended tidytcells to validate and standardize CDR3 sequences based on reported V/J gene usage and to support antibody data. Crucially, the pipeline also flags anomalous data for targeted re-curation by expert curators.
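The flagging step can be illustrated with a minimal sketch. It is not the pipeline's actual code and does not call IgBLAST, ANARCII, or tidytcells. It assumes, for illustration only, that a CDR3 is stored in junction form, beginning with the conserved cysteine and ending with the conserved phenylalanine or tryptophan. The function name `check_cdr3` and the flag labels are hypothetical.

```python
# Hypothetical sketch: normalize a reported CDR3 and flag anomalies for re-curation.
# Assumption: CDR3s are stored in junction form, i.e. they include the conserved
# anchors (leading C, trailing F or W). The real pipeline may use other rules.

VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def check_cdr3(seq):
    """Return (normalized_sequence, list_of_flags) for one reported CDR3."""
    norm = seq.strip().upper()
    if not norm:
        return norm, ["empty"]

    flags = []
    if set(norm) - VALID_AA:
        flags.append("non_amino_acid_characters")
    if not norm.startswith("C"):
        flags.append("missing_leading_C")
    if norm[-1] not in "FW":
        flags.append("missing_trailing_F_or_W")
    return norm, flags


if __name__ == "__main__":
    for reported in [" cassleaff ", "ASSLEAF", "CASS*LEAFF"]:
        print(reported, "->", check_cdr3(reported))
```

Records that return a non-empty flag list would be routed to a curator rather than corrected automatically. In the pipeline described above, the reported V/J genes additionally inform how a CDR3 is validated and standardized, which this sketch omits.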
Results: The reprocessed receptor dataset contains V/D/J gene names that are correctly formatted and mapped to existing reference genes, and CDR3 sequences are consistently represented up to their conserved anchor residues. Improved anomaly detection allowed us to identify and correct anomalous receptor records from hundreds of studies.
Conclusion: These revisions increase data quality and improve interoperability, as exemplified by integration with the AKC. This integration will enable researchers to query large-scale repertoires for receptors whose specificity has been experimentally verified in the IEDB and to link orphan sequences to known targets, and it will support cross-repository studies of receptor-epitope pairs and their relationship to health and disease.