Computational Research Scientist, University of Texas Southwestern Medical Center, Dallas, Texas, United States
Introduction/Rationale: Biological networks (or graphs) are generic data structures in which nodes represent biological or other entities and each edge connects two nodes to represent a relationship between them. Examples include sequence similarity networks, where nodes are sequences and each edge carries a similarity (or distance) score between the two sequences it connects, and receptor-epitope specificity networks, where receptors and epitopes are nodes and an edge indicates that the receptor has specificity for that epitope. Analyzing the structure and composition of these networks can provide insight into the underlying biology.
Methods: We built and analyzed a set of networks for human T cell receptor beta chain (TCRB) sequences in the Adaptive Immune Receptor Repertoire Knowledge Commons (AKC). Starting with ~357M TCRB junction amino acid (AA) sequences across all studies, subjects, diseases, and V/J genes, we obtained ~169M unique sequences and built a similarity network in which an edge connects two sequences whose edit distance is exactly 1. The resulting similarity network has ~169M nodes and ~1.7B edges.
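The abstract does not describe how the edges were computed, so the following is only a minimal in-memory sketch of one common way to find all pairs at edit distance 1 without comparing every pair of sequences. It assumes that "edit distance" means Levenshtein distance (one substitution, insertion, or deletion) and that the character `*` never occurs in a sequence; the function name `edit_distance_one_edges` is hypothetical. A network of ~169M nodes would likely need a distributed or out-of-core variant of the same idea.

```python
from collections import defaultdict
from itertools import combinations


def edit_distance_one_edges(sequences):
    """Return undirected edges (a, b) with a < b whose Levenshtein distance is 1.

    Assumption: '*' is not a valid residue, so it can serve as a mask character.
    """
    unique = set(sequences)  # deduplicate, as in the Methods
    edges = set()

    # Substitutions: two distinct sequences that share a key with one
    # position masked are identical everywhere except at that position.
    buckets = defaultdict(list)
    for s in unique:
        for i in range(len(s)):
            buckets[s[:i] + "*" + s[i + 1:]].append(s)
    for group in buckets.values():
        for a, b in combinations(group, 2):
            edges.add((a, b) if a < b else (b, a))

    # Insertions/deletions: removing one residue from s gives a shorter
    # string; if that string is also in the data set, the two are neighbors.
    for s in unique:
        for i in range(len(s)):
            t = s[:i] + s[i + 1:]
            if t in unique:
                edges.add((t, s) if t < s else (s, t))

    return edges


if __name__ == "__main__":
    toy = ["CASSL", "CASSF", "CASS", "CASSLG", "CAWWW", "CASSL"]
    for a, b in sorted(edit_distance_one_edges(toy)):
        print(a, b)
```

Masking and deletion keys keep the work roughly proportional to the total number of residues plus the number of edges found, rather than to the square of the number of sequences.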
Results: Analysis of the degree distribution shows that the similarity network follows neither a power-law nor a log-normal distribution; in particular, it is not scale-free. Connected component analysis of the similarity network yields ~32M components, of which ~31M consist of a single isolated node. The largest component contains ~131M nodes and, when isolated nodes are excluded, represents over 98% of the unique TCRB junction AA sequences in the AKC. This suggests that, at the population level, many junction sequences are similar to each other.
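To illustrate the two analyses named above, the sketch below computes a degree histogram and connected component sizes from an edge list, using a union-find structure. It is a toy example under stated assumptions, not the pipeline used for the AKC network: the function names `degree_histogram` and `component_sizes` are hypothetical, and the whole graph is assumed to fit in memory.

```python
from collections import Counter


def degree_histogram(nodes, edges):
    """Map each degree value to the number of nodes with that degree."""
    degree = {n: 0 for n in nodes}  # isolated nodes keep degree 0
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


def component_sizes(nodes, edges):
    """Return connected component sizes, largest first, via union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    sizes = Counter(find(n) for n in nodes)
    return sorted(sizes.values(), reverse=True)


if __name__ == "__main__":
    nodes = ["A", "B", "C", "D", "E"]
    edges = [("A", "B"), ("B", "C")]
    sizes = component_sizes(nodes, edges)
    isolated = sum(1 for size in sizes if size == 1)
    # Share of non-isolated nodes that fall in the largest component.
    share = sizes[0] / (len(nodes) - isolated)
    print(degree_histogram(nodes, edges), sizes, share)
```

The degree histogram is the input one would fit against candidate distributions such as power law or log-normal, and the final ratio mirrors the "excluding isolated nodes" share reported for the largest component.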
Conclusion: We will analyze other large-scale networks built from the AKC and discuss their biological interpretation. In particular, we will incorporate epitopes with known receptor specificity and compare the resulting receptor-epitope networks. Furthermore, we will partition the networks by subject demographics, diseases, and other study metadata and compare the resulting partitions.