Next-generation sequencing (NGS) has revolutionized antibody discovery by enabling high-depth analysis of diverse libraries, surpassing the limitations of traditional methods like Sanger sequencing. This article explores NGS’s role in validating antibody libraries, enhancing screening processes, and integrating with artificial intelligence (AI) and machine learning (ML) to predict and engineer superior binders.
Drawing from examples such as Isogenica’s use of NGS for codon-level validation and rare binder identification, we highlight how NGS measures enrichment across selection rounds, identifies crossover binders, and removes background noise. Comparisons of NGS platforms—Illumina for high-accuracy short reads, Oxford Nanopore for long reads, and PacBio HiFi for high-fidelity long reads—underscore their applications in antibody sequencing. Challenges include bioinformatics bottlenecks and the need for larger datasets to fuel AI models. Ultimately, combining NGS with technologies like CIS Display™ positions antibody discovery for faster, more precise outcomes, empowering biotechnology to address complex therapeutic targets.
Next-generation sequencing (NGS) has become a cornerstone in modern biotechnology, providing unprecedented insights into genetic and proteomic diversity. In antibody discovery, NGS plays a critical role in validating and improving antibody libraries, which are essential for identifying high-affinity binders. Standard practice is to use high-depth sequencing to assess library quality, confirming diversity and functionality before screening. Beyond validation, NGS supports discovery itself: analyzing enriched populations after selection reveals both sequences that bind their targets with high specificity and rarer binders that standard screening does not pick up.
Moreover, NGS is instrumental in linking sequence data with functional outcomes. This link is particularly valuable for machine learning (ML), where large sequence datasets can train models to predict binding affinities, epitopes, and other properties. Correlating sequence variation with functional data, such as binding strength or stability, nevertheless remains one of the biggest hurdles to predictive modeling.
Traditional methods like Sanger sequencing have long been used in antibody discovery but come with significant limitations. Sanger sequencing offers low throughput, typically processing only a few hundred sequences per run, making it inadequate for analyzing large, diverse antibody libraries. The high cost associated with scaling up Sanger for comprehensive library coverage further restricts its utility.
Additionally, Sanger lacks the resolution needed for quality control (QC) of highly diverse libraries, often missing rare variants or subtle biases in codon usage. These constraints have driven the shift toward more advanced technologies capable of handling the complexity of modern antibody repertoires.
NGS is pivotal for library quality validation, as demonstrated by Frigotto et al. (2015) from Isogenica, who used NGS to perform codon-level validation of synthetic antibody fragment libraries before screening. Their approach involved automated hexamer codon additions to build precise libraries, validated through NGS to ensure diversity and accuracy. This pre-screening step helps identify any biases or errors in library construction.
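As a rough illustration of what codon-level validation can involve, the sketch below tallies the codons observed at one randomized position and flags those that deviate from the intended design. It is a minimal example under stated assumptions, not the method of Frigotto et al.: the function names, the 5% tolerance, the toy reads, and the Tyr/Ser design ratio are all hypothetical, and reads are assumed to be in frame and already trimmed to the region of interest.

```python
from collections import Counter

def codon_frequencies(reads, codon_index):
    """Return the observed frequency of each codon at one in-frame codon position."""
    counts = Counter()
    start = codon_index * 3
    for read in reads:
        codon = read[start:start + 3]
        if len(codon) == 3:  # skip reads too short to cover this position
            counts[codon] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}

def flag_deviations(observed, designed, tolerance=0.05):
    """Map each codon whose observed frequency is off-design to (designed, observed)."""
    flagged = {}
    for codon in set(observed) | set(designed):
        obs = observed.get(codon, 0.0)
        exp = designed.get(codon, 0.0)
        if abs(obs - exp) > tolerance:
            flagged[codon] = (exp, obs)
    return flagged

# Toy data (hypothetical): the second codon (index 1) is the randomized position.
reads = ["GCTTATGGT", "GCTTCTGGT", "GCTTATGGT", "GCTTGAGGT"]
designed = {"TAT": 0.5, "TCT": 0.5}  # assumed design: Tyr and Ser in equal ratio

observed = codon_frequencies(reads, codon_index=1)
flagged = flag_deviations(observed, designed)
print(flagged)
```

In this toy set the unintended stop codon TGA and the under-represented TCT are both flagged, which is the kind of construction bias a pre-screening check is meant to surface.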
Isogenica employs NGS to cut through noise when working with challenging targets, such as cell-surface proteins or nanodisc-embedded membrane proteins, where conventional screens may fail. NGS excels at identifying rare binders that traditional methods overlook, enhancing the discovery of novel antibodies and keeping the funnel of potentially valuable molecules wider for longer.
Figure 1: Tracking sequence abundance between rounds highlights which binders are enriched under selective pressure. This data-driven view of enrichment informs refinement of panning conditions and accelerates the identification of more relevant binders.
In practice, NGS measures enrichment between selection rounds to evaluate and refine panning conditions, ensuring optimal binder selection. It also identifies crossover binders from different panning arms or selection streams, providing insights into shared epitopes. Furthermore, NGS corrects incomplete deselection by removing background noise from deselection streams, improving the signal-to-noise ratio in datasets and enabling informed inclusion of families of binders.
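One simple way to picture the enrichment and deselection steps described above is as a fold change in relative frequency between rounds, followed by a filter against the deselection stream. The sketch below assumes per-round read counts keyed by sequence; the function names, the pseudocount, the 1% background cutoff, and the toy counts are illustrative assumptions rather than a description of any specific pipeline.

```python
def enrichment(earlier, later, pseudocount=1):
    """Fold change in relative frequency of each sequence from an earlier to a later round."""
    seqs = set(earlier) | set(later)
    # The pseudocount keeps sequences absent from one round from dividing by zero.
    total_earlier = sum(earlier.values()) + pseudocount * len(seqs)
    total_later = sum(later.values()) + pseudocount * len(seqs)
    scores = {}
    for seq in seqs:
        freq_earlier = (earlier.get(seq, 0) + pseudocount) / total_earlier
        freq_later = (later.get(seq, 0) + pseudocount) / total_later
        scores[seq] = freq_later / freq_earlier
    return scores

def remove_background(scores, deselection_counts, max_fraction=0.01):
    """Drop sequences that make up more than max_fraction of the deselection stream."""
    total = sum(deselection_counts.values())
    if total == 0:
        return dict(scores)
    return {
        seq: score
        for seq, score in scores.items()
        if deselection_counts.get(seq, 0) / total <= max_fraction
    }

# Toy read counts (hypothetical sequence labels).
round2 = {"CDR3_A": 10, "CDR3_B": 50, "CDR3_C": 40}
round3 = {"CDR3_A": 60, "CDR3_B": 30, "CDR3_C": 10}
deselection = {"CDR3_B": 80, "CDR3_D": 20}

scores = enrichment(round2, round3)
kept = remove_background(scores, deselection)
for seq in sorted(kept, key=kept.get, reverse=True):
    print(seq, round(kept[seq], 2))
```

Here CDR3_A rises between rounds, while CDR3_B is discarded because it dominates the deselection stream. A real analysis would also need to account for sequencing errors and sampling depth, which this sketch ignores.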
Figure 2: NGS enables direct comparison between different selection arms, revealing crossover sequences that are common across multiple streams. These insights help validate selection strategies and identify highly robust candidates that traditional methods might miss.
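Finding crossover sequences between selection arms amounts to a set comparison once each arm has been reduced to read counts per sequence. The sketch below is one hedged way to express it; the arm names, the minimum read count used to ignore singletons, and the toy counts are assumptions made for illustration.

```python
def crossover_sequences(arms, min_arms=2, min_count=2):
    """Return sequences with at least min_count reads in at least min_arms selection arms."""
    seen_in = {}
    for arm_name, counts in arms.items():
        for seq, n in counts.items():
            if n >= min_count:  # ignore very low counts, which may be sequencing noise
                seen_in.setdefault(seq, set()).add(arm_name)
    return {
        seq: sorted(names)
        for seq, names in seen_in.items()
        if len(names) >= min_arms
    }

# Toy data (hypothetical arms and sequence labels).
arms = {
    "arm_1": {"A": 120, "B": 40, "C": 1},
    "arm_2": {"A": 95, "C": 30, "D": 12},
}

shared = crossover_sequences(arms)
print(shared)
```

Only sequence A qualifies in this example: C appears in both arms but with a single read in arm_1, so it falls below the count threshold.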
Emerging trends in antibody discovery integrate NGS with AI and ML for advanced applications. For instance, Charlotte Deane’s group at the University of Oxford, UK, has made numerous databases and algorithms available to drive forward AI for antibody discovery in academia and the private sector alike. AI-informed affinity maturation, as explored by specialist providers such as Nabla Bio, uses computational models to optimize antibody binding. Similarly, several tools are available to predict binding-relevant antibody regions, such as paratopes and epitopes, with implications for intellectual property in therapeutic design.
Companies like Sortera are expanding structure/function datasets through their Deep Screening technology, generating millions of sequence-function pairs to train AI models for biologics discovery. This addresses data scarcity, enabling predictive engineering of antibodies with desired properties.
CIS Display™, a cell-free in vitro display technology, complements this by producing large datasets for AI-driven prediction and engineering. By displaying ultra-diverse libraries (up to 10^14), it facilitates high-throughput screening and NGS analysis, powering ML algorithms to forecast binder performance. With this system, discrete changes in conditions and inputs can be accurately associated with both positive and negative data, supporting better algorithm training and predictions at the sequence level.
Figure 3: NGS analysis groups the top 100 enriched sequences into distinct clusters, illustrating diversity within the most enriched binders. This clustering provides a foundation for AI-driven modelling by linking sequence families to potential functional similarities.
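The grouping shown in Figure 3 can be approximated in many ways, and the article does not state which method was used. As a minimal sketch, the code below applies a greedy scheme: sequences are ranked by abundance, and each one joins the first cluster whose representative has the same length and differs at no more than a set number of positions. The function names, the two-mismatch threshold, and the toy CDR3-like sequences are assumptions.

```python
def hamming(a, b):
    """Number of mismatched positions, or None when the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))

def cluster_top_sequences(counts, top_n=100, max_mismatches=2):
    """Greedily cluster the top_n most abundant sequences around abundant representatives."""
    ranked = sorted(counts, key=counts.get, reverse=True)[:top_n]
    clusters = []  # each cluster is a list whose first item is its representative
    for seq in ranked:
        for cluster in clusters:
            distance = hamming(seq, cluster[0])
            if distance is not None and distance <= max_mismatches:
                cluster.append(seq)
                break
        else:
            # No existing cluster is close enough, so this sequence founds a new one.
            clusters.append([seq])
    return clusters

# Toy read counts (hypothetical sequences).
counts = {
    "ARDYYGSSW": 500,
    "ARDYYGSTW": 120,
    "AKDYYGSSW": 80,
    "GRWLPFDY": 300,
    "GRWLPYDY": 45,
}

for cluster in cluster_top_sequences(counts):
    print(cluster)
```

This yields two families in the toy data. Requiring equal lengths keeps the example short; an edit-distance or alignment-based measure would be a natural substitute when insertions and deletions matter.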
Different NGS platforms offer unique advantages for antibody sequencing. Illumina provides short reads with high accuracy, ideal for VHH domains under 350 bp. This makes it suitable for high-throughput, error-sensitive applications.
Oxford Nanopore excels in long reads, portability, and real-time sequencing, enabling analysis of full-length antibody genes without assembly artifacts, though with higher error rates compared to Illumina.
PacBio HiFi combines high fidelity with long reads, offering superior accuracy for complex repertoires and making it valuable for resolving structural variations in antibody libraries.
Despite its advantages, NGS in antibody discovery faces bioinformatics bottlenecks in interpreting vast datasets, requiring advanced tools for error correction and diversity analysis. Integrating NGS with AI-driven engineering poses challenges in predicting multiple properties like affinity, specificity, and manufacturability simultaneously, not to mention synthesizing and testing large numbers of antibody sequences for functional validation.
Building larger, high-quality datasets remains essential to power predictive AI/ML models, with concerns over data bias and accessibility if controlled by commercial entities.
The future outlook is promising, with collaborative data-sharing initiatives and interpretable AI techniques expected to enhance rational antibody design. Synergies between NGS, high-throughput experimentation, and ML will accelerate discovery, addressing global demands for therapeutic antibodies.
NGS empowers Isogenica to enhance library quality, discover rare binders, and continually refine selection strategies, leading to more effective antibody development. When combined with AI/ML and CIS Display™, it transforms workflows into faster, more precise, and data-rich processes, paving the way for innovative therapeutics.
Clark, L. A., et al. (2023). Affinity maturation of antibodies by combinatorial codon mutagenesis versus error-prone PCR-derived mutagenesis. BMC Biotechnology, 23(1): 55.
Frigotto, L., et al. (2015). Codon-Precise, Synthetic, Antibody Fragment Libraries Built Using Automated Hexamer Codon Additions and Validated through Next Generation Sequencing. Antibodies, 4(2): 88-102.
Makowski, E. K., et al. (2024). Optimization of therapeutic antibodies for reduced self-association in high-concentration formulations. Protein Engineering, Design and Selection, 37: gzae005.
Rouet, R., et al. (2018). Next-generation sequencing of antibody display repertoires. Frontiers in Immunology, 9: 118.