Background: As described in ICH E3 Q&A R1 (International Council for Harmonisation. E3: Structure and content of clinical study reports—questions and answers (R1). 6 July 2012. Available from: https://database.ich.org/sites/default/files/E3_Q%26As_R1_Q%26As.pdf): “A protocol deviation (PD) is any change, divergence, or departure from the study design or procedures defined in the protocol.” A persistent problem in human subject protection is the wide divergence among institutions, sponsors, investigators, and IRBs in how they define PDs and how they review them. Despite industry initiatives such as TransCelerate’s holistic approach [Galuchie et al. in Ther Innov Regul Sci 55:733–742, 2021], systematic trending and identification of impactful PDs remain limited. Traditional natural language processing (NLP) methods are often cumbersome to implement, requiring extensive feature engineering and model tuning. The rise of large language models (LLMs), however, has transformed text classification, enabling more accurate, nuanced, and context-aware solutions [Nguyen P. Text classification in the age of LLMs. 2024. Available from: https://blog.redsift.com/author/phong/]. An automated solution that enables efficient, flexible, and targeted PD classification is currently lacking.
Methods: We developed a novel approach that uses a large language model (LLM), Meta Llama 2 [Meta. Llama 2: Open source, free for research and commercial use. 2023. Available from: https://www.llama.com/llama2/], with a tailored prompt to classify free-text PDs from Roche’s PD management system. The model outputs were analysed to identify trends and assess risks across clinical programs, supporting human decision-making. This method offers a generalisable framework for developing prompts and integrating data to address similar challenges in clinical development.
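The classify-then-trend workflow described above can be illustrated with a minimal sketch. It assumes a hypothetical two-label scheme (AFFECTS_PROGRESSION_ASSESSMENT vs OTHER), a hypothetical prompt wording, and a generic `llm` callable that maps a prompt string to a reply string, standing in for Llama 2 inference. None of these names or labels come from the study, and the sketch is not the authors’ prompt or pipeline. Replies that do not match an allowed label are routed to manual review rather than guessed.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical label set; the study's actual categories are not disclosed.
LABELS = ("AFFECTS_PROGRESSION_ASSESSMENT", "OTHER")

# Hypothetical prompt template, not the tailored prompt used in the study.
PROMPT_TEMPLATE = (
    "You are reviewing clinical trial protocol deviations.\n"
    "Classify the deviation below into exactly one label: {labels}.\n"
    "Answer with the label only.\n\n"
    "Deviation: {text}\n"
    "Label:"
)


def build_prompt(text: str) -> str:
    """Insert one free-text PD into the prompt template."""
    return PROMPT_TEMPLATE.format(labels=", ".join(LABELS), text=text.strip())


def parse_label(response: str) -> str:
    """Map a raw model reply to an allowed label; anything else needs human review."""
    reply = response.strip().upper()
    for label in LABELS:
        if reply.startswith(label):
            return label
    return "NEEDS_REVIEW"


def classify(
    deviations: Iterable[tuple[str, str]], llm: Callable[[str], str]
) -> list[tuple[str, str]]:
    """Return (program, label) pairs for (program, free-text PD) inputs."""
    return [(program, parse_label(llm(build_prompt(text)))) for program, text in deviations]


def flagged_by_program(results: Iterable[tuple[str, str]]) -> Counter:
    """Trend view: count PDs per program flagged as affecting progression assessment."""
    return Counter(program for program, label in results if label == LABELS[0])


def fake_llm(prompt: str) -> str:
    """Stand-in for a Llama 2 call (assumption): flags deviations mentioning a scan."""
    deviation = prompt.split("Deviation: ", 1)[1]
    return LABELS[0] if "scan" in deviation.lower() else "OTHER"


# Illustrative, made-up deviations; not study data.
pds = [
    ("PROG-A", "Tumour scan performed 3 weeks outside window"),
    ("PROG-A", "Informed consent signed on wrong version"),
    ("PROG-B", "Week 12 CT scan missed"),
]
results = classify(pds, fake_llm)
print(flagged_by_program(results))
```

Keeping the model behind a plain callable separates prompt construction and output parsing from the inference backend, so the same trending logic can be reused if the prompt or model changes. The per-program counts are only a starting point for the expert review the abstract describes.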
Results: This approach flagged over 80% of PDs potentially affecting the assessment of disease progression, enabling expert review. Whereas manual analysis required months, this automated method produced actionable insights in minutes. The solution also highlighted gaps in first-line controls, supporting process improvement and more accurate handling of disease progression during trials.