November 9, 2023

Data-Informed Protocols: The Next Frontier in Clinical Trial Design & Conduct

In this article, we dive into the key benefits and implementation drivers of data-informed protocol development in clinical research.

Clinical trial protocols are at the heart of clinical research. They offer a blueprint for breakthrough innovation and play a pivotal role in bringing new medicines to market. A robust protocol design is crucial to both the scientific and executional feasibility of a clinical trial, and thereby to its probability of success.

Developing a robust study protocol can be challenging. Sponsors are often forced to make trade-offs, balancing scientific excellence and methodological rigour against practical considerations such as costs, timelines, and patient burden. This is further complicated by a significant rise in study complexity, resulting from increasingly ambitious and personalised drug development strategies, growing demand for data to distinguish between patient subgroups, and the challenges of locating, competing for, recruiting, and retaining sites and participants[1].

Historically, clinical trial protocols were crafted based on a combination of scientific literature, expert opinion, and regulatory guidelines, often leading to rigid and sometimes inefficient and impractical trial designs. The advent of data-informed protocol development marks a paradigm shift where sponsors and CROs leverage advanced data analytics, predictive modelling, and AI/Machine Learning to optimise and streamline their study designs. This method has been fuelled by the increasing availability of real-world data (RWD), including electronic health records (EHRs) and medical claims, as well as historical clinical trial data, including summary-level and patient-level data. Data-informed protocol development is characterised by adaptability, historic data reuse, and a more patient-centric and site-centric approach to study design and execution.

In this article, we dive into the key benefits and implementation drivers of data-informed protocol development. We discuss how historical data is successfully being used to guide and inform protocol development today to make clinical trials more adaptable, cost-efficient, and patient- and site-centric.

Zooming in on clinical trial protocol complexity

Let’s first set the scene. Clinical trial protocols have become increasingly complex in recent years, a trend that is expected to continue. Research by the Tufts Center for the Study of Drug Development (Tufts CSDD), which has routinely studied protocol design complexity over the past decade, shows a continuing upward trend in both the scientific and executional complexity of new study protocols. A 2022 study on complexity benchmarks concluded that Phase 2 and Phase 3 protocols now average 20.7 and 18.6 endpoints; 30.9 and 30.4 inclusion and exclusion criteria; 107.6 and 115.9 protocol pages; 35.1 and 82.2 sites distributed across 6.1 and 13.7 countries; and 2.1 million and 3.5 million datapoints, respectively (see Table 1). Notably, Phase 3 clinical trials now collect three times as much data as they did 10 years ago[1].

What drives this decade-long increase in complexity and data-heaviness? According to Tufts CSDD, protocol complexity is most notably driven by the growing number of therapeutics in development targeting rare diseases, increasingly narrowly defined patient populations, and more complicated and logistically demanding execution models. These execution models are characterised by a growing number of countries, sites, technologies, and data sources (e.g., smartphones and wearables), further enabled by the rise of decentralised clinical trial designs[2].

The IQVIA Institute for Human Data Science confirms these findings. Its annual research report concludes that scientific complexity continued to rise throughout 2022, with 30% of global pipelines targeting rare diseases, 960 compounds in development classified as next-generation biotherapeutics, and 62% of new drug approvals including first-in-class mechanisms[3]. Notably, the findings also highlighted growing executional complexity resulting from the rise of novel clinical trial designs, including umbrella, basket, master, and adaptive protocols, with 17% of new studies in 2022 including one or more aspects of novel trial designs[3].

The impact of protocol complexity and design flaws

Figure 1. The cause-effect-impact relationships of protocol complexity and design flaws. Protocol complexity and design flaws can lead to downstream effects such as protocol amendments, recruitment challenges, data integrity issues, and increased burden for patients and sites. These effects can significantly impact study timelines, leading to unexpected delays and unbudgeted costs.

Protocol complexity increases the likelihood of errors, including protocol design flaws that can lead to significant downstream issues such as protocol amendments, recruitment failure, and compromised data integrity. Not surprisingly, a recent study focusing on key bottlenecks in clinical trial setup and conduct identified protocol development as one of four critical path activities that most often cause study delays[4].

Like protocol complexity, the prevalence and mean number of protocol amendments have increased across all study phases since 2015. A recent study by Tufts CSDD shows that 75% of protocols require at least one substantial amendment, with the highest observed prevalence in Phase 2 (89%). Protocol amendments are highly disruptive to clinical trial conduct and represent the largest cause of unexpected delays and costs in a clinical trial[5]. Biopharma companies are estimated to spend between $7 billion and $8 billion annually implementing these amendments[6]. However, it is worth distinguishing between unavoidable and avoidable amendments, with roughly 25% of substantial amendments classified as avoidable. Unavoidable amendments can result from new safety findings, while avoidable amendments can result from infeasible eligibility criteria, data integrity issues, or logistical impracticalities[6]. Let’s take a closer look at these three issues.

The restrictiveness of eligibility criteria is a major source of recruitment challenges for clinical trial sponsors, especially with increasingly narrowly targeted patient populations. Inclusion/exclusion (I/E) criteria that are too restrictive relative to the available target patient population can significantly slow recruitment and can even lead to clinical trial failure. Sponsors are therefore often required to amend their I/E criteria during trial execution, leading to significant study delays.

As for data integrity, a protocol that encourages the collection of excessive and unnecessary clinical data may compromise data integrity and analysis, as it leads to higher error rates[7]. The more data a study collects, the greater the need for rigorous data management practices. Excessive data capture can thus introduce unnecessary complexity into data management processes, making it harder to ensure data quality and consistency.

Lastly, from a logistical perspective, if study participants are burdened with excessive procedures and site visits, they may become fatigued or frustrated, which could affect the accuracy of their responses or their willingness to continue participation. This can lower retention and lead to missing or biased data. The same applies to clinical trial sites, where demands for excessive data collection can harm site recruitment and retention as well as lead to inconsistencies across sites, further compromising data integrity.

Sponsors are thus wise to consider patient and site burden in their clinical trial designs as these can significantly impact recruitment and retention for both patients and sites, thereby impacting the cycle times, costs, and integrity of their study.

Data-informed protocol development offers game-changing potential

Data-informed protocol development offers an answer to protocol complexity and design flaws. It takes the guesswork out of protocol design and aims to boost the overall predictability and cost-efficiency of clinical trial conduct. This approach has become increasingly common as sponsors and CROs continue to invest in data science and technology, and as historic data grows increasingly rich and available.

To illustrate, the past decade has seen a sharp increase in the frequency and depth of clinical trial reporting as well as in the availability of RWD such as EHRs and medical claims data. Today, there’s a growing body of clinical trial intelligence and data aggregation tools that tap into these data sources, offering advanced data filtering and querying capabilities that allow sponsors to essentially benchmark any design aspect of their study—from objectives and endpoints to I/E criteria and recruitment timelines.

The downstream benefits of using these tools can be significant. Traditionally, protocol development relied heavily on the subjective experiences of key stakeholders such as the sponsor, CRO, and investigator, thereby often introducing a high level of uncertainty and unpredictability. By leveraging historical data, advanced analytics, and predictive modelling, sponsors can now increase certainty around the outcomes, timelines, and costs of their clinical development activities.

For example, by analysing data from historical clinical trials, sponsors can identify common causes of amendments and proactively address these in their protocol design. If a particular endpoint has historically proven challenging to measure or irrelevant, for instance, it can be revised or omitted. Such critical design choices can significantly impact the feasibility of a study, as well as the burden placed on patients and sites by undergoing and performing unnecessary procedures.

In addition, RWD sources such as EHRs can provide a clearer picture of the patient population being studied. Sponsors can use this data to evaluate the restrictiveness of their I/E criteria relative to the available patient population, thus helping to determine the feasibility of recruitment targets[8]. Moreover, they can assess which variables can cause the largest impact on recruitment and adapt accordingly.
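As a rough illustration of this kind of feasibility check, the Python sketch below applies a set of I/E criteria to a small patient cohort and reports how many patients remain after each criterion, plus the share of the cohort each criterion would exclude on its own. Everything in it is hypothetical: the patient records are made up, the field names (`age`, `hba1c`, `egfr`) and thresholds are placeholders, and the function names are not taken from any real EHR system or from the method in the cited study[8].

```python
from typing import Callable, Dict, List, Tuple

Patient = Dict[str, float]
Criterion = Callable[[Patient], bool]  # returns True if the patient passes


def eligibility_funnel(
    patients: List[Patient], criteria: Dict[str, Criterion]
) -> List[Tuple[str, int]]:
    """Apply criteria in order and record how many patients remain after each."""
    remaining = list(patients)
    funnel = [("all patients", len(remaining))]
    for name, passes in criteria.items():
        remaining = [p for p in remaining if passes(p)]
        funnel.append((name, len(remaining)))
    return funnel


def single_criterion_impact(
    patients: List[Patient], criteria: Dict[str, Criterion]
) -> Dict[str, float]:
    """Share of the full cohort each criterion would exclude if applied alone."""
    total = len(patients)
    if total == 0:
        return {name: 0.0 for name in criteria}
    return {
        name: sum(1 for p in patients if not passes(p)) / total
        for name, passes in criteria.items()
    }


# Hypothetical, made-up cohort standing in for records derived from RWD.
cohort = [
    {"age": 54, "hba1c": 7.9, "egfr": 85},
    {"age": 71, "hba1c": 8.4, "egfr": 52},
    {"age": 66, "hba1c": 6.8, "egfr": 90},
    {"age": 45, "hba1c": 9.1, "egfr": 75},
    {"age": 80, "hba1c": 7.5, "egfr": 40},
]

# Hypothetical I/E criteria; thresholds are placeholders, not clinical advice.
criteria = {
    "age 18-75": lambda p: 18 <= p["age"] <= 75,
    "HbA1c >= 7.0": lambda p: p["hba1c"] >= 7.0,
    "eGFR >= 60": lambda p: p["egfr"] >= 60,
}

funnel = eligibility_funnel(cohort, criteria)
impact = single_criterion_impact(cohort, criteria)

for step, count in funnel:
    print(f"{step}: {count} patients remaining")
for name, share in impact.items():
    print(f"{name} alone excludes {share:.0%} of the cohort")
```

In this toy cohort the kidney-function threshold excludes the largest share of patients on its own, which is the sort of signal that might prompt a sponsor to ask whether that criterion could be relaxed. A real analysis would need far larger datasets, validated data definitions, and clinical review of any proposed change.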

Historical data can also guide and inform design aspects that affect data integrity, such as determining the right sample size for a new study. A sample size that is too small can lead to inconclusive results, while a sample size that is too large wastes resources and introduces unnecessary burden. By examining past trials and the variability of their results, sponsors can estimate the sample size needed to detect a meaningful difference in treatment effect and avoid under- or overpowering their studies.
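To make the link between historical variability and sample size concrete, here is a minimal Python sketch. It assumes a simple two-arm parallel trial comparing means with equal allocation, and uses the standard normal-approximation formula n = 2 (z₁₋α/₂ + z₁₋β)² σ² / δ² per arm. The function name, the default significance level and power, and the example numbers are illustrative assumptions, not values from the sources cited in this article.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(
    sigma: float, delta: float, alpha: float = 0.05, power: float = 0.80
) -> int:
    """Approximate participants needed per arm for a two-sided comparison of means.

    sigma: outcome standard deviation, e.g. estimated from historical trials
    delta: smallest treatment difference considered clinically meaningful
    """
    if sigma <= 0 or delta == 0:
        raise ValueError("sigma must be positive and delta must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)  # round up to whole participants


# Hypothetical inputs: historical SD of 10 units, target difference of 5 units.
n_baseline = sample_size_per_arm(sigma=10, delta=5)
# If historical data suggested more variability (SD of 12), more participants are needed.
n_noisier = sample_size_per_arm(sigma=12, delta=5)

print(n_baseline, n_noisier)
```

With these illustrative inputs, the baseline case works out to 63 participants per arm, and the higher-variability case requires more. In practice, statisticians would refine such an estimate for the actual endpoint type, the planned analysis, and expected dropout.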

Lastly, sponsors can analyse patient outcomes and behaviour from historical clinical trials to optimise the frequency and timing of their site visits. For example, historical data can reveal the most informative time points for assessing clinical outcomes as well as the time points and procedures where participants are most likely to drop out of a study. Using these insights, sponsors can optimise for retention, thereby ensuring more complete and consistent data.

Thoughts on implementation: Being data-driven vs. data-informed

While the benefits of data-informed protocol development are manifold, implementation is not without its challenges. Data quality, sourcing, and management are all pivotal to the reliability and applicability of data insights. Moreover, it is important to note that data is not meant to replace stakeholder input and expertise, but rather complement and validate existing knowledge. It is for this reason that the term ‘data-informed’ is more appropriate in this context, compared to its more popular counterpart ‘data-driven’.

The term ‘data-driven’ suggests data is the primary driver of decision-making. It suggests a level of automation or algorithmic determination that does not fully account for the complexity and nuance involved in protocol development. In contrast, the term ‘data-informed’ acknowledges the role of data as a critical input to the decision-making process, while also leaving room for the integration of expert judgement, ethical considerations, and contextual factors that are not easily quantifiable or present in the available data.

Data-informed protocol development therefore implies a more holistic approach where data is but one of several pillars supporting the protocol development process. It entails a process of continuous learning where data is integrated over time, reflecting an adaptive style that can be crucial in responding to new insights, unexpected outcomes, or changing conditions.

This adaptability is key given the fast-changing landscape of available tools and supporting regulations. A prominent example is the ongoing debate on the applicability of RWD in clinical trial decision-making. From a regulatory standpoint, authorities like the FDA have increasingly emphasised the value of RWD, but there is still a lack of industry consensus on the extent of its use due to concerns about data reliability and individual privacy.

So how can organisations adopt a more data-informed approach to clinical trial protocol development? Sponsors and CROs are advised to outline a clear implementation plan with defined programme goals before embedding any new tools in their processes. This plan should align with company strategy and should consider existing internal systems, IT infrastructure, and resources. It should cover key topics such as updating SOPs, vendor selection and evaluation, and how to incorporate feedback, expertise, and best practices across the implementation journey.

Once implemented, data-informed insights can be incorporated across the protocol development lifecycle by the relevant teams and functions. Here, fostering a culture of short feedback cycles, continuous learning, and organisational adaptability can be a significant driver of success. For instance, a recent study suggests sponsors and CROs adopt Lean process management in combination with Agile teamwork to accelerate protocol development and thereby avoid potential delays and bottlenecks[4].

Introducing Triall CIX

Triall is building the Clinical Insights Exchange (CIX), a platform that enables analysis of historical clinical trial data to inform the planning and design of future research. The CIX platform applies novel privacy-preserving and privacy-enhancing techniques such as Compute-to-Data and Self-Sovereign Identity (SSI) to allow analysis over aggregated data from clinical datasets and eClinical systems connected to the platform. It therefore enables biopharma companies, clinical CROs, and medical research institutes to provide and consume clinical trial data without compromising data privacy or confidentiality. This allows these companies to generate data-informed insights that promote the speed, resource-efficiency, and predictability of their clinical development activities.

Future outlook: What’s next for data-informed protocol development?

Data-informed protocol development represents an exciting frontier in clinical trials. It taps into the increasing depth and availability of historical data and has been accelerated by recent advancements in data science and technology. While stakeholder input and expertise remain a hallmark of clinical trial design, we are likely to see a further shift towards incorporating data-informed insights across the protocol development lifecycle. This shift is enabled by an increasingly sophisticated landscape of digital tools that allow sponsors and CROs to derive meaningful insights from historical clinical trial data and RWD sources. The advent of AI/Machine Learning models will only amplify this trend by introducing a new level of predictive modelling and simulation that is likely to have a profound influence on how we design and conduct clinical trials.


  1. Getz, K., Smith, Z., & Kravet, M. (2023). Protocol design and performance benchmarks by phase and by oncology and rare disease subgroups. Therapeutic Innovation & Regulatory Science, 57(1), 49-56.
  2. Getz, K., Smith, Z., Botto, E., Murphy, E., & Dauchy, A. (2023). New Benchmarks on Protocol Amendment Practices, Trends and their Impact on Clinical Trial Performance. Therapeutic Innovation & Regulatory Science. Preprints.
  3. Aitken, M., Connelly, N., Kleinrock, M., & Pritchett, J. (2023). Global Trends in R&D 2023. IQVIA Institute for Human Data Science.
  4. Bieske, L., Zinner, M., Dahlhausen, F., & Truebel, H. (2023). Critical path activities in clinical trial setup and conduct: How to avoid bottlenecks and accelerate clinical trials. Drug Discovery Today, 28(10), 103733.
  5. Tufts Center for the Study of Drug Development (2023). Prevalence and mean number of protocol amendments increasing across all phases. Tufts CSDD Impact Report, Volume 25, Number 2, March/April 2023.
  6. Getz, K. A., Stergiopoulos, S., Short, M., Surgeon, L., Krauss, R., Pretorius, S., ... & Dunn, D. (2016). The impact of protocol amendments on clinical trial performance and cost. Therapeutic Innovation & Regulatory Science, 50(4), 436-441.
  7. Getz, K. A., & Campo, R. A. (2017). Trial watch: trends in clinical trial design complexity. Nature Reviews Drug Discovery, 16(5), 307-308.
  8. Fang, Y., Liu, H., Idnay, B., Ta, C., Marder, K., & Weng, C. (2023). A data-driven approach to optimizing clinical study eligibility criteria. Journal of Biomedical Informatics, 142, 104375.
