November 9, 2023

Data-Informed Protocols: The Next Frontier in Clinical Trial Design & Conduct

In this article, we dive into the key benefits and implementation drivers of data-informed protocol development in clinical research.

Article summary / TLDR

1. Clinical trials have seen an upward trend in scientific and executional protocol complexity over the past decade.

2. Protocol complexity increases the risk of study design flaws, which can lead to downstream problems such as amendments, recruitment failure, and compromised data integrity, as well as greater patient and site burden.

3. Data-informed protocol development addresses this issue by leveraging historical data, advanced analytics, and predictive modelling to optimise study designs and improve their predictability and cost-efficiency.

4. This method drives an adaptive and agile decision-making style that combines data and expert input to determine the best course of action.

Clinical trial protocols are at the heart of clinical research and play a pivotal role in bringing new medicines to market. A robust protocol design is crucial to both the scientific and executional feasibility of a clinical trial and thereby its probability of success.

However, developing a robust study protocol can be challenging. Sponsors are often forced to make trade-offs, balancing scientific excellence and methodological rigour against practical considerations such as costs, timelines, and patient burden. This balancing act has become even more complicated in recent years due to increasingly ambitious and personalised drug development strategies, growing demand for data to delineate between different patient subgroups, and the challenges related to locating, competing for, recruiting, and retaining sites and participants[1].

Historically, clinical trial protocols were crafted based on a combination of scientific literature, expert opinion, and regulatory guidelines, often leading to rigid and sometimes inefficient and impractical trial designs. Today, sponsors and CROs are increasingly leveraging advanced data analytics, predictive modelling, and AI/Machine Learning to optimise and streamline their study designs—a method that we refer to as ‘data-informed protocol development’.

This shift has been fuelled by the increasing availability of real-world data (RWD), including electronic health records (EHRs) and medical claims, as well as historical clinical trial data, including summary-level and patient-level data. Importantly, data-informed protocol development is characterised by adaptability, the reuse of historical data, and a more patient-centric and site-centric approach to study design and execution.

In this article, we explore the key benefits and implementation drivers of data-informed protocol development. We discuss how historical data is used to guide and inform protocol development today, making clinical trials more adaptable, cost-efficient, and patient- and site-centric.

Triall is building a platform that enables analysis of historical clinical trial data to inform future research in its planning and design. Reach out to our team if you'd like to learn more and get involved in shaping the final product.

Zooming in on clinical trial protocol complexity

Let’s first set the scene. Clinical trial protocols have become increasingly complex over the years, a trend that is expected to continue. The Tufts Center for the Study of Drug Development (Tufts CSDD), which has routinely conducted research on protocol design complexity over the past decade, reports a continuing upward trend in both the scientific and executional complexity of new study protocols. Their 2022 study on complexity benchmarks concluded that Phase 2 and Phase 3 protocols now average 20.7 and 18.6 endpoints; 30.9 and 30.4 inclusion and exclusion criteria; 107.6 and 115.9 protocol pages; 35.1 and 82.2 sites distributed across 6.1 and 13.7 countries; and 2.1 million and 3.5 million datapoints, respectively (see Table 1). Remarkably, Phase 3 clinical trials now collect three times as much data as they did 10 years ago[1].

What drives this decade-long increase in complexity and data volume? According to Tufts CSDD, protocol complexity is most notably driven by an increase in therapeutics targeting rare diseases, more narrowly defined patient populations, and more complicated and logistically demanding execution models. These execution models are characterised by a growing number of countries, sites, technologies, and data sources (e.g., smartphones and wearables), further enabled by the rise of decentralised clinical trial designs[2].

These findings are corroborated by the IQVIA Institute for Human Data Science. Its annual research report concludes that scientific complexity continued to rise in 2022, with 30% of global pipelines targeting rare diseases, 960 compounds in development classified as next-generation biotherapeutics, and 62% of new drug approvals involving first-in-class mechanisms[3]. Additionally, the report underscores growing executional complexity due to the rise of novel clinical trial designs, including umbrella, basket, master, and adaptive protocols, with 17% of new studies in 2022 incorporating one or more aspects of novel trial designs[3].

The impact of protocol complexity and design flaws

Figure 1. The cause-effect-impact relationships of protocol complexity and design flaws. Protocol complexity and design flaws can lead to downstream effects such as protocol amendments, recruitment challenges, data integrity issues, and increased burden for patients and sites. These effects can significantly impact study timelines, leading to unexpected delays and unbudgeted costs.

Protocol complexity increases the risk of errors and design flaws that can lead to significant downstream issues such as protocol amendments, recruitment failure, and compromised data integrity. Not surprisingly, a recent study focusing on key bottlenecks in clinical trial setup and conduct identified protocol development as one of four critical path activities that most often cause study delays[4].

Similar to protocol complexity, the prevalence and mean number of protocol amendments have increased across all study phases since 2015. A recent Tufts CSDD study reveals that 75% of protocols require at least one substantial amendment, with the highest prevalence observed in Phase 2 (89%). Protocol amendments are highly disruptive to clinical trial conduct and represent the largest cause of unexpected trial delays and costs[5], with biopharma companies spending between $7 billion and $8 billion annually to implement amendments[6]. Here, it’s crucial to distinguish between unavoidable and avoidable amendments, with roughly 25% of substantial amendments classified as avoidable. Unavoidable amendments may arise from new safety findings, whereas avoidable amendments often stem from infeasible eligibility criteria, data integrity issues, or logistical impracticalities[6]. Now, let’s take a closer look at these three issues.

The restrictiveness of eligibility criteria is a major source of recruitment challenges for clinical trial sponsors, especially with increasingly narrowly targeted patient populations. Inclusion/exclusion (I/E) criteria that are too restrictive relative to the available target patient population can significantly hamper recruitment timelines and can even lead to clinical trial failure. Consequently, sponsors are often required to amend their I/E criteria mid-study, leading to significant delays.

As to data integrity, a protocol that calls for the collection of excessive and unnecessary clinical data can undermine data integrity and analysis, as it leads to higher error rates[7]. The more data a study collects, the greater the need for rigorous data management practices. Excessive data capture can therefore introduce unnecessary complexity into data management processes, making it harder to ensure data quality and consistency.

Lastly, from a logistical perspective, if study participants are burdened with excessive procedures and site visits, they may become fatigued or frustrated. This can affect the accuracy of their responses or their willingness to continue participation, lowering retention and risking data loss or bias. The same applies to clinical trial sites, where demands for excessive data collection can harm site recruitment and retention as well as lead to inconsistencies across sites, further compromising data integrity.

Sponsors are thus wise to consider patient and site burden in their clinical trial designs, as these can significantly impact recruitment, engagement, and retention for both patients and sites, thereby affecting the cycle times, costs, and integrity of their study.

Data-informed protocol development offers game-changing potential

Data-informed protocol development offers an answer to protocol complexity and design flaws. It reduces the guesswork in protocol design and aims to boost the overall predictability and cost-efficiency of clinical trial conduct. This method has gained traction as the industry continues to invest in data science and technology and as historical data grows increasingly rich and available.

To illustrate, the past decade has seen a sharp increase in the frequency and depth of clinical trial reporting as well as in the availability of RWD such as EHRs and medical claims data. Today, there’s a growing body of clinical trial intelligence and data aggregation tools that tap into these data sources, offering advanced data filtering and querying capabilities that allow sponsors to essentially benchmark any design aspect of their study—from objectives and endpoints to I/E criteria and recruitment timelines.
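
As a rough illustration of what such benchmarking involves, the Python sketch below places planned design parameters (here the number of endpoints and of I/E criteria) within the distribution of the same parameters across comparable historical protocols. The historical and planned values are invented for illustration, and `percentile_rank` is a hypothetical helper, not a function of any particular clinical trial intelligence tool.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, historical):
    """Share of historical values (in %) below `value`, counting ties as half."""
    data = sorted(historical)
    below = bisect_left(data, value)
    ties = bisect_right(data, value) - below
    return 100.0 * (below + 0.5 * ties) / len(data)


# Hypothetical values from ten comparable historical Phase 3 protocols
historical = {
    "endpoints": [12, 14, 15, 17, 18, 19, 21, 22, 25, 30],
    "I/E criteria": [22, 25, 27, 28, 30, 31, 33, 35, 38, 41],
}
# Hypothetical draft protocol being benchmarked
planned = {"endpoints": 24, "I/E criteria": 39}

for aspect, value in planned.items():
    rank = percentile_rank(value, historical[aspect])
    print(f"{aspect}: {value} sits at the {rank:.0f}th percentile")
```

A draft that sits in the upper tail on several aspects at once may deserve a closer look, although whether that is actually a problem remains a judgement call for the study team.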

The downstream benefits of using these tools can be significant. Traditionally, protocol development relied heavily on the subjective experiences of key stakeholders such as the sponsor, CRO, and investigator, thereby often introducing a high level of uncertainty and unpredictability. Now, by utilising historical data, advanced analytics, and predictive modelling, sponsors can increase certainty around the outcomes, timelines, and costs of their clinical development activities.

For instance, by analysing data from historical clinical trials, sponsors can identify common causes of amendments and proactively address them in their protocol design. If a particular endpoint has historically proven difficult to measure or of little relevance, it can be revised or omitted. Such critical design choices can significantly affect a study’s feasibility and reduce unnecessary burden on patients and sites.
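
A minimal sketch of the first step, tallying the reasons behind avoidable amendments across past trials, might look as follows in Python. The amendment log, its fields, and the reason labels are hypothetical; real amendment data would first need to be extracted and coded consistently before it can be counted like this.

```python
from collections import Counter

# Hypothetical amendment log: (trial_id, coded reason, classified as avoidable?)
amendment_log = [
    ("T1", "eligibility criteria", True),
    ("T1", "new safety finding", False),
    ("T2", "eligibility criteria", True),
    ("T3", "endpoint difficult to measure", True),
    ("T3", "eligibility criteria", True),
    ("T4", "new safety finding", False),
]


def top_avoidable_reasons(log, n=3):
    """Return the n most frequent reasons among amendments coded as avoidable."""
    counts = Counter(reason for _, reason, avoidable in log if avoidable)
    return counts.most_common(n)


print(top_avoidable_reasons(amendment_log))
# [('eligibility criteria', 3), ('endpoint difficult to measure', 1)]
```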

Moreover, RWD sources such as EHRs can provide a clearer picture of the patient population being studied. Sponsors can use this data to evaluate the restrictiveness of their I/E criteria relative to the available patient population, thus helping to determine the feasibility of recruitment targets[8]. Furthermore, they can assess which criteria have the largest impact on recruitment and adapt accordingly.
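
One simple way to make this concrete is to apply each criterion to EHR-derived patient records and count how many additional patients would become eligible if that criterion were dropped. The Python sketch below does this for five invented records and three hypothetical inclusion criteria; it is a simplified drop-one analysis under those assumptions, and real feasibility work would also have to handle missing values and coding differences across EHR systems.

```python
from typing import Callable, Dict, List

Patient = Dict[str, float]
Criterion = Callable[[Patient], bool]


def eligible_count(patients: List[Patient], criteria: Dict[str, Criterion]) -> int:
    """Number of patients who meet every criterion."""
    return sum(all(check(p) for check in criteria.values()) for p in patients)


def drop_one_impact(patients: List[Patient], criteria: Dict[str, Criterion]) -> Dict[str, int]:
    """Extra patients who would become eligible if each criterion were removed."""
    base = eligible_count(patients, criteria)
    return {
        name: eligible_count(patients, {k: v for k, v in criteria.items() if k != name}) - base
        for name in criteria
    }


# Hypothetical EHR-derived records (age in years, eGFR in mL/min/1.73 m², HbA1c in %)
patients = [
    {"age": 45, "egfr": 75, "hba1c": 8.1},
    {"age": 52, "egfr": 55, "hba1c": 7.4},
    {"age": 61, "egfr": 48, "hba1c": 9.2},
    {"age": 70, "egfr": 80, "hba1c": 7.8},
    {"age": 74, "egfr": 66, "hba1c": 6.5},
]
# Hypothetical inclusion criteria
criteria = {
    "age 18-65": lambda p: 18 <= p["age"] <= 65,
    "eGFR >= 60": lambda p: p["egfr"] >= 60,
    "HbA1c 7.0-10.0": lambda p: 7.0 <= p["hba1c"] <= 10.0,
}

print(eligible_count(patients, criteria), "of", len(patients), "eligible")
print(drop_one_impact(patients, criteria))
```

In this toy data set the eGFR threshold excludes the most otherwise-eligible patients, which is the kind of signal that could prompt a discussion about whether that threshold is scientifically necessary.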

Historical data can also guide and inform design aspects that affect data integrity such as determining the right sample size for a new study. A sample size that is too small can lead to inconclusive results, whereas a sample size that is too large wastes resources and imposes unnecessary burden. By reviewing past trials and their result variability, sponsors can estimate the optimal sample size required to detect a meaningful difference in treatment effect, thereby preventing under- or overpowering their studies.
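
A common textbook way to turn historical variability into a sample size, for a two-arm trial comparing means on a continuous endpoint, is to pool the standard deviations reported by earlier trials and plug the result into the normal-approximation formula in `n_per_arm` below. The Python sketch assumes a two-sided test, equal allocation, and invented historical (n, SD) pairs; it illustrates the reasoning and is not a substitute for a statistician's power calculation.

```python
import math
from statistics import NormalDist


def pooled_sd(trials):
    """Pool standard deviations from historical trials given as (n, sd) pairs."""
    num = sum((n - 1) * sd ** 2 for n, sd in trials)
    den = sum(n - 1 for n, _ in trials)
    return math.sqrt(num / den)


def n_per_arm(sigma, delta, alpha=0.05, power=0.80):
    """Per-arm n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta) ** 2, rounded up."""
    z = NormalDist().inv_cdf
    n = 2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2
    return math.ceil(n)


# Hypothetical historical trials reporting (sample size, SD) for the same endpoint
historical = [(120, 9.5), (200, 10.5), (150, 10.0)]
sigma = pooled_sd(historical)

print(round(sigma, 2))            # pooled SD across the historical trials
print(n_per_arm(sigma, delta=5))  # per-arm size to detect a 5-point difference
```

With these inputs the sketch asks for 64 participants per arm. Because n scales with the square of sigma/delta, an optimistic effect size or an unrepresentative set of historical comparators quickly leads to an underpowered study; an exact t-based calculation would give a slightly larger n.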

Lastly, by analysing patient outcomes and behaviours from historical clinical trials, sponsors can fine-tune the frequency and timing of their site visits. Here, historical data can reveal the most informative time points for assessing clinical outcomes as well as the time points and procedures where participants are most likely to drop out of a study. Leveraging these insights, sponsors can optimise for retention, thereby ensuring more complete and consistent data.
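
To locate where participants tend to leave, one option is a conditional dropout rate per scheduled visit: among participants who completed the previous visit, the share who never completed this one. The Python sketch below assumes a hypothetical input listing each participant's last completed visit, pooled from historical trials that share the same visit schedule.

```python
from collections import Counter


def dropout_by_visit(last_visits, n_visits):
    """Conditional dropout rate per scheduled visit.

    last_visits holds each participant's last completed visit (0 = none,
    n_visits = completed the study). For visit v, the rate is the share of
    participants who completed visit v - 1 but not visit v.
    """
    counts = Counter(last_visits)
    rates = {}
    for visit in range(1, n_visits + 1):
        at_risk = sum(c for last, c in counts.items() if last >= visit - 1)
        dropped = counts[visit - 1]
        rates[visit] = dropped / at_risk if at_risk else 0.0
    return rates


# Hypothetical last completed visits for ten historical participants (6 visits)
last_visits = [6, 6, 6, 6, 2, 2, 2, 5, 6, 1]
rates = dropout_by_visit(last_visits, n_visits=6)

for visit, rate in rates.items():
    print(f"Visit {visit}: {rate:.0%} dropout")
```

In this invented example a third of the participants still enrolled after visit 2 never complete visit 3, which would flag visit 3 and the procedures scheduled around it for review.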

Handling implementation: Being data-driven vs. data-informed

Figure 2. Being data-driven vs. being data-informed.

While the benefits of data-informed protocol development are manifold, implementation is not without its challenges. Data quality, sourcing, and management are all pivotal to the reliability and applicability of data-derived insights. Importantly, data insights are not meant to replace stakeholder input and expertise, but rather complement and validate existing knowledge. The term ‘data-informed’ is therefore more appropriate than its more popular counterpart ‘data-driven’.

The term ‘data-driven’ implies that data is the primary driver of decision-making, suggesting a level of automation or algorithmic determination that does not fully account for the complexity and nuance involved in protocol development. In contrast, the term ‘data-informed’ acknowledges the role of data as a critical input to the decision-making process, while also leaving room for the integration of expert judgement, ethical considerations, and contextual factors that are not easily quantifiable or present in the available data.

Data-informed protocol development therefore suggests a more holistic approach, with data being one of several pillars that support the protocol development process. This approach involves continuous learning where data is integrated over time, embodying an adaptive style essential for responding to new insights, unexpected outcomes, or changing conditions.

This adaptability is key given the fast-changing landscape of available tools and supporting regulations. A prominent example is the ongoing debate on the applicability of RWD in clinical trial decision-making. From a regulatory standpoint, authorities like the FDA have increasingly emphasised the value of RWD, but there is still no industry consensus on the extent of its use due to concerns about data reliability and individual privacy.

How can organisations adopt a more data-informed approach to protocol development? Sponsors and CROs are advised to outline a clear implementation plan with defined programme goals before embedding any new tools in their processes. This plan should align with the company’s strategy and should consider existing internal systems, IT infrastructure, and resources. It should cover key topics such as updating SOPs, vendor selection and evaluation, and how to incorporate feedback, expertise, and best practices across the implementation journey.

Once implemented, data-informed insights can be incorporated across the protocol development lifecycle by the relevant teams and functions. Here, fostering a culture of short feedback cycles, continuous learning, and organisational adaptability can be a significant driver of success. For instance, a recent study suggests sponsors and CROs adopt Lean process management in combination with Agile teamwork to accelerate protocol development and thereby avoid potential delays and bottlenecks[4].

Introducing Triall CIX

Triall is building the Clinical Insights Exchange (CIX), a platform that enables analysis of historical clinical trial data to inform future research in its planning and design. The CIX platform applies novel privacy-preserving and -enhancing techniques such as Compute-to-Data and Self-Sovereign Identity (SSI) to allow analysis over aggregated data from clinical datasets and eClinical systems connected to the platform. It therefore enables biopharma companies, clinical CROs, and medical research institutes to provide and consume clinical trial data without compromising data privacy or confidentiality. This allows these companies to generate data-informed insights that promote the speed, resource-efficiency, and predictability of their clinical development activities.
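
The general idea behind compute-to-data can be shown with a deliberately simplified toy model: each data provider keeps its records, accepts only pre-approved aggregate computations, and refuses to answer when too few records would be involved, so the consumer only ever sees aggregates. The Python sketch below is a hypothetical illustration of that concept; the `DataProvider` class, the approved algorithms, and the minimum record threshold are assumptions and do not describe how CIX or any specific Compute-to-Data implementation works.

```python
class DataProvider:
    """Toy provider that runs approved aggregates and never returns row-level data."""

    MIN_RECORDS = 5  # hypothetical small-cell suppression threshold
    APPROVED = {"count": len, "sum": sum}  # hypothetical whitelist of algorithms

    def __init__(self, records):
        self._records = records  # stays with the provider

    def run(self, algorithm, field, where=None):
        """Run an approved aggregate over `field` and return only its result."""
        if algorithm not in self.APPROVED:
            raise PermissionError(f"algorithm {algorithm!r} is not approved")
        values = [r[field] for r in self._records if where is None or where(r)]
        if len(values) < self.MIN_RECORDS:
            raise ValueError("result suppressed: too few records")
        return self.APPROVED[algorithm](values)


def pooled_mean(providers, field, where=None):
    """Combine per-provider sums and counts into one mean without moving records."""
    total = sum(p.run("sum", field, where) for p in providers)
    count = sum(p.run("count", field, where) for p in providers)
    return total / count


# Hypothetical days-to-enrolment values held by two separate providers
provider_a = DataProvider([{"days_to_enrol": d} for d in (10, 12, 14, 16, 18)])
provider_b = DataProvider([{"days_to_enrol": d} for d in (20, 22, 24, 26, 28, 30)])

print(pooled_mean([provider_a, provider_b], "days_to_enrol"))
```

Combining sums and counts, rather than averaging each provider's mean, keeps the pooled result correct when providers hold different numbers of records. Production systems add further safeguards that this toy omits, such as vetted algorithms, audit trails, and identity and access controls.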

Sponsors and CROs interested in learning more about data-informed protocol development and how to best utilise historical clinical trial data can contact our team.

Figure 3. Screenshot impression of the Clinical Insights Exchange (CIX).

Future outlook: What’s next for data-informed protocol development?

Data-informed protocol development represents an exciting frontier in clinical trials. It taps into the increasing depth and availability of historical data and has been accelerated by recent advancements in data science and technology. Although stakeholder input and expertise continue to play a critical role in trial design, we will likely see a further shift towards incorporating data insights across the protocol development lifecycle. This shift is enabled by an increasingly sophisticated landscape of digital tools that allow sponsors and CROs to derive meaningful insights from historical clinical trial data and RWD sources. Moreover, AI/Machine Learning models will only amplify this trend by introducing new levels of predictive modelling and simulation that are likely to have a profound influence on how we design and conduct clinical trials.


  1. Getz, K., Smith, Z., & Kravet, M. (2023). Protocol design and performance benchmarks by phase and by oncology and rare disease subgroups. Therapeutic Innovation & Regulatory Science, 57(1), 49-56.
  2. Getz, K., Smith, Z., Botto, E., Murphy, E., & Dauchy, A. (2023). New Benchmarks on Protocol Amendment Practices, Trends and their Impact on Clinical Trial Performance. Therapeutic Innovation & Regulatory Science. Preprints.
  3. Aitken, M., Connelly, N., Kleinrock, M., & Pritchett, J. (2023). Global Trends in R&D 2023. IQVIA Institute for Human Data Science.
  4. Bieske, L., Zinner, M., Dahlhausen, F., & Truebel, H. (2023). Critical path activities in clinical trial setup and conduct: How to avoid bottlenecks and accelerate clinical trials. Drug Discovery Today, 28(10), 103733.
  5. Tufts Center for the Study of Drug Development (2023). Prevalence and mean number of protocol amendments increasing across all phases. Tufts CSDD Impact Report, Volume 25, Number 2, March/April 2023.
  6. Getz, K. A., Stergiopoulos, S., Short, M., Surgeon, L., Krauss, R., Pretorius, S., ... & Dunn, D. (2016). The impact of protocol amendments on clinical trial performance and cost. Therapeutic Innovation & Regulatory Science, 50(4), 436-441.
  7. Getz, K. A., & Campo, R. A. (2017). Trial watch: trends in clinical trial design complexity. Nature Reviews Drug Discovery, 16(5), 307-308.
  8. Fang, Y., Liu, H., Idnay, B., Ta, C., Marder, K., & Weng, C. (2023). A data-driven approach to optimizing clinical study eligibility criteria. Journal of Biomedical Informatics, 142, 104375.
