Webinar Host
HOSTED BY: Ronan Fitzpatrick
- Head of Statistics
- nQuery Lead Researcher
- FDA Guest Speaker
- Guest Lecturer

Webinar Overview
- Introduction to Adaptive Design
- Adaptive Design Regulatory Context
- Sample Size Re-estimation & Worked Example
- Discussion and Conclusions

Worked Examples Overview
- Two Means Group Sequential Trial
- Unblinded SSR Example
- Two Means Conditional Power
- Two Means Blinded SSR

About nQuery
In 2017, 90% of organizations with clinical trials approved by the FDA used nQuery for sample size and power calculation.

PART 1: Adaptive Design Background

Context
- Sample size determination (SSD) finds the appropriate sample size for your study
- Common metrics are statistical power, interval width or cost
- SSD seeks to balance ethical and practical issues
- Crucial for arriving at valid conclusions (cf. Type M/S errors)
- High cost of failed clinical drug trials

Adaptive Trials Overview
- An adaptive trial is any trial where a change or decision is made to the trial while it is still on-going
- Encompasses a wide variety of potential adaptations, e.g. early stopping, SSR, enrichment, seamless, dose-finding
- Adaptive trials seek to give control to the trialist to improve the trial based on all available information
- Adaptive trials can decrease costs and give better inferences

PART 2: Regulatory Context

Adaptive Trials Regulatory Context
- Draft FDA CBER/CDER guidance published in 2010
- Distinguished "well-understood" and "less well-understood" designs
- EMA published a similar reflection paper (2007)

- Increasing interest in encouraging adaptive design (US: 21st Century Cures Act; EU: Adaptive Pathways)
- New FDA guidance currently at comments stage (next slide)

FDA CBER/CDER Adaptive Guidance (2018)
- New draft guidance published in Oct 2018 (PDUFA VI requirement)
- Comments open up to Nov. 30th
- Far less categorical than the 2010 draft
- Emphasizes early collaboration with FDA
- Focus on design issues and Type I error, e.g. pre-specification, blinding, simulation
- In-depth on certain adaptive designs: SSR, enrichment, switching, multiple treatments
- Also views on Bayesian and complex designs

"Adaptive designs have the potential to improve ... study power and reduce the sample size and total cost" for investigational drugs, including "targeted medicines that are being put into development today"

Adaptive Trials Evaluation

Opportunities                    Risks
1. Earlier Decisions             1. Complex/Different Stats
2. Reduced Potential Cost        2. Logistical Costs and Issues
3. Higher Potential Success      3. Bias/Unblinding (IDMC)
4. Greater Generalizability      4. Type I Error Inflation
5. Potential Lower Efficiency    5. Stakeholder Buy-in

Sample Size Re-estimation (SSR) Guidance
- Non-comparative (blinded) SSR is an attractive choice
- With adequate pre-specification, negligible effect on Type I error (α)
- Comparative (unblinded) SSR can provide efficiency
- Helps the trial have power if the effect size is less than hypothesized
- NB: design and rule pre-specification; α error

PART 3: Sample Size Reestimation

Sample Size Re-estimation (SSR)
- Will focus here on the specific adaptive design of SSR
- Adaptive trial focused on a higher sample size if needed

- Obvious adaptation target due to intrinsic SSD uncertainty
- Note that SSR is more suited to knowable/short follow-up
- Could also adaptively lower N, but this is not encouraged
- Two primary types: 1) Unblinded SSR; 2) Blinded SSR

Group Sequential Designs (GSD)
- GSD facilitates interim analyses
- Interim analyses occur while the trial is on-going
- Interim data analysed at pre-specified times, e.g. after 1/2 of subjects measured
- Can stop for benefit or futility

- If neither, continue until the end or the next look
- Must account for multiple analyses
- Use spending of α and/or β errors

GSD Changes
1. Futility Only Designs
2. Additional Outputs
3. New Two Sample TTE
4. One Sample Mean GSD
5. One Sample Prop GSD

Error Spending (Lan & DeMets)
- Two criteria for early stopping:
  1. Efficacy (α-spending)
  2. Futility (β-spending)
- Multiple error spending functions: O'Brien-Fleming, Pocock etc.
- Both α and β spending work similarly
- Can be very liberal or conservative
- At each interim analysis, a proportion of the total error is spent
- Makes the analysis at the endpoint more conservative

O'Brien-Fleming-type spending function, with t the information fraction:
$$\alpha(t) = 2\left(1 - \Phi\left(\frac{z_{\alpha/2}}{\sqrt{t}}\right)\right)$$

Group Sequential Example
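As a rough illustration of the Lan-DeMets idea, the sketch below evaluates two common spending functions, the O'Brien-Fleming type and the Pocock type, at a single interim look. The per-side level of 0.025 and the look at half the information are assumptions chosen to resemble the example that follows, and the function names are illustrative rather than nQuery's.

```python
from math import e, log, sqrt
from statistics import NormalDist

norm = NormalDist()

def obf_spend(t, alpha):
    """O'Brien-Fleming-type spending: 2 * (1 - Phi(z_{alpha/2} / sqrt(t)))."""
    z = norm.inv_cdf(1 - alpha / 2)
    return 2 * (1 - norm.cdf(z / sqrt(t)))

def pocock_spend(t, alpha):
    """Pocock-type spending: alpha * ln(1 + (e - 1) * t)."""
    return alpha * log(1 + (e - 1) * t)

alpha_per_side = 0.025          # assumed: half of a two-sided 5% level
for t in (0.5, 1.0):            # assumed looks: half and full information
    print(f"t={t}: OBF-type {obf_spend(t, alpha_per_side):.5f}, "
          f"Pocock-type {pocock_spend(t, alpha_per_side):.5f}")
```

Both functions spend the full error by t = 1. The O'Brien-Fleming type spends very little at the interim look, which leaves the final analysis close to the unadjusted level, while the Pocock type spends far more early on.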

A sample size of 242 subjects (121 per treatment group) provides at least 80% power to detect a relative difference of 53% between botulinum toxin A and standardized anticholinergic therapy, assuming a treatment difference of 0.80 and a common SD of 2.1 (effect size = 0.381), and a two-sided type I error rate of 5%. Sample size has been adjusted to allow for a 10% loss to follow-up over the 6-months of treatment as well as one interim analysis to stop early for benefit.
Source: NEJM (2012)

Parameter                       Value
Significance Level (2-sided)    0.05
OnabotulinumtoxinA Mean         -2.3
Anticholinergic Mean            -1.5
Standard Deviation (Both)       2.1
Power                           80%
# Interim Analyses              1
Spending Function               O'Brien-Fleming
Expected Dropout                10%

Conditional Power (CP)
- CP gives the probability of rejecting the null given the interim test statistic
- Calculation still depends on what the true difference is set to
- Often used as an ad-hoc criterion for futility testing in GSD
- More flexible than β-spending but with less error guarantee
- Focus here on CP as a measure of promising results
- "Promising" meaning less than target but close to target power
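To make the dependence on the assumed effect concrete, here is a minimal sketch of conditional power for a two-sample comparison of means with one remaining (final) look, using the usual normal approximation. The inputs are borrowed from the unblinded SSR example in this deck (interim difference 0.6, interim SD 2.31, 57 of 114 per group, final nominal level 0.0245). Treating that level as one-sided, taking the difference as positive in the direction of benefit, and the function name are all assumptions.

```python
from math import sqrt
from statistics import NormalDist

norm = NormalDist()

def conditional_power(z1, t, drift, crit):
    """P(final Z >= crit | interim Z = z1) at information fraction t.

    drift is the expected value of the final Z statistic under the
    effect size assumed for the remainder of the trial.
    """
    return 1 - norm.cdf((crit - z1 * sqrt(t) - drift * (1 - t)) / sqrt(1 - t))

n1, n_planned = 57, 114                  # per-group interim and planned sizes
z1 = 0.6 / (2.31 * sqrt(2 / n1))         # interim z statistic
t = n1 / n_planned                       # information fraction
crit = norm.inv_cdf(1 - 0.0245)          # final critical value (assumed one-sided)

drift_trend = z1 / sqrt(t)                           # effect continues as observed
drift_design = 0.8 / (2.1 * sqrt(2 / n_planned))     # effect as originally planned

cp_trend = conditional_power(z1, t, drift_trend, crit)
cp_design = conditional_power(z1, t, drift_design, crit)
print(f"CP under interim trend: {cp_trend:.2f}")
print(f"CP under design effect: {cp_design:.2f}")
```

With these inputs the conditional power is roughly one half if the interim trend continues, and noticeably higher if the originally planned effect is assumed instead.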

Note: a related quantity is Bayesian predictive power, which is essentially conditional power averaged over a prior for the effect.

Conditional Power & Unblinded SSR
- The most common criterion proposed for unblinded SSR is CP
- SSR is suggested when interim results are promising (Chen et al.)
- Gives a third option vs GSD: continue, stop early, increase N
- "Promising" is user-defined but based on the unblinded effect size
- Power for the optimistic effect but increase N for lower relevant effects?
- 2 methods here: Chen, DeMets & Lan; Cui, Hung & Wang
- 1st uses GSD statistics but only at the penultimate look and with high CP
- 2nd uses a weighted statistic but is allowed at any look and CP
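The third option, increasing N when the interim result is promising, can be sketched for the Cui-Hung-Wang weighted statistic as follows. The sketch assumes the interim trend continues, uses the 40% to 80% promising zone and the inputs of the unblinded SSR example in this deck, and follows the formulation in Mehta & Pocock (2011). The function names are illustrative, and the output is not claimed to match nQuery's.

```python
from math import sqrt
from statistics import NormalDist

norm = NormalDist()

def chw_conditional_power(z1, n1, n_planned, n2_new, crit):
    """CP of the CHW test if stage 2 enrols n2_new per group and the interim trend holds.

    The stage weights stay fixed at the originally planned n1/n_planned and
    (n_planned - n1)/n_planned, which is what protects the type I error.
    """
    n2_planned = n_planned - n1
    needed = (crit * sqrt(n_planned) - z1 * sqrt(n1)) / sqrt(n2_planned)
    return 1 - norm.cdf(needed - z1 * sqrt(n2_new / n1))

def reestimate_n(z1, n1, n_planned, n_max, crit, cp_low=0.40, cp_target=0.80):
    """Return the new per-group total N, increasing only in the promising zone."""
    cp_now = chw_conditional_power(z1, n1, n_planned, n_planned - n1, crit)
    if not (cp_low <= cp_now < cp_target):
        return n_planned                     # unfavourable or already favourable
    for n_total in range(n_planned, n_max + 1):
        if chw_conditional_power(z1, n1, n_planned, n_total - n1, crit) >= cp_target:
            return n_total
    return n_max                             # cap at the pre-specified maximum

n1, n_planned, n_max = 57, 114, 228          # per group; maximum = 2 x planned
z1 = 0.6 / (2.31 * sqrt(2 / n1))             # interim z statistic
crit = norm.inv_cdf(1 - 0.0245)              # final critical value (assumed one-sided)
new_n = reestimate_n(z1, n1, n_planned, n_max, crit)
print("re-estimated N per group:", new_n)
```

Outside the promising zone the planned N is kept, so the rule only ever adds subjects when the interim result is close to, but short of, the target power.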

Note: the initial nQuery Adapt release will cover two means and proportions.

Unblinded SSR Example
Assume the same design as the GSD Example (Example 3) with the HSD (γ = 1.5) futility variant (n = 114). Assume an interim difference of 0.6, an interim common SD of 2.31 and an interim n of 57 per group, with a nominal alpha of 0.0245 for the final look. What will the required N be under SSR for Chen-DeMets-Lan and Cui-Hung-Wang, assuming a maximum-N multiplier of 2?

Parameter                        Value
Nominal Final Look Sig. Level    0.0245
Interim Difference               -0.6
Interim SD (Both)                2.31
Initial N per Group              114
Interim N per Group              57
Maximum N per Group              228
Lower CP Bound                   Derived/40%
Upper CP Bound                   80%

Blinded Sample Size Reestimation
- BSSR uses an interim blinded estimate of a nuisance parameter
- Use of blinded data reduces logistical/regulatory issues
- Considered a well-understood type of adaptive design
- Multiple methods, but focus here on the internal pilot approach
- Update N based on the parameter estimate from the internal pilot

Blinded SSR nQuery Summary (Winter 2018)

Blinded SSR Means                   Blinded SSR Props
SSR Criteria: Variance              SSR Criteria: Overall Success Rate
Three σ² Estimate Methods           Assumes effect size true
1. Two Sample Inequality            1. Two Sample Inequality
2. Two Sample NI                    2. Two Sample NI
3. Two Sample Equiv

Two Sample Mean Blinded SSR Example
We estimated that we would need to enrol 160 patients, given an expected mean (SD) annual decline in the FVC of 9±16 percent of the predicted value and a dropout rate of 15 percent, to achieve a two-sided alpha level of 0.05
Source: NEJM (2006)

Parameter                       Value
Significance Level (2-Sided)    0.05
Mean Difference (%)             -9
Standard Deviation (%)          16
Dropout Rate                    15%
Target Power                    90%
Nuisance Parameter?             Standard Deviation

PART 4: Sample Size Reestimation
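The internal pilot approach for the blinded SSR example above can be sketched as follows: compute the planned per-group N from the design assumptions, then recompute it with the SD estimated from the pooled (blinded) interim data. The normal-approximation formula, the interim SD of 18 and the function name are assumptions made for illustration. The sketch also never lowers N, in line with the advice earlier in the deck.

```python
from math import ceil
from statistics import NormalDist

norm = NormalDist()

def n_per_group(delta, sd, alpha=0.05, power=0.90):
    """Normal-approximation N per group for a two-sided two-sample test of means."""
    z_sum = norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)
    return ceil(2 * (z_sum * sd / delta) ** 2)

planned = n_per_group(delta=9, sd=16)        # design assumptions from the table
enrolled = ceil(planned / (1 - 0.15))        # inflate for 15% dropout

blinded_sd = 18                              # hypothetical blinded interim estimate
updated = max(planned, n_per_group(delta=9, sd=blinded_sd))   # never reduce N

print("planned per group:", planned, "| with dropout:", enrolled)
print("updated per group after blinded SSR:", updated)
```

Under the normal approximation the design assumptions give 67 per group, or 79 per group (158 in total) after allowing for dropout, which is close to the 160 quoted from the paper. Because only the pooled SD is used, no treatment labels need to be unblinded at the interim.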

Discussion and Conclusions
- Adaptive trials are expected to become more common
- Regulatory and legislative environment increasingly positive
- Major barriers are error control, logistics and resources
- Mitigations: pre-specification, FDA collaboration, software solutions
- SSR continues to be a common form of adaptive design

nQuery Winter 2018 Update
Winter 2018 release adds the nQuery Adapt module, 32 new tables & undo/redo
- 20 New Core Tables: Proportions + Crossover
- 12 nQuery Bayes Tables: Assurance
- 15 nQuery Adapt Tables: Conditional Power, GST + SSR

Q&A
Any questions?
For further details, contact at: [email protected]
Thanks for listening!

Resources
Summary of what's new in nQuery's Adaptive module: https://www.statsols.com/whats-new
FDA Draft Guidance: https://www.fda.gov/downloads/drugs/guidances/ucm201790.pdf
Draft Comments/Submissions (30th November): https://www.federalregister.gov/documents/2018/10/01/2018-21314/adaptive-designs-for-clinical-trials-of-drugs-and-biologics-draft-guidance-for-industry-availability
Statsols Blog on FDA Guidance: https://blog.statsols.com/new-fda-guidance-on-adaptive-clinical-trial-design

References

Jennison, C., & Turnbull, B. W. (1999). Group sequential methods with applications to clinical trials. CRC Press.
Visco, A. G., et al. (2012). Anticholinergic therapy vs. onabotulinumtoxinA for urgency urinary incontinence. New England Journal of Medicine, 367(19), 1803-1813.
Chen, Y. J., DeMets, D. L., & Gordon Lan, K. K. (2004). Increasing the sample size when the unblinded interim result is promising. Statistics in Medicine, 23(7), 1023-1038.
Cui, L., Hung, H. J., & Wang, S. J. (1999). Modification of sample size in group sequential clinical trials. Biometrics, 55(3), 853-857.
Mehta, C. R., & Pocock, S. J. (2011). Adaptive increase in sample size when interim results are promising: a practical guide with examples. Statistics in Medicine, 30(28), 3267-3284.
Friede, T., & Kieser, M. (2006). Sample size recalculation in internal pilot study designs: a review. Biometrical Journal: Journal of Mathematical Methods in Biosciences, 48(4), 537-555.
Tashkin, D. P., Elashoff, R., Clements, P. J., Goldin, J., Roth, M. D., Furst, D. E., ... &