Do We Have to Deal with Multiple Comparisons in Neuroimaging?

Gang Chen
Scientific and Statistical Computing Core
National Institute of Mental Health
National Institutes of Health, USA

Preview
  Multiplicity problems in neuroimaging
  Improving modeling from two perspectives
    o Weirdness of p-value
    o Information waste and inefficient modeling
  Application #1: region-based analysis (RBA)
    o Whole-brain voxel-wise group analysis
    o Program available in AFNI: BayesianGroupAna.py
  Application #2: matrix-based analysis (MBA)
    o FMRI: inter-region correlation (IRC)
    o DTI: white-matter properties (FA, MD, RD, AD, etc.)
    o Naturalistic scanning: inter-subject correlation (ISC)
    o Program available in AFNI: MBA

Conventional group analysis: voxel-wise
  Simple situations
    o Student's t-test: one-sample, two-sample, paired
    o General linear model (GLM) with between-subjects variables (sex, age, ...)
    o 3dttest++ and 3dMVM in AFNI
  Situations with within-subject factors
    o Univariate GLM for AN(C)OVA: not always performed correctly
    o Multivariate GLM: 3dMVM in AFNI
  Other complicated situations
    o Missing data, within-subject quantitative covariates (reaction time, ...)
    o Linear mixed-effects modeling: 3dLME in AFNI

Headache: multiplicity!

Conventional matrix-based analysis
  Matrices from individual subjects
    o Inter-region correlations (IRCs), inter-subject correlation (ISC)
    o White-matter properties: missing data
    o Others: coherence, mutual information, entropy, ...
  Group analysis
    o Mirroring adoption of whole-brain analysis
    o Univariate GLM: treating matrix elements as isolated entities
    o NBS, CONN, FSLNets in FSL, GIFT, Brain Connectivity Toolbox, ...
  Graph theory
    o Arbitrary thresholding, artificial dichotomization
    o Garden of forking paths: scores of metrics (hub, community, clique, small-world, ...)

Headache: multiplicity!

4 multiplicity problems
  Element-wise modeling (multi-model problem)
    o aka massively univariate modeling
    o Perform whole-brain voxel-wise analysis, or element-wise analysis of a matrix
    o Pretend all spatial elements are isolated and unrelated to each other
    o Recoup the false assumption through correction: heavy penalty and inefficient
  Sidedness testing
    o Simultaneously infer both positive and negative effects: dominantly adopted
  Multiple comparisons (conventional concept)
    o Simultaneously compare groups, conditions, and interactions
    o Not much attention paid so far
  Multiverse problem: researcher degrees of freedom
    o Thousands of options coexist: different preprocessing pipelines, modeling strategies, software
    o Garden of forking paths: only reporting significant discoveries
    o No easy solutions exist

Conventional statistical testing strategy
  Null hypothesis significance testing (NHST)
    o We are all indoctrinated under the paradigm
  Build a strawman H0: nothing happens in brain
  Attack strawman H0 with weirdness of data under H0: p-value

    o Type I error = P(reject H0 | H0) = false positive = p-value
    o Type II error = P(accept H0 | H1) = false negative
  Dichotomize data based on magic number 0.05

Nice properties of NHST
  Consistent with Karl Popper's philosophy
    o Falsification or refutation
    o Inductive: all swans are white
  Intuitive: innocent until proven guilty
  Economical/utility: categorization
    o ADHD, autism, emission test (pass vs fail), ...

  Courtroom analogy
    Hidden truth    Reject H0 (guilty)                       Fail to reject H0 (not guilty)
    Innocent        Type I error (defendant very unhappy)    Correct
    Guilty          Correct                                  Type II error (defendant very happy)

Weirdness of p-value
  Strawman H0: artificial construct

    o Witch hunt: usually of no interest
    o Effect of absolute zero? Who believes there is no effect everywhere in brain?
  Artificially binarize continuous world: innocent vs guilty
    o Activated vs not activated? Or strength of evidence for activation?
  P-value flows in our blood: unaware of weirdness and troubles
  Disconnection/misinterpretation: P(weirdness | H0) ≠ P(H0 | data)
    o P-value: P(weirdness | H0)
    o Research interest: P(effect > 0 or < 0 | data)

Problems with dichotomous decision

  P-value of 0.05 vs 0.055, or cluster of 54 vs 53 voxels?
  Statistically insignificant = non-existing effect? Absence of evidence = evidence of absence?
  Difference between significant and insignificant results: not necessarily significant
  Selection bias about effect estimates in results reporting
    o Power analysis based on literature: not very useful other than pleasing grant reviewers
    o One source of reproducibility problems: big/tall parents (violent men, engineers) have more sons; beautiful parents (nurses) have more daughters; power posing
    o Unreliable meta-analyses: many potential effects unreported

Clusters vs islands: arbitrariness
  [Figure: threshold (sea level) 1 vs threshold (sea level) 2; clusters are the islands above sea level]

Problems with clusters
  Cluster thresholding: approach
    o Use cluster size as leverage in controlling overall false positives (FWE)

    o Monte Carlo simulations, RFT, combination of cluster size and signal strength
    o Hide everything below threshold
  Arbitrary: regardless of rigor in FWE controllability
  Penalize and discriminate against small regions
    o Unfair: 2 regions with same signal strength, one large and one small
    o Unfair: 2 regions with same signal strength, one distant and one contiguous
  Clusters are statistically defined
    o Do not respect anatomical structures

    o Lack spatial specificity: bleeding effect or forming huge clusters
    o Focus on statistically defined peak voxels
  Sidedness for whole brain: one- or two-sided?

Problems with element-wise modeling
  First step: apply same model to all elements
    o Pretend all elements are isolated and unrelated: false assumption
    o Source of multi-model problem: number of models = number of elements
  Second step: correct for multi-model and false assumption
    o Use cluster size as leverage in controlling overall false positives
    o Monte Carlo simulations, RFT, combination of cluster size and signal strength
  Problems
    o Loss of efficiency due to split-modeling and false assumption
    o Over-penalization
    o Reinforcing arbitrary thresholding and dichotomization
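
The price of "number of models = number of elements" can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the tests are independent (exactly the pretense criticized above), the function names are made up here, and the element counts 21, 120 and 100,000 are the ones used in the two applications later in these slides.

```python
def familywise_error(alpha: float, m: int) -> float:
    """Chance of at least one false positive among m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test threshold that keeps the family-wise error at or below alpha."""
    return alpha / m


# Element counts taken from the slides: 21 ROIs, 120 region pairs, 100,000 voxels.
for m in (21, 120, 100_000):
    fwe = familywise_error(0.05, m)
    per_test = bonferroni_threshold(0.05, m)
    print(f"m = {m:>7}: uncorrected FWE = {fwe:.3f}, Bonferroni per-test threshold = {per_test:.2e}")
```

The uncorrected family-wise error climbs toward 1 as m grows, while the Bonferroni threshold shrinks in proportion to m; this is the "heavy penalty" that the element-wise approach pays for ignoring the relatedness among elements.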

How can we do better?
  Prior knowledge: elements are not unrelated
    o Conceptually P(weirdness | H0) ≠ P(H0 | data), but practically P(weirdness | H0) ≈ P(H0 | data)?

How to incorporate prior knowledge?
  Priors are omnipresent in life
    o Walking stairs, prejudices, stereotypes, etc.
  But priors are not always easy to digest!
    o Infamy: subjective???
    o Are we eating acrylamide for breakfast?
  [Figure labels: both sides good; one side BURNT; both sides BURNT]

How to incorporate prior knowledge?
  Kidney cancer distribution among U.S. counties
  [Figure: county map; labels: highest rate, lowest rate]
  Calibration

How to incorporate prior knowledge?
  More examples
    o LeBron James field goal percentage: 50.4%
    o Monthly divorce rate, suicide rate
    o KISS principle
  Stein's paradox (1956)
  Calibration
    o Free market vs regulations

Morals from kidney cancer data
  Multiplicity problem: > 3000 counties!
    o Divide p-value by number of counties?
    o Borrow idea from neuroimaging: leverage geographical relatedness?
  What can we learn from the example? Food for thought
    o Care about the strawman H0 (zero kidney cancer rate), false positives, p-value?
    o Trust individual county-wise estimates? Unbiased! BLUE
    o Incorrect sign errors (type S): some counties really have higher kidney cancer rate than others?
    o Incorrect magnitude (type M): some counties really have higher/lower cancer rate?
  Would correction for multiplicity help at all?
    o Useless in controlling for type S and M errors
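
The kidney cancer moral can be reproduced with a toy simulation. This is a sketch under loudly stated assumptions, not the actual county data: every county is given the same true rate, the two county sizes are invented, and the shrinkage step is a simple gamma-Poisson-style pull toward the pooled rate with a hand-picked prior strength (`PRIOR_POP`), standing in for the multilevel modeling discussed later.

```python
import math
import random

rng = random.Random(0)


def poisson(lam: float) -> int:
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


TRUE_RATE = 1e-4                                   # assumed: identical in every county
populations = [1_000] * 1500 + [100_000] * 1500    # assumed: small and large counties

cases = [poisson(TRUE_RATE * n) for n in populations]
raw = [c / n for c, n in zip(cases, populations)]  # county-wise, no pooling

# Rank counties by raw rate: both extremes are occupied by small counties.
order = sorted(range(len(raw)), key=lambda i: raw[i])
small_in_top50 = sum(populations[i] == 1_000 for i in order[-50:])
small_in_bottom50 = sum(populations[i] == 1_000 for i in order[:50])

# Partial pooling: pull each county toward the pooled rate, more strongly
# when the county is small. PRIOR_POP is an assumed prior strength.
pooled = sum(cases) / sum(populations)
PRIOR_POP = 50_000
shrunk = [(c + pooled * PRIOR_POP) / (n + PRIOR_POP) for c, n in zip(cases, populations)]

print("small counties among the 50 highest raw rates:", small_in_top50)
print("small counties among the 50 lowest raw rates: ", small_in_bottom50)
print("largest raw estimate / true rate:    ", round(max(raw) / TRUE_RATE, 2))
print("largest shrunk estimate / true rate: ", round(max(shrunk) / TRUE_RATE, 2))
```

Even though no county differs from any other, the unbiased county-wise estimates put small counties at both the top and the bottom of the ranking and exaggerate the largest rate severalfold (a type M error). Dividing a p-value by the number of counties would not repair that, whereas sharing information across counties does.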

How can we do better?
  Information across spatial elements can be shared and regularized
    o How???

What do we know about spatial elements?
  Element-wise modeling
    o Pretend full ignorance: fully trust the data
    o Uniform distribution: each element equally likely to have any value in (-∞, +∞)
    o Similar for variances: variances can be negative in ANOVA
  One crucial prior for spatial elements
    o Reasonable to assume Gaussian distribution?
    o Gaussian assumption adopted everywhere!
    o Subjects, residuals across TRs
  How can Gaussian assumption help?
    o Loosely constraining elements
    o No full trust for individual estimates
    o Information sharing: shrinkage or partial pooling
    o Controlling type S and M errors

Short summary: what we intend to achieve
  Abandon strawman and p-value
    o Directly focus on research interest: P(effect > 0 | data)
  Build one model
    o Incorporate all elements into a multilevel or hierarchical structure
    o Loosely constrain elements: leverage prior knowledge
    o Achieve higher modeling efficiency: no more multiplicity!
  Validate the model by comparing with potential competitors
  Be conservative on effect estimates by controlling type S and M errors: biased?
  Always be mindful of uncertainties: strength of evidence (no proof)
  Avoid dichotomous decisions
    o Report full results if possible
    o Highlight instead of hide based on gradient of evidence

Application #1: region-based analysis
  Dataset
    o Subjects: n = 124 children; resting-state data (Xiao et al., 2019)
    o Individual subjects: seed-based correlation for each subject

    o 3D correlation between seed and whole brain (functional connectivity)
    o Explanatory variable (behavior data): Theory of Mind Index
  Voxel-wise group analysis: GLMs
    o Focus: association between the explanatory variable and seed-based correlation (z-score)
    o Pretense: voxels unrelated, with equal likelihood within (-∞, +∞). Information waste!
    o Uniform distribution: total freedom, each parameter on its own
    o GLMs: mass univariate; multiplicity with m = 100,000 voxels means 100,000 models
  Xiao et al., 2019. Neuroimage 184:707-716

GLMs: dealing with multiplicity!
  Voxel-based analysis: GLMs
    o Penalty time for pretense: multiple testing (m = 100,000), magic 0.05
    o Show time for various correction methods
      Voxel-wise p, FWE, FDR, spatial smoothness, clusters, ...
      Simulations, random field theory, permutations, ...
  How would dataset turn out under GLM?
    o 4 lucky clusters manage to survive

Switching from voxels to ROIs: still GLMs
  Region-wise analysis: GLMs
    o Focus: association between the explanatory variable and seed-based correlation (z-score)
    o Pretense: ROIs unrelated
    o Uniform distribution: total freedom, each parameter on its own
    o GLMs: mass univariate

      m = 21 ROIs means 21 models
    o Penalty time for pretense: multiple testing. What to do?
    o Bonferroni? Unbearable
    o What else?

Switching from GLMs to LME
  Region-wise analysis: linear mixed-effects (LME) model
    o One model integrates all regions
    o ROIs loosely constrained instead of being unrelated
    o New components with Gaussian distribution: is it far-fetched or subjective? Similar to cross-subject variability
    o Goal: effect of interest at the jth ROI = overall effect b plus the unique effect of the jth ROI
  Components of the model
    o Overall effect: shared by all ROIs and subjects
    o Idiosyncratic effect of the ith subject
    o Unique effect of the jth ROI
  Differentiation: fixed vs. random
    o Fixed: epistemic uncertainty
    o Random: aleatoric uncertainty
  What can we get out of LME?
    o Conventional framework
    o Estimates for fixed effects
    o Variances for random effects
    o Dead end!

Switching from GLMs to BML
  Region-wise analysis: Bayesian multilevel (BML) model
    o One model integrates all regions: basically same as LME
    o ROIs loosely constrained instead of being unrelated
    o Goal: effect of interest at the jth ROI = overall effect b plus the unique effect of the jth ROI
    o No more differentiation: fixed vs. random
    o All parameters: aleatoric
  Components of the model
    o Overall effect: shared by all ROIs and subjects
    o Idiosyncratic effect by the ith subject
    o Unique effect by the jth ROI
  Same model as LME plus priors
    o New components with Gaussian distribution: is it far-fetched or subjective? Similar to cross-subject variability
  Markov chain Monte Carlo (MCMC)
    o Inferences via posterior distribution
    o Ka-ching!

From GLMs to LME to BML
  Chen et al., 2019. Handling Multiplicity in Neuroimaging through Bayesian Lenses with Multilevel Modeling. Neuroinformatics.
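
The step from GLMs to LME to BML can be written out in symbols. The notation below is an assumption made for illustration, since the slide equations did not survive extraction: only b and the subscripts i (subject) and j (ROI) come from the slides, while the symbols for the subject effect, ROI effect, residual and their standard deviations are placeholders chosen here. The sketch shows the simplest case with an overall effect only; in the application the same structure would presumably apply to the effect of the explanatory variable.

```latex
% GLM: one separate model per ROI j (ROIs treated as unrelated)
y_{ij} = b_j + \epsilon_{ij}, \qquad \epsilon_{ij} \sim \mathcal{N}(0, \sigma_j^2)

% LME and BML: a single model for all ROIs
%   b      overall effect shared by all ROIs and subjects
%   \pi_i  idiosyncratic effect of the i-th subject
%   \xi_j  unique effect of the j-th ROI
y_{ij} = b + \pi_i + \xi_j + \epsilon_{ij}, \qquad
\pi_i \sim \mathcal{N}(0, \tau^2), \quad
\xi_j \sim \mathcal{N}(0, \lambda^2), \quad
\epsilon_{ij} \sim \mathcal{N}(0, \sigma^2)

% Effect of interest at the j-th ROI, summarized from its posterior under BML
\theta_j = b + \xi_j, \qquad P(\theta_j > 0 \mid \text{data})
```

The Gaussian distribution on the ROI effects is what loosely constrains the ROIs; BML adds priors on the remaining parameters and reads the effect at each ROI off the posterior rather than stopping at fixed-effect estimates and random-effect variances.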

Inferences from BML: full distributions
  Region-based BML: 21 ROIs
  Full report with richer information: posterior distributions for each ROI
    o No dichotomization
    o No results hiding
    o No discrimination against small regions
    o No ambiguities about spatial specificity
    o No inconvenient interpretation of confidence interval
    o Evidence for each ROI: P(effect > 0 | data)
  8 ROIs with strong evidence of effect, compared to
    o Region-wise GLM with Bonferroni correction
    o Voxel-wise GLM at cluster level: 4 clusters
  Highlight, not hide
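
Once posterior draws are available, the evidence P(effect > 0 | data) for each ROI is simply the share of draws above zero. A minimal sketch, assuming the draws are already in hand: the ROI names and the Gaussian stand-in draws below are invented for illustration, and in practice they would come from the MCMC output.

```python
import random
import statistics

rng = random.Random(1)

# Stand-in posterior draws; ROI names and (mean, sd) values are hypothetical.
draws = {
    "ROI_A": [rng.gauss(0.30, 0.10) for _ in range(4000)],
    "ROI_B": [rng.gauss(0.10, 0.10) for _ in range(4000)],
    "ROI_C": [rng.gauss(0.00, 0.10) for _ in range(4000)],
}


def prob_positive(samples):
    """Monte Carlo estimate of P(effect > 0 | data)."""
    return sum(s > 0 for s in samples) / len(samples)


def interval(samples, mass=0.95):
    """Equal-tailed uncertainty interval read off the sorted draws."""
    ordered = sorted(samples)
    lo = ordered[int((1 - mass) / 2 * len(ordered))]
    hi = ordered[int((1 + mass) / 2 * len(ordered)) - 1]
    return lo, hi


summary = {
    roi: (statistics.mean(s), prob_positive(s), interval(s))
    for roi, s in draws.items()
}

# Report every ROI with its strength of evidence; nothing is thresholded away.
for roi, (mean, p_pos, (lo, hi)) in summary.items():
    print(f"{roi}: mean = {mean:+.3f}, 95% interval = [{lo:+.3f}, {hi:+.3f}], P(effect > 0 | data) = {p_pos:.3f}")
```

Every ROI is reported with its full summary, so the reader sees a gradient of evidence instead of a list of survivors.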

Inferences from BML: uncertainty
  ROI-based BML: 21 ROIs
  Full report with bar graph and uncertainty intervals
    o Nothing hidden under sea level
    o 8 ROIs with strong evidence for effect of interest
    o Shrinkage / partial pooling
  Highlight, not hide
  How about Left SFG?

BML: model validations
  ROI-based BML with 21 ROIs: cross-validation
    o Leave-one-out information criterion (LOOIC)

  Posterior predictive checking
    [Figure panels: GLM, BML; data vs realizations from fitted model]
  Effects of BML
    o Regularizing ROIs: don't fully trust individual ROI data
    o Sacrificing fit at each ROI; achieving better overall fit

BML: whole-brain vs. region-based analysis
  Region-based analysis
    + high region specificity: region definitions considered as priors
    + low computational cost
    + avoiding potential alignment issues by defining regions in native space
    - not all regions have been defined
    - information loss due to averaging within each region
    - region definitions can be tricky: relying on the accuracy of results in the literature (e.g., publication bias); different atlases/parcellations
  Whole-brain analysis
    + independent of region definitions
    + less likely to miss small regions that are not in available atlases/parcellations
    - vulnerable to poor alignment across subjects
    - region specificity problem: voxel-wise results do not respect region definitions
    - computationally challenging; hopeful: within-chain parallelization and GPU usage

Application #2: matrix-based analysis
  Dataset: correlation matrix

    o Subjects: n = 41; response-conflict task (Choi et al., 2012)
    o Individual subjects: correlation matrix among m = 16 ROIs
  How to go about group analysis?
    o GLM for each element in correlation matrix: NBS, CONN, FSLNets in FSL, GIFT
    o Binarization approach: graph theory
  More broadly: matrix-based analysis (MBA) (network modeling)
    o Inter-region correlation (IRC): FMRI
    o White-matter properties (FA, MD, ...): DTI
    o Other matrices (e.g., coherence, entropy, mutual information)
  Focus on GLM
    o Student's t-test or GLM on each element
    o M = 120 massively univariate models (16 × 15 / 2 region pairs)
    o Pretense again: all elements are unrelated
    o Equal likelihood within (-∞, +∞). Information waste!
    o Penalty time again: permutations? FDR?
  Choi et al., 2012. Neuroimage 59(2):1912-1923

Dealing with inter-region correlations (IRCs)
  Complexities of IRCs
    o Some region pairs are unrelated, but others are correlated
    o Correlation structure is intricate
    o Can we do a better job than GLMs or dichotomization?
  Challenge: how to characterize the complex structure?

IRC: switching from GLM to LME

  IRC analysis through linear mixed-effects (LME) modeling
    o One model integrates all ROIs: LME
    o ROIs loosely constrained instead of being unrelated
    o Gaussian distribution: is it far-fetched? Similar to cross-subject variability
  Components of the model
    o Overall effect b0: shared by all ROIs and subjects
    o Unique effects at the ith and jth ROIs
    o Unique effect of the region pair (RP)
    o Unique effect at the ith and jth ROIs for the kth subject
    o Unique effect by the kth subject
  Differentiation: fixed vs. random
    o Fixed: epistemic uncertainty
    o Random: aleatoric uncertainty
  Effects of interest
    o Region pair: b0 + unique effects of the ith and jth ROIs + unique effect of the RP
    o Region: 0.5*b0 + unique effect of the ith ROI
  LME wouldn't work! Dead end!

IRC: one more jump from LME to BML
  IRC analysis through Bayesian multilevel (BML) modeling
    o One model integrates all ROIs: BML (essentially same as LME)
    o ROIs loosely constrained instead of being unrelated

    o Gaussian distribution: is it far-fetched? Similar to cross-subject variability
    o No differentiation: fixed vs. random
    o All parameters: aleatoric uncertainty
    o LME plus priors
  Components of the model
    o Overall effect b0: shared by all ROIs and subjects
    o Unique effects at the ith and jth ROIs
    o Unique effect of the region pair (RP)
    o Unique effect at the ith and jth ROIs for the kth subject
    o Unique effect by the kth subject
  Effects of interest
    o Region pair: b0 + unique effects of the ith and jth ROIs + unique effect of the RP
    o Region: 0.5*b0 + unique effect of the ith ROI
  MCMC
    o Posterior distribution
    o Ka-ching!

From GLMs to LME to BML
  Chen et al., 2019. An integrative Bayesian approach to matrix-based analysis in neuroimaging. bioRxiv.

IRC: ROI effect from BML: full distributions
  ROI-based BML: 16 ROIs
  Full report with richer information: posterior distributions for each ROI
    o No dichotomization
    o Nothing hidden under sea level

    o 4 ROIs with strong evidence of effect
    o Region effect inferences: unavailable from GLM and graph theory
    o Hubness?
  How about Left & Right Anterior Insula?
  Highlight, not hide

IRC: RP effect from BML: full distributions
  120 RPs
  Highlight, not hide

IRC: RP effect from BML
  ROI-based BML: 16 ROIs
  Full report for all region pairs (RPs)
  Comparisons with GLMs: nothing hidden under sea level
    o 63 RPs identified by GLMs with p of 0.05: none survived after correction with NBS via permutations

    o 33 RPs with strong evidence under BML
  [Figure panels: BML, GLM]
  Highlight, not hide

BML: model validations
  ROI-based BML with IRC of 16 ROIs: cross-validation
    o Leave-one-out information criterion (LOOIC)
  Posterior predictive checking
    [Figure panels: GLM, BML]
  Effects of BML
    o Regularizing ROIs: don't fully trust individual ROI data
    o Sacrificing fit at each ROI; achieving better overall fit
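
The two kinds of effects of interest in the matrix-based analysis can be assembled from the model components as follows. This is a sketch with made-up numbers for one hypothetical posterior draw: the names `b0`, `xi` (ROI effects) and `eta` (region-pair effects) are notation assumed here, and the two formulas follow the slides' description (region pair: overall effect plus the two ROI effects plus the RP effect; region: half the overall effect plus the ROI effect).

```python
import itertools
import random

rng = random.Random(2)

N_ROI = 16
# Every unordered region pair (i, j) with i < j: 16 * 15 / 2 of them.
pairs = list(itertools.combinations(range(N_ROI), 2))

# One hypothetical posterior draw of the model components (values are invented).
b0 = 0.2                                              # overall effect
xi = [rng.gauss(0.0, 0.05) for _ in range(N_ROI)]     # unique effect of each ROI
eta = {pair: rng.gauss(0.0, 0.03) for pair in pairs}  # unique effect of each region pair

# Region-pair effect: overall effect + both ROI effects + RP effect.
region_pair_effect = {
    (i, j): b0 + xi[i] + xi[j] + eta[(i, j)] for i, j in pairs
}

# Region effect: half of the overall effect + the ROI's own effect.
region_effect = [0.5 * b0 + xi[i] for i in range(N_ROI)]

print("number of region pairs:", len(pairs))
print("effect of region pair (0, 1):", round(region_pair_effect[(0, 1)], 4))
print("effect of region 0:", round(region_effect[0], 4))
```

Repeating the calculation over all posterior draws would give a full distribution for each of the 120 region pairs and each of the 16 regions; the region-level effect is the piece that element-wise GLMs and graph-theoretic binarization do not provide.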

Summary
  Multiplicity problems in neuroimaging
  Improved modeling from two perspectives
    o Weirdness of p-value
    o Information waste and inefficient modeling
  Application #1: region-based analysis (RBA)
    o Task-related experiment or resting state (seed-based correlation analysis)
    o Program available in AFNI: BayesianGroupAna.py
  Application #2: matrix-based analysis (MBA)
    o FMRI: inter-region correlation (IRC)
    o DTI: white-matter properties (FA, MD, etc.)
    o Naturalistic scanning: inter-subject correlation (ISC)
    o Program available in AFNI: MBA

Keep Kidney Cancer in Mind!
  Kidney cancer distribution among counties
  [Figure: county map; labels: highest rate, lowest rate]
  Calibration

Acknowledgements
  Paul-Christian Bürkner (Department of Psychology, University of Münster)
  Andrew Gelman (Columbia University), Stan Development Team, R Foundation
  Yaqiong Xiao, Elizabeth Redcay, Tracy Riggins, Fengji Geng
  Luiz Pessoa, Joshua Kinnison (Department of Psychology, University of Maryland)
  Zhihao Li (School of Psychology and Sociology, Shenzhen University, China)
  Lijun Yin (Department of Psychology, Sun Yat-sen University, China)
  Emily Finn, Daniel Handwerker (SFIM/NIMH, National Institutes of Health)