Trial and error: a hierarchical modeling approach to test-retest assessment

2021 
Test-retest reliability indexes the consistency of a measurement across time. High reliability is critical for any scientific study, and especially for the study of individual differences. Evidence is mounting that commonly used behavioral and functional neuroimaging tasks have poor reliability. Reports of low reliability in task-based fMRI have called into question whether even the most common, well-characterized cognitive tasks with robust population-level effects are adequate for measuring individual differences. Here, we lay out a hierarchical framework that estimates reliability as a correlation divorced from trial-level variability, and show that the resulting reliability estimates tend to be higher than those from the conventional framework, which adopts condition-level modeling and ignores cross-trial variability. We examine how estimates from the two frameworks diverge and assess how different factors (e.g., trial and subject sample sizes, relative magnitude of cross-trial variability) affect reliability estimates. We also show that, under specific circumstances, the two statistical frameworks converge: results from the two approaches are approximately equivalent if (a) the trial sample size is sufficiently large, or (b) cross-trial variability is of the same order of magnitude as, or smaller than, cross-subject variability. As empirical data indicate that cross-trial variability is large in most tasks, this work highlights that a large number of trials (e.g., greater than 100) may be required to achieve precise reliability estimates. We reference the tools TRR and 3dLMEr for the community to apply trial-level models to behavioral and neuroimaging data, and discuss how to make these new measurements most useful for current studies.
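
The dependence of the conventional estimate on trial count can be illustrated with a small simulation. The sketch below is not the paper's model and does not use TRR or 3dLMEr; it assumes a deliberately simplified setup with a single condition, two sessions, Gaussian subject effects with cross-subject standard deviation `tau` and true test-retest correlation `rho`, and independent Gaussian trial noise with standard deviation `sigma`. All function and parameter names are hypothetical. Under these assumptions, correlating per-subject trial averages gives an expected value of `rho / (1 + sigma**2 / (n_trials * tau**2))`, which approaches `rho` only when the number of trials is large or `sigma` is not much larger than `tau`.

```python
import numpy as np


def expected_conventional_trr(rho, tau, sigma, n_trials):
    """Expected correlation of trial-averaged session scores.

    Holds only under the simplified assumptions stated in the lead-in:
    Var(mean) = tau^2 + sigma^2 / n_trials, Cov(means) = rho * tau^2.
    """
    return rho / (1.0 + sigma**2 / (n_trials * tau**2))


def simulate_conventional_trr(rho, tau, sigma, n_trials, n_subjects, seed=0):
    """Simulate two sessions and correlate the per-subject trial averages."""
    rng = np.random.default_rng(seed)
    # Subject-level effects for sessions 1 and 2, correlated at rho.
    cov = tau**2 * np.array([[1.0, rho], [rho, 1.0]])
    effects = rng.multivariate_normal([0.0, 0.0], cov, size=n_subjects)
    # Independent trial-level noise: shape (subjects, sessions, trials).
    noise = rng.normal(0.0, sigma, size=(n_subjects, 2, n_trials))
    trials = effects[:, :, None] + noise
    # Condition-level summary: average over trials, ignoring their spread.
    means = trials.mean(axis=2)
    return np.corrcoef(means[:, 0], means[:, 1])[0, 1]


if __name__ == "__main__":
    # Toy values (assumed): trial noise ten times the cross-subject SD.
    rho, tau, sigma = 0.8, 1.0, 10.0
    for n_trials in (20, 100, 1000):
        theory = expected_conventional_trr(rho, tau, sigma, n_trials)
        sim = simulate_conventional_trr(rho, tau, sigma, n_trials, 5000)
        print(f"trials={n_trials:5d}  expected={theory:.3f}  simulated={sim:.3f}")
```

With these toy values the closed-form expression gives about 0.13 for 20 trials and 0.40 for 100 trials, against a true correlation of 0.8, which is the kind of underestimation that a trial-level hierarchical model is meant to avoid.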