Exploring the Hazards of Scaling Up Clinical Data Analyses: A Drug Side Effect Discovery Case Report

2021 
We assessed the scalability of a pharmacological signal detection use case from a single-site clinical data warehouse (CDW) to a large aggregated multisite CDW (754,214 distinct patient IDs in the single-site database vs. 49.8M in the multisite database). We aimed to explore whether a larger clinical dataset would provide clearer signals for secondary analyses such as detecting the known relationship between prednisone and weight gain. We found a significant rate of weight gain using the single-site data but not using the aggregated data (0.0104 kg/day, p<0.0001 vs. -0.050 kg/day, p<0.0001). Weight gain was also found more consistently across 30 age and gender subgroups in the single-site data than in the aggregated data (26 vs. 18 subgroups with significant weight gain). Contrary to our expectations, analyses of the much larger aggregated clinical dataset did not yield stronger signals. Researchers must check the underlying model assumptions and account for greater heterogeneity when analyzing aggregated multisite data to ensure reliable findings.
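As an illustration of the heterogeneity concern, the minimal sketch below shows how pooling sites with different baselines and follow-up windows can reverse the sign of an estimated weight change rate. It is not the authors' method: the abstract does not specify the model, so an ordinary least-squares slope is assumed here, and the records, site names, and the `ols_slope` helper are all hypothetical and synthetic.

```python
def ols_slope(records):
    """Least-squares slope of weight (kg) on days since first prescription.

    `records` is a list of (days, weight_kg) pairs. Using plain OLS is an
    assumption made for illustration only.
    """
    n = len(records)
    mean_x = sum(d for d, _ in records) / n
    mean_y = sum(w for _, w in records) / n
    sxx = sum((d - mean_x) ** 2 for d, _ in records)
    sxy = sum((d - mean_x) * (w - mean_y) for d, w in records)
    return sxy / sxx


# Synthetic, hypothetical measurements (NOT data from the study).
# Each site shows the same within-site gain, but the sites differ in
# baseline weight and in when weights were recorded.
site_a = [(0, 70.0), (30, 70.3), (60, 70.6)]
site_b = [(100, 60.0), (130, 60.3), (160, 60.6)]

slope_a = ols_slope(site_a)                # positive within site A
slope_b = ols_slope(site_b)                # positive within site B
slope_pooled = ols_slope(site_a + site_b)  # negative once sites are pooled

print(f"site A: {slope_a:+.4f} kg/day")
print(f"site B: {slope_b:+.4f} kg/day")
print(f"pooled: {slope_pooled:+.4f} kg/day")
```

In this toy setup, a pooled model that ignores site absorbs between-site differences into the slope, which is one way aggregated data can obscure a signal that each site shows on its own.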