ProvCaRe: A Large-Scale Semantic Provenance Resource for Scientific Reproducibility

2021 
Reproducibility is critical to scientific advancement because new research must be based on sound experimental results, especially as increasingly limited funding is allocated to rigorously designed research studies. Scientific reproducibility is particularly important in the biomedical sciences, which involve patient safety and the broader domain of human health. We present Provenance for Clinical and Health Research (ProvCaRe), a semantic provenance resource that has extracted provenance information, a core component of scientific reproducibility, from all 1.6 million biomedical full-text articles available in the PubMed database. To the best of our knowledge, ProvCaRe is the largest repository of real-world provenance metadata, with terms mapped to a unique provenance ontology that extends the W3C PROV Ontology (PROV-O) for biomedical sciences. To extract provenance metadata from the unstructured text of published articles, we developed a novel ontology-driven natural language processing (NLP) pipeline that identifies and extracts structured provenance information. The ProvCaRe tool features an intuitive search and query interface with a new provenance-based ranking dashboard that enables users to assign custom weights to query results. The ProvCaRe semantic provenance resource enables a systematic and comprehensive evaluation of the reproducibility of biomedical research studies that was not previously possible for the biomedical research community.
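The abstract does not describe how the ranking dashboard combines user-assigned weights, so the following is only a minimal sketch of one plausible scheme: a weighted sum over per-category counts of provenance terms. The `Article` class, the `rank_by_provenance` function, and the category names `study_method` and `study_data` are hypothetical and are not taken from ProvCaRe itself.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """A query result with per-category provenance term counts (hypothetical model)."""
    article_id: str
    term_counts: dict


def rank_by_provenance(articles, weights):
    """Sort articles by a weighted sum of their provenance term counts, highest first.

    Categories missing from `weights` contribute nothing to the score.
    This scoring rule is an assumption, not the published ProvCaRe method.
    """
    def score(article):
        return sum(weights.get(cat, 0.0) * n for cat, n in article.term_counts.items())

    return sorted(articles, key=score, reverse=True)


# Illustrative data only: category names and counts are made up.
results = [
    Article("A", {"study_method": 4, "study_data": 1}),
    Article("B", {"study_method": 1, "study_data": 5}),
]

# A user who cares most about data provenance weights that category higher.
ranked = rank_by_provenance(results, {"study_method": 1.0, "study_data": 3.0})
print([a.article_id for a in ranked])
```

Changing the weights changes the order: with `study_method` weighted at 3.0 and `study_data` at 1.0, article A would rank ahead of article B.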