Towards Long-term and Archivable Reproducibility

2020 
Reproducible workflow solutions commonly use high-level technologies that were popular when they were created, providing an immediate solution which is unlikely to be sustainable in the long term. We therefore introduce a set of criteria to address this problem and demonstrate their practicality and implementation. The criteria have been tested in several research publications and can be summarized as: completeness (no dependency beyond a POSIX-compatible operating system, no administrator privileges, no network connection and storage primarily in plain text); modular design; minimal complexity; scalability; verifiable inputs and outputs; temporal provenance; linking analysis with narrative; and free-and-open-source software. As a proof of concept, we have implemented "Maneage", a solution which stores the project in machine-actionable and human-readable plain-text, enables version-control, cheap archiving, automatic parsing to extract data provenance, and peer-reviewable verification. We show that requiring longevity of a reproducible workflow solution is realistic, without sacrificing immediate or short-term reproducibility and discuss the benefits of the criteria for scientific progress. This paper has itself been written in Maneage, with snapshot 1637cce.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    49
    References
    7
    Citations
    NaN
    KQI
    []