Curious Containers: A framework for computational reproducibility in life sciences with support for Deep Learning applications

2020 
Abstract
In clinical scenarios, there is increasing interest in complex computational experiments, for example the training of Deep Learning models. Reproducibility is an essential property of such experiments, especially if the result contributes to a patient's treatment. This paper introduces Curious Containers, a software framework for computational reproducibility that treats data, software and runtime environment as decentralized network resources. All experiment resources are described in a single file, using a new format that is compatible with a subset of the Common Workflow Language. Docker is used to deploy the experiment software in a container image, which can include arbitrary data transmission programs to connect with existing storage solutions. The framework supports Deep Learning applications, which have high demands for storage and processing capabilities. Large datasets can be mounted inside containers via network filesystems such as SSHFS, which is based on Filesystem in Userspace (FUSE) technology. The Nvidia-Container-Toolkit enables GPU usage. Curious Containers has been tested in two biomedical scenarios. The first use case is a Deep Learning application for tumor classification in images that requires a large dataset and a GPU. In this context, a prototypical integration of the framework with the existing Data Version Control system for exploratory Deep Learning modeling has been developed. The second use case extends an existing container image that includes a scientific workflow for the detection and comparison of human proteins in mass spectrometry data. The container image was originally developed for an archiving platform and could be extended to be compatible with both Curious Containers and cwltool, the Common Workflow Language reference implementation. The presented solution allows for consistent description and execution of computational experiments, while aiming to be both flexible and interoperable with existing software and standards. Support for Deep Learning experiments is gaining importance as such systems are increasingly validated as medical decision support systems.