Abstract 2291: RBP, a community for reproducible bioinformatics

2018 
Background: Reproducibility is a key element of modern science and is mandatory for any industrial application. It is the ability to replicate an experiment independently of location and operator. A study can therefore be considered reproducible only if all the data used are available and the computational analysis workflow is clearly described. However, for a complex bioinformatics analysis, the raw data and a list of the tools used in the workflow may not be enough to guarantee that the results can be reproduced. Indeed, different releases of the same tools and/or of the system libraries those tools rely on can lead to subtle reproducibility issues.

Results: To address this challenge, we established the Reproducible Bioinformatics Project (RBP, http://reproducible-bioinformatics.org/), an open-source project whose aim is to provide an infrastructure, based on Docker images and an R package, for producing reproducible results in bioinformatics. One or more Docker images are defined for each workflow (typically one per task), while the workflow implementation is handled by R functions embedded in a package available in a GitHub repository (https://github.com/kendomaniac/docker4seq). A bioinformatician participating in the project must first integrate her/his workflow modules into Docker image(s), starting from an Ubuntu Docker image developed ad hoc by RBP to make this task easier. Second, the workflow must be implemented in R according to an R skeleton function made available by RBP, to guarantee homogeneity and reusability among different RBP functions. Moreover, she/he must provide an R vignette explaining the package functionality, together with an example dataset that users can run to gain confidence in the workflow.

Available workflows: (i) RNAseq/miRNAseq workflows (from fastq to differential expression analysis); (iii) ChIPseq (transcription factor and histone-mark peak calling); (iv) DNA/RNA SNV calling (based on GATK best practices); (v) Xenome-seq (removal of mouse reads contaminating RNA/DNA in patient-derived xenografts); (vi) HashClone (a clonality marker detection tool to quantify minimal residual disease during patient follow-up).

Conclusions: The Reproducible Bioinformatics Project provides a general schema and an infrastructure for distributing robust and reproducible workflows. It thus guarantees that end users can repeat any analysis consistently, regardless of the UNIX-like architecture used.

Citation Format: Luca Alessandri, Neha Kulkarni, Riccardo Panero, Martina Olivero, Maddalena Arigoni, Marco Beccuti, Francesca Cordero, Raffaele A. Calogero. RBP, a community for reproducible bioinformatics [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 2291.
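
Illustrative note: the design described in the Results (one Docker image per task, driven by a thin wrapper function) can be sketched as follows. This is a minimal sketch under stated assumptions, not the docker4seq API: the project implements its wrappers in R, whereas Python is used here only for illustration, and the function name, image name, tag, paths, and script name are all hypothetical. The only external interface it relies on is the standard `docker run --rm -v host:container image:tag` command line.

```python
import shlex
from pathlib import PurePosixPath


def build_docker_command(image, tag, data_dir, tool_args, mount_point="/data"):
    """Return the argv list for running one workflow task in a pinned image."""
    if not tag or tag == "latest":
        # A floating tag could silently change tool and library versions.
        raise ValueError("pin an explicit image tag for reproducibility")
    host_dir = PurePosixPath(data_dir)
    if not host_dir.is_absolute():
        # docker -v bind mounts expect an absolute host path.
        raise ValueError("data_dir must be an absolute path")
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:{mount_point}",  # share the data folder with the container
        f"{image}:{tag}",                   # exact image release, never a floating tag
        *tool_args,
    ]


cmd = build_docker_command(
    image="example/rnaseq-task",   # hypothetical image name
    tag="1.0.0",                   # hypothetical pinned release
    data_dir="/home/user/fastq",   # hypothetical host folder
    tool_args=["run_task.sh", "/data/sample.fastq.gz"],  # hypothetical script
)
print(shlex.join(cmd))
```

The sketch only builds and prints the command, so it runs without Docker installed; passing the list to `subprocess.run(cmd, check=True)` would execute the task. Refusing the `latest` tag reflects the point made in the Background: an unpinned image can change tool or library releases between runs.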