TCGA Toolbox: an Open Web App Framework for Distributing Big Data Analysis Pipelines for Cancer Genomics

2013 
The diversity and volume of data generated by The Cancer Genome Atlas (TCGA) have been increasing exponentially: the number of data files hosted by the NIH, currently about three-quarters of a million, has doubled every 7 months since January 2010. The authors recently developed a browser-based, self-updating mechanism to catalog this dynamic big data repository. This report builds on that foundation to devise a web app framework that distributes TCGA analytical pipelines in a fully reproducible manner, without the usual requirement for a pre-installed, specialized computational statistics environment. The solution relies exclusively on sandboxed code injection (JavaScript) and on access permission configuration by the browser's app store. The framework was devised with an open architecture so that third-party analyses, ideally hosted with web-facing version control in a repository such as GitHub, SourceForge, Bitbucket, or Google Code, can be distributed to the toolbox. This openness is reflected specifically in the way a user invokes a third-party analysis: simply by entering the corresponding URL. Similarly, the toolbox lets the user distribute the result of the analysis as a reproducible procedure, which is also fully invoked as a Uniform Resource Locator (URL).
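The URL-based invocation described above can be illustrated with a minimal sketch. It is not the toolbox's actual code: the function names, the `module` query parameter, and the example addresses are all assumptions made for illustration. The idea shown is that a shareable link carries the addresses of the third-party analysis scripts, and the page receiving the link recovers them and injects each one as a script element.

```javascript
// Illustrative sketch only. The "module" parameter name and every function
// name below are hypothetical, not taken from the TCGA Toolbox itself.

// Encode a list of third-party analysis script URLs into one shareable link.
function buildInvocationUrl(toolboxUrl, moduleUrls) {
  const url = new URL(toolboxUrl);
  for (const moduleUrl of moduleUrls) {
    url.searchParams.append("module", moduleUrl);
  }
  return url.toString();
}

// Recover the script URLs from a link, keeping only http(s) addresses so that
// values such as "javascript:..." are never passed on for injection.
function parseInvocationUrl(invocationUrl) {
  const url = new URL(invocationUrl);
  return url.searchParams.getAll("module").filter((candidate) => {
    try {
      const protocol = new URL(candidate).protocol;
      return protocol === "https:" || protocol === "http:";
    } catch {
      return false; // not an absolute URL
    }
  });
}

// Inject each module as a <script> element. The document object is passed in
// so the function can run in a browser page or against a stub.
function injectModules(moduleUrls, doc) {
  return moduleUrls.map((moduleUrl) => {
    const script = doc.createElement("script");
    script.src = moduleUrl;
    doc.head.appendChild(script);
    return script;
  });
}
```

In a browser, `injectModules(parseInvocationUrl(window.location.href), document)` would load every analysis named in the current address. Because the whole procedure is captured in the link, sharing the link is enough to share the analysis.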