The PRoteomics IDEntification (PRIDE) Converter 2 Framework: An Improved Suite of Tools to Facilitate Data Submission to the PRIDE Database and the ProteomeXchange Consortium

2012 
The sharing of biological data in the public domain is generally considered to be good scientific practice. This concept of data sharing has gained substantial traction in the field of MS-based proteomics, in which the PRIDE (PRoteomics IDEntifications) database (http://www.ebi.ac.uk/pride) at the European Bioinformatics Institute (EBI, Cambridge, UK) is one of the most prominent public data repositories (1). PRIDE stores MS and MS/MS spectra, the derived peptide and protein identifications and expression values if available (the processed experimental results), and any associated metadata. It is important to highlight that data stored in PRIDE is not reprocessed after submission. PRIDE, in its current form, represents the submitter's view of the data. PRIDE is also a founding member of the ProteomeXchange (PX) consortium (http://www.proteomexchange.org) (2). The PX members, led by PRIDE and PeptideAtlas (3), are currently working toward the implementation of a system that enables the automated and standardized sharing of MS-based proteomics data between the main proteomics repositories. In this framework, PRIDE is the initial submission point for tandem MS data. The first pilot PX submissions (containing raw data and processed results) have already been carried out (http://proteomecentral.proteomexchange.org) and the system is now starting to accept regular submissions. At present, submissions to PRIDE are performed using a publicly available XML data format called PRIDE XML, which is built around the mzData data standard format (4). Several scientific journals (e.g. Molecular and Cellular Proteomics, Proteomics, and Nature Publishing Group journals) are supporting a gradual move toward mandating public deposition of MS data to support the publication of related manuscripts. 
In parallel, several funding agencies (such as The Wellcome Trust, NIH, and BBSRC) are also enforcing the public availability of experimental data in the context of their funded projects. Despite these efforts, the field of MS proteomics is still lagging behind other more mature “omics” disciplines in terms of public data availability (5). In practical terms, a major contribution to this public data-sharing policy trend is provided by the availability of reliable and user-friendly submission tools. Such tools must be able to properly capture the experimental data and any supporting technical and biological metadata. In addition, to encourage MS data deposition, the submission process has to be as easy as possible. This was the philosophy that drove the development of the original PRIDE Converter (6) (http://pride-converter.googlecode.com), an open source and platform-independent software tool for the submission of proteomics data to PRIDE. PRIDE Converter can convert input data from a large variety of popular MS proteomics formats into PRIDE XML, guiding the user through the process with a graphical user interface (GUI). As a result, PRIDE Converter made the submission of MS data a much easier and more straightforward process, especially for researchers without bioinformatics support. PRIDE Converter has been a key factor in the huge growth in data content in PRIDE since 2008 (7) and has become the de facto submission tool to PRIDE for most researchers. PRIDE Converter has been regularly updated and more than 30 different releases have been made publicly available. However, after receiving extensive feedback from users, it became apparent that the original PRIDE Converter had some limitations, mainly in terms of software architecture, memory requirements, difficulty in extending the supported formats, and a lack of functionality for performing batch conversions (a frequent request). 
In addition, new use cases needed to be supported, such as handling quantitative information and easily post-processing the large XML files generated during the conversion process. To overcome these limitations, we decided to design a new submission tool from the ground up, one suited to the evolving needs of our submitters. In this manuscript we describe the PRIDE Converter 2 framework, including all of its new features and supported use cases. We are certain that future submitters to PRIDE and to the PX consortium will benefit immensely from the availability of this new submission tool.