IdMapper: A Java Application for ID Mapping across Multiple Cross-referencing Providers

2009 
Abstract

We developed an identifier mapping application for bioinformatics research in the Java programming language. It is easy to use and provides the usability features expected of a professional application. It supports three widely used mapping services and can convert many IDs from one source database into many target databases at once. ID mapping across service providers is possible by remapping the resultant IDs. Because it adheres to the NetBeans platform architecture, it can be incorporated into other NetBeans platform applications as an ID mapping provider without adaptation or modification.

Availability: You can access the application at http://neon.gachon.ac.kr/idmapper.html.

Keywords: ID mapping, Java, NetBeans, rich client, web services

Introduction

"-omics" style experiments, such as sequencing, microarray, and proteomics, generate enormous amounts of data (Kim et al., 2006; Kim et al., 2008). Multiple tools are used to integrate and analyze these data from different perspectives, and each tool supports its own set of identifiers for genes or gene products. Even though there are widely used identifiers, such as Ensembl (Flicek et al., 2008), NCBI RefSeq (Pruitt et al., 2007), UniProt (The UniProt Consortium, 2008), and HGNC (Povey et al., 2001), there is no universal identifier. When multiple tools and services are used, identifier conversion problems across databases arise frequently. A small research group cannot afford to run its own mapping service and must rely on external services.

Several efforts have been made to reconcile identifiers across multiple source databases, such as the DAVID gene ID conversion tool (Huang et al., 2008), MatchMiner (Bussey et al., 2003), IDconverter (Alibes et al., 2007), Onto-Translate (Khatri et al., 2006), PICR (Cote et al., 2007), Synergizer (Berriz et al., 2008), and BioMart (Smedley et al., 2009).
Services that are provided only as web pages cannot be incorporated into other applications easily. Only a few services provide API access for batch mapping or for incorporation into other applications.

The Protein Identifier Cross-Reference (PICR) service is a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references.

The Synergizer is a service for translating between sets of biological identifiers. It can, for example, translate Ensembl Gene IDs to Entrez Gene IDs, or IPI IDs to HGNC gene symbols. The Synergizer works via a web interface (for users who are not programmers) or through a web service (for programmatic access).

BioMart is a query-oriented data management system developed jointly by the Ontario Institute for Cancer Research (OICR) and the European Bioinformatics Institute (EBI). It can be accessed via web services and can be adapted for identifier mapping.

BridgeDb, though not yet publicly released, is another attempt at an ID mapping framework for bioinformatics applications. BridgeDb lets one add the following capabilities quickly and easily: translating identifiers from one system to another, searching references by ID or symbol, and linking out to online information for an identifier. Applications such as the PathVisio pathway analysis tool, WikiPathways, the CyThesaurus Cytoscape plug-in, and the NetworkMerge Cytoscape plug-in use functionality provided by BridgeDb.
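To illustrate the idea of mapping across service providers by remapping the resultant IDs, the following is a minimal Java sketch under stated assumptions. The `MappingProvider` interface, the `TableProvider` class, the `remap` helper, the database names, and all identifiers (`GENE_A`, `PROT_1`, `SYM_X`) are hypothetical placeholders invented for this example. They are not the IdMapper API, and the in-memory tables merely stand in for remote services such as PICR, the Synergizer, or BioMart.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class IdMapperSketch {

    /** Hypothetical provider abstraction; NOT the actual IdMapper interface. */
    interface MappingProvider {
        Map<String, Set<String>> map(Collection<String> ids, String sourceDb, String targetDb);
    }

    /** In-memory stand-in for a remote mapping service (assumption for illustration). */
    static final class TableProvider implements MappingProvider {
        private final Map<String, Map<String, Set<String>>> table = new HashMap<>();

        TableProvider add(String sourceDb, String targetDb, String sourceId, String targetId) {
            table.computeIfAbsent(sourceDb + "->" + targetDb, k -> new HashMap<>())
                 .computeIfAbsent(sourceId, k -> new LinkedHashSet<>())
                 .add(targetId);
            return this;
        }

        @Override
        public Map<String, Set<String>> map(Collection<String> ids, String sourceDb, String targetDb) {
            Map<String, Set<String>> known =
                    table.getOrDefault(sourceDb + "->" + targetDb, Collections.emptyMap());
            Map<String, Set<String>> result = new LinkedHashMap<>();
            for (String id : ids) {
                // Unmapped IDs are kept with an empty target set.
                result.put(id, new LinkedHashSet<>(known.getOrDefault(id, Collections.emptySet())));
            }
            return result;
        }
    }

    /** Maps ids with the first provider, then feeds the resulting IDs to the second. */
    static Map<String, Set<String>> remap(Collection<String> ids,
                                          MappingProvider first, String sourceDb, String viaDb,
                                          MappingProvider second, String targetDb) {
        Map<String, Set<String>> firstHop = first.map(ids, sourceDb, viaDb);

        // Collect every intermediate ID once so the second service gets one batch request.
        Set<String> intermediates = new LinkedHashSet<>();
        firstHop.values().forEach(intermediates::addAll);
        Map<String, Set<String>> secondHop = second.map(intermediates, viaDb, targetDb);

        // Join the two hops: source ID -> union of targets of its intermediate IDs.
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : firstHop.entrySet()) {
            Set<String> targets = new LinkedHashSet<>();
            for (String mid : entry.getValue()) {
                targets.addAll(secondHop.getOrDefault(mid, Collections.emptySet()));
            }
            result.put(entry.getKey(), targets);
        }
        return result;
    }

    /** Builds two toy providers with made-up IDs and chains them. */
    static Map<String, Set<String>> demo() {
        MappingProvider providerA = new TableProvider().add("ensembl", "uniprot", "GENE_A", "PROT_1");
        MappingProvider providerB = new TableProvider().add("uniprot", "hgnc_symbol", "PROT_1", "SYM_X");
        return remap(Arrays.asList("GENE_A", "GENE_B"),
                     providerA, "ensembl", "uniprot", providerB, "hgnc_symbol");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {GENE_A=[SYM_X], GENE_B=[]}
    }
}
```

In this sketch, an ID with no mapping at either hop ends up with an empty target set rather than being dropped, so a caller can still see which inputs could not be converted. A real provider would replace `TableProvider` with calls to a remote service, which this example deliberately does not attempt to model.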