Managing all those bytes: The Human Genome Project

1993 
The three databases of primary importance to the Human Genome Project each store a different kind of information: DNA sequences (GenBank), chromosome mapping information (Genome Data Base), and protein sequence and structure (Protein Information Resource). Currently, these databases are independently administered and are separate physical entities, each with its own system for data collection, storage, and presentation. However, the community would be better served by the convenience of one-stop shopping, provided by a seamless integration of these primary databases into a single virtual database. This integration will require adopting a standard protocol such as SQL (Structured Query Language), so that related information can be queried and retrieved simultaneously from several distributed databases. At present, data are accessed from databases across networks with simple document-based protocols such as Gopher and WAIS (Wide Area Information Servers). Although such protocols allow easy access to a wide variety of information (and undoubtedly will be used extensively), they do not provide the connectivity needed for a virtual database.
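As a rough illustration of what a single SQL query over several separately stored databases could look like, the sketch below uses Python's built-in sqlite3 module and SQLite's ATTACH DATABASE statement. It is a toy stand-in, not the architecture of GenBank, the Genome Data Base, or the Protein Information Resource: the schema names, table names, columns, and the single row of data are all hypothetical placeholders, and three in-memory databases on one machine take the place of distributed servers.

```python
import sqlite3


def build_virtual_db():
    """Open three separate in-memory databases behind one connection.

    All names and rows here are made-up placeholders for illustration.
    """
    # The main database stands in for a sequence store.
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates a further, physically separate database.
    conn.execute("ATTACH DATABASE ':memory:' AS mapdb")
    conn.execute("ATTACH DATABASE ':memory:' AS protdb")

    conn.execute("CREATE TABLE main.sequences (locus TEXT PRIMARY KEY, dna TEXT)")
    conn.execute(
        "CREATE TABLE mapdb.map_positions (locus TEXT PRIMARY KEY, chromosome TEXT)"
    )
    conn.execute(
        "CREATE TABLE protdb.proteins (locus TEXT PRIMARY KEY, protein_name TEXT)"
    )

    # Placeholder rows, not real biological records.
    conn.execute("INSERT INTO main.sequences VALUES ('LOCUS_A', 'ATGGCC')")
    conn.execute("INSERT INTO mapdb.map_positions VALUES ('LOCUS_A', 'chr_placeholder')")
    conn.execute("INSERT INTO protdb.proteins VALUES ('LOCUS_A', 'protein_placeholder')")
    conn.commit()
    return conn


def lookup(conn, locus):
    """Fetch related sequence, map, and protein data with one SQL query."""
    query = """
        SELECT s.locus, s.dna, m.chromosome, p.protein_name
        FROM main.sequences AS s
        JOIN mapdb.map_positions AS m ON m.locus = s.locus
        JOIN protdb.proteins AS p ON p.locus = s.locus
        WHERE s.locus = ?
    """
    return conn.execute(query, (locus,)).fetchone()


if __name__ == "__main__":
    connection = build_virtual_db()
    print(lookup(connection, "LOCUS_A"))
```

The point of the sketch is the join in lookup: one query returns related records from all three stores, which is the kind of connectivity that document-retrieval protocols do not offer. A real federation across networked servers would need considerably more machinery than this.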