Managing all those bytes: The Human Genome Project
1993
The three databases of primary importance to the Human Genome Project each store a different kind of information: DNA sequences (GenBank), chromosome mapping information (Genome Data Base), and protein sequence and structure (Protein Information Resource). Currently, these databases are independently administered and are separate physical entities, each with its own system for data collection, storage, and presentation. However, the community would be better served by the convenience of one-stop shopping, provided by seamless integration of these primary databases into a single virtual database. This integration will require adopting a standard query language such as SQL (Structured Query Language), so that related information can be retrieved simultaneously from several distributed databases. At present, data are accessed from databases across networks with simple document-based protocols such as GOPHER and WAIS (Wide Area Information Server). Although such protocols allow easy access to a wide variety of information (and undoubtedly will be used extensively), they do not provide the connectivity needed for a virtual database.
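To make the idea of a single virtual database concrete, the following is a minimal sketch, not a description of how GenBank, the Genome Data Base, or the Protein Information Resource are actually implemented. It assumes three separately stored SQLite databases standing in for the three resources and uses SQLite's ATTACH DATABASE so that one SQL query can join across them. The table names, column names, the shared `locus` key, and all rows are hypothetical placeholders chosen for illustration.

```python
import sqlite3


def build_virtual_db():
    """Attach three independent in-memory databases to one connection.

    Each attached database is a stand-in for one resource; the schemas and
    rows below are invented placeholders, not real records.
    """
    conn = sqlite3.connect(":memory:")
    for name in ("genbank", "gdb", "pir"):
        # Each ATTACH of ':memory:' creates a separate, independent database.
        conn.execute(f"ATTACH DATABASE ':memory:' AS {name}")

    conn.execute("CREATE TABLE genbank.sequences (locus TEXT PRIMARY KEY, dna TEXT)")
    conn.execute("CREATE TABLE gdb.map_positions (locus TEXT PRIMARY KEY, chromosome TEXT)")
    conn.execute("CREATE TABLE pir.proteins (locus TEXT PRIMARY KEY, protein TEXT)")

    # Placeholder data only.
    conn.execute("INSERT INTO genbank.sequences VALUES ('LOCUS_A', 'ATGC')")
    conn.execute("INSERT INTO gdb.map_positions VALUES ('LOCUS_A', '1')")
    conn.execute("INSERT INTO pir.proteins VALUES ('LOCUS_A', 'PROTEIN_A')")
    conn.commit()
    return conn


def lookup(conn, locus):
    """Retrieve sequence, map position, and protein for one locus in one query."""
    query = """
        SELECT s.locus, s.dna, m.chromosome, p.protein
        FROM genbank.sequences AS s
        JOIN gdb.map_positions AS m ON m.locus = s.locus
        JOIN pir.proteins AS p ON p.locus = s.locus
        WHERE s.locus = ?
    """
    return conn.execute(query, (locus,)).fetchone()


if __name__ == "__main__":
    connection = build_virtual_db()
    print(lookup(connection, "LOCUS_A"))
```

The point of the sketch is the single join in `lookup`: a document-based protocol would return three separate documents, whereas a shared query language lets related records be retrieved together. Real distributed databases would also need agreed identifiers and network access, which this local example does not address.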