
Very large database

A very large database (originally written "very large data base"), or VLDB, is a database that contains so much data that it can require specialized architectural, management, processing and maintenance methodologies. The vague adjectives "very" and "large" allow for a broad and subjective interpretation, but attempts have been made to define a metric and threshold. Early metrics were the size of the database in a canonical form via database normalization, or the time required for a full database operation such as a backup. Technology improvements have continually changed what is considered very large. One definition suggests that a database becomes a VLDB when it is 'too large to be maintained within the window of opportunity… the time when the database is quiet'.

There is no absolute amount of data that can be cited; one cannot say, for example, that any database with more than 1 TB of data is a VLDB. The threshold has shifted over time as computer processing, storage and backup methods have become better able to handle larger amounts of data. That said, VLDB issues may start to appear as 1 TB is approached, and are more than likely to have appeared by the time 30 TB or so is exceeded.

Key areas where a VLDB may present challenges include configuration, storage, performance, maintenance, administration, availability and server resources. Careful configuration of databases in the VLDB realm is necessary to alleviate or reduce these issues. The complexity of managing a VLDB can increase dramatically for the database administrator as database size grows: maintenance and recovery operations such as database reorganizations and file copies, which are quite practical on a non-VLDB, can take very significant amounts of time and resources on a VLDB.
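The "window of opportunity" definition above can be made concrete with some arithmetic. The following sketch uses hypothetical throughput and window figures (the function name and all numbers are illustrative, not from the source) to show why a full backup that is comfortable at 1 TB stops fitting in a quiet overnight window as the database approaches the tens of terabytes:

```python
def backup_hours(db_terabytes: float, throughput_mb_per_s: float) -> float:
    """Hours needed for a full sequential copy of the database files."""
    total_mb = db_terabytes * 1024 * 1024
    return total_mb / throughput_mb_per_s / 3600

# Assume a sustained 500 MB/s backup stream and an 8-hour quiet window.
QUIET_WINDOW_HOURS = 8

print(backup_hours(1, 500))    # ~0.58 h: easily fits the window
print(backup_hours(30, 500))   # ~17.5 h: exceeds the window entirely
print(backup_hours(30, 500) <= QUIET_WINDOW_HOURS)  # False
```

Under these assumed numbers, the 30 TB database can no longer be fully backed up "within the window of opportunity" at all, which is exactly the point at which VLDB-specific techniques become necessary.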
In particular, it is typically infeasible to meet a typical recovery time objective (RTO), the maximum time a database is expected to be unavailable after an interruption, by methods that involve copying files from disk or other storage archives. To overcome these issues, techniques such as clustering, cloned/replicated/standby databases, file snapshots, storage snapshots or a backup manager may help achieve the RTO and availability, although individual methods may have limitations, caveats, licensing and infrastructure requirements, and some may risk data loss and fail to meet the recovery point objective (RPO). For many systems only geographically remote solutions may be acceptable.
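The RTO argument can be sketched the same way. The numbers below (restore throughput, RTO, standby promotion time) are hypothetical assumptions chosen only to illustrate the contrast between restoring files from an archive and failing over to a standby database:

```python
def restore_seconds(db_gb: float, restore_mb_per_s: float) -> float:
    """Time to copy the database files back from an archive."""
    return db_gb * 1024 / restore_mb_per_s

RTO_SECONDS = 15 * 60    # assumed 15-minute recovery time objective
FAILOVER_SECONDS = 90    # assumed time to promote a warm standby

file_restore = restore_seconds(30 * 1024, 500)  # 30 TB at 500 MB/s
print(file_restore <= RTO_SECONDS)      # False: a file-copy restore
                                        # takes many hours, missing the RTO
print(FAILOVER_SECONDS <= RTO_SECONDS)  # True: standby failover fits
```

This is why the techniques listed above (standby databases, storage snapshots, and so on) aim to make recovery time independent of database size, rather than proportional to it, though each carries its own caveats and RPO implications.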
