External Data Access And Indexing In AsterixDB

2015 
Traditional database systems offer rich query interfaces (e.g., SQL) and efficient query execution for the data they store. Recent years have seen the rise of Big Data analytics platforms that offer query-based access to "raw" external data, e.g., file-resident data (often in HDFS). In this paper, we describe techniques for achieving the qualities offered by DBMSs when accessing external data. This work has been built into Apache AsterixDB, an open source Big Data Management System. We describe how we build distributed indexes over external data, partition external indexes, provide query consistency across access paths, and manage external indexes amidst concurrent activities. We compare the performance of this new AsterixDB capability to an external-only solution (Hive) and to AsterixDB's internally managed data and indexes.