Drawing Density Core-Sets from Incomplete Relational Data

2017 
Incompleteness is a ubiquitous issue and brings challenges to answer queries with completeness guaranteed. A density core-set is a subset of an incomplete dataset, whose completeness is approximate to the completeness of the entire dataset. Density core-sets are effective mechanisms to estimate completeness of queries on incomplete datasets. This paper studies the problems of drawing density core-sets on incomplete relational data. To the best of our knowledge, there is no such proposal in the past. (1) We study the problems of drawing density core-sets in different requirements, and prove the problems are all NP-Complete whether functional dependencies are given. (2) An efficient approximate algorithm to draw an approximate density core-set is proposed, where an approximate Knapsack algorithm and weighted sampling techniques are employed to select important candidate tuples. (3) Analysis of the proposed approximate algorithm shows the relative error between completeness of the approximate density core-set and that of a density core-set with same size is within a given relative error bound with high probability. (4) Experiments on both real-world and synthetic datasets demonstrate the effectiveness and efficiency of the algorithm.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    26
    References
    0
    Citations
    NaN
    KQI
    []