Algorithms for the discovery of embedded functional dependencies

2021 
Embedded functional dependencies (eFDs) advance data management applications by data completeness and integrity requirements. We show that the discovery problem of eFDs is $${\mathsf {NP}}$$ -complete, $$\mathsf {W}[2]$$ -complete in the output, and has a minimum solution space that is larger than the maximum solution space for functional dependencies. Nevertheless, we use novel data structures and search strategies to develop row-efficient, column-efficient, and hybrid algorithms for eFD discovery. Our experiments demonstrate that the algorithms scale well in terms of their design targets, and that ranking the eFDs by the number of redundant data values they cause can provide useful guidance in identifying meaningful eFDs for applications. We further demonstrate the benefits of introducing completeness requirements and ranking by the number of redundant data values for other variants of functional dependencies. Finally, we show how to compute informative Armstrong samples and illustrate the performance of our algorithms on the benchmark data. The informative Armstrong samples can be used to find eFDs that are meaningful for the application domain but violated by a given data set due to inconsistencies.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    33
    References
    0
    Citations
    NaN
    KQI
    []