Interactive Cleaning for Progressive Visualization through Composite Questions
In this paper, we study the problem of interactive cleaning for progressive visualization (ICPV): given a bad visualization V, the goal is to obtain a "cleaned" visualization V′ that is as far as possible from V, under a given (small) budget w.r.t. human cost. In ICPV, a system interacts with a user iteratively. In each iteration, it asks the user a data cleaning question such as "How should the detected errors x be cleaned?", and applies the user's value updates to clean V. Conventional wisdom typically picks a single question (e.g., "Are SIGMOD conference and SIGMOD the same?") with the maximum expected benefit in each iteration. We propose instead to ask a composite question, i.e., a group of single questions treated as one question, in each iteration (for example, "Are SIGMOD conference in t1 and SIGMOD in t2 the same value, and are t1 and t2 duplicates?"). A composite question is presented to the user as a small connected graph through a novel GUI that the user can operate on directly. We propose algorithms to select the best composite question in each iteration. Experiments on real-world datasets verify that composite questions are more effective than single questions asked in isolation w.r.t. the human cost.
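To make the iterative setting concrete, the following is a minimal sketch of a budgeted question-selection loop that asks, in each iteration, the affordable candidate with the maximum expected benefit. It is not the paper's selection algorithm. The `Question` type, the `select_questions` function, and the per-question `expected_benefit` and `cost` values are hypothetical assumptions introduced only for illustration. A candidate is either a single question (one part) or a composite question (several parts), and once a single question has been answered, any candidate that contains it is dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    # Hypothetical model: a question is a set of single-question ids.
    # One id means a single question; several ids mean a composite question.
    parts: frozenset
    expected_benefit: float  # assumed expected gain in distance from the bad visualization
    cost: float              # assumed human cost of answering the question


def select_questions(candidates, budget):
    """Greedy budgeted loop (an illustrative sketch, not the paper's algorithm).

    In each iteration, ask the affordable candidate with the maximum expected
    benefit, then discard candidates that overlap already-answered parts.
    Returns the asked questions in order and the unspent budget.
    """
    asked = []
    answered = set()
    remaining = budget
    pool = list(candidates)
    while True:
        # Keep candidates we can still pay for and that ask nothing already answered.
        affordable = [
            q for q in pool
            if q.cost <= remaining and not (q.parts & answered)
        ]
        if not affordable:
            break
        best = max(affordable, key=lambda q: q.expected_benefit)
        asked.append(best)
        answered |= best.parts
        remaining -= best.cost
        pool.remove(best)
    return asked, remaining


if __name__ == "__main__":
    # Made-up numbers, only to exercise the loop.
    singles = [
        Question(frozenset({"a"}), 3.0, 1.0),
        Question(frozenset({"b"}), 2.0, 1.0),
    ]
    composite = Question(frozenset({"a", "b"}), 6.0, 1.5)
    print(select_questions(singles, 2.0))
    print(select_questions(singles + [composite], 2.0))
```

With only single questions, the loop spends the whole budget of 2.0 on the two singles. When the hypothetical composite question is also available, it is chosen first, and the remaining budget of 0.5 cannot pay for any further question. Whether a composite question actually beats its parts depends on how benefit and cost are modeled. That modeling is the subject of the paper's selection algorithms and is not reproduced here.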