HST+: An Efficient Index for Embedding Arbitrary Metric Spaces
Metric embeddings have been widely used in approximate algorithms to guarantee the effectiveness of geometric problems. Among the metric embedding techniques, Hierarchically Separated Tree (HST) is one of the most prevalent data structures, which maps the points of the original metric space into a tree-based metric space. A few selected applications of the HST include clustering, task assignment, trip planning, privacy preservation, information routing in wireless sensor networks, etc. Despite the popularity in ensuring the effectiveness, the HST-based solutions can be inefficient in large-scale datasets, since the state-of-the-art construction method has high time and space complexity (O(n3) and O(n2) in the worst-case). Moreover, existing studies overlook the insertion of new points in real applications (deletions can be trivially supported), which can cause the reconstruction of the whole HST. To address these limitations, we focus on designing an efficient index for embedding arbitrary metric spaces by tree metric spaces. Specifically, for construction, we design a dynamic programming-based method, which significantly reduces the time and space complexity to O(n2) and O(n) respectively. For insertion of new points, we propose a new data structure, called Hierarchically Separated Forest (HSF), i.e., a collection of HSTs. An HSF can efficiently support insertion of new points with a tight theoretical guarantee (O(log n)). Finally, extensive experiments demonstrate the superior performance of our proposed algorithms with respect to the effectiveness and the running time. For instance, compared with the state-of-the-art algorithms, our construction method is up to 29.8× faster and our insertion method is up to 491× faster.