Clustering Geospatial Data for Multiple Reference Points

2019 
Data clustering plays a significant role in geospatial data management and analytics. In this light, we propose and study a novel geospatial data clustering method for multiple reference points. Given a set Q of geospatial data points, a candidate set O of reference points, and a threshold k, each data point q will be matched to its closest reference point o. The multi-reference clustering (MRC) method finds a subset A (A ⊆ O ∧ |A| ≤ k) reference points from O which define the minimum global travel distance (Σ P∀q∈Q,o∈A d(q, o)), hence the data are grouped into |A| clusters. We believe that the MRC method may benefit a lot of applications including geospatial data clustering, data classification, and data analytics in general. The MRC problem is challenging due to its high computation complexity, and there exist Σ i=1 k C |O| i = Σ i=1 k (|O|!/i!(|O|-i)! possibilities for subset A. Because the exact solution cannot be computed in real time, we develop a heuristic method to select subset A from O efficiently. The experimental results show that the accuracy of A is very close to the optimal solution. In addition, we also develop a set of optimization techniques to further enhance the efficiency. Finally, we conduct extensive experiments to study the efficiency and accuracy of the heuristic method.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    2
    Citations
    NaN
    KQI
    []