kd-SNN: A Metric Data Structure Seconding the Clustering of Spatial Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8579)

Abstract

Large amounts of spatio-temporal data are continuously collected through location devices and sensor technologies. Clustering is one of the techniques commonly used to obtain a first insight into such data. The Shared Nearest Neighbour (SNN) clustering algorithm finds clusters of different densities, shapes and sizes, and also identifies noise in the data, making it a good candidate for spatial data. However, its worst-case time complexity is O(n²), which compromises its scalability. This paper presents the use of a metric data structure, the kd-Tree, to index spatial data and support SNN in querying for the k-nearest neighbours, improving the average-case time complexity of the algorithm, for low-dimensional data, to at most O(n log n). The proposed algorithm, kd-SNN, was evaluated in terms of performance, showing large improvements over existing approaches and allowing the main traffic routes to be identified by completely clustering a maritime data set.
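
The idea described in the abstract, answering the k-nearest-neighbour queries of SNN through a kd-Tree instead of comparing every pair of points, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `build_kdtree`, `knn`, `all_knn` and `snn_similarity` are hypothetical, the tree is built by naive re-sorting at each level, and the similarity shown is the usual shared-neighbour count between two points that appear in each other's neighbour lists. Only the Python standard library is assumed.

```python
import heapq
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class Node:
    """One kd-tree node: index of the stored point, split axis, two subtrees."""

    def __init__(self, index: int, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]") -> None:
        self.index = index
        self.axis = axis
        self.left = left
        self.right = right


def build_kdtree(points: Sequence[Point],
                 indices: Optional[List[int]] = None,
                 depth: int = 0) -> Optional[Node]:
    """Build a kd-tree by splitting on the median, cycling through the axes."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return Node(indices[mid], axis,
                build_kdtree(points, indices[:mid], depth + 1),
                build_kdtree(points, indices[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (sufficient for ranking neighbours)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(points: Sequence[Point], root: Optional[Node],
        query_index: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[query_index], nearest first."""
    query = points[query_index]
    heap: List[Tuple[float, int]] = []  # max-heap via negated squared distance

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        if node.index != query_index:  # a point is not its own neighbour
            d = sq_dist(query, points[node.index])
            if len(heap) < k:
                heapq.heappush(heap, (-d, node.index))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, node.index))
        diff = query[node.axis] - points[node.index][node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Search the far side only if the splitting plane is closer than the
        # current k-th best candidate (or fewer than k candidates were found).
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return [i for _, i in sorted(heap, reverse=True)]


def all_knn(points: Sequence[Point], k: int) -> List[List[int]]:
    """k-nearest-neighbour list of every point, using one shared kd-tree."""
    root = build_kdtree(points)
    return [knn(points, root, i, k) for i in range(len(points))]


def snn_similarity(neigh: Sequence[Sequence[int]], p: int, q: int) -> int:
    """Shared-neighbour count; zero unless p and q list each other as neighbours."""
    if p not in neigh[q] or q not in neigh[p]:
        return 0
    return len(set(neigh[p]) & set(neigh[q]))
```

Under these assumptions, calling `all_knn(points, k)` once yields the neighbour lists from which the shared-neighbour similarities are derived; the pruning test in `visit` is what lets most of the tree be skipped for low-dimensional data.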

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Faustino, B.F., Moura-Pires, J., Santos, M.Y., Moreira, G. (2014). kd-SNN: A Metric Data Structure Seconding the Clustering of Spatial Data. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2014. ICCSA 2014. Lecture Notes in Computer Science, vol 8579. Springer, Cham. https://doi.org/10.1007/978-3-319-09144-0_22

  • DOI: https://doi.org/10.1007/978-3-319-09144-0_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09143-3

  • Online ISBN: 978-3-319-09144-0

  • eBook Packages: Computer Science (R0)