A Geospatial Implementation of a Novel Delineation Clustering Algorithm Employing the K-means

Oyana, Tonny J.; Scott, Kara E.

doi:10.1007/978-3-540-78946-8_8

A Geospatial Implementation of a Novel Delineation Clustering Algorithm Employing the K-means

Tonny J. Oyana⁴ &
Kara E. Scott⁴

Chapter

737 Accesses
2 Citations

Part of the book series: Lecture Notes in Geoinformation and Cartography ((LNGC))

The overarching objective of this paper is to introduce a novel Fast, Efficient, and Scalable k-means k-means (FES-k-means*) algorithm. This algorithm is designed to increase the overall performance of the standard k-means clustering technique. The FES-k-means* algorithm uses a hybrid approach that comprises the k-d tree data structure, the nearest neighbor query, the standard k-means algorithm, and Mashor’s adaptation rate. The algorithm is tested using two real datasets and two synthetic datasets and is employed twice on all four datasets. The first trial consisted of previously MIL-SOM* trained data, and the second was on raw, untrained data. The approach presented with this method enables unfounded knowledge discovery, otherwise unclaimed by conventional clustering methods. When used in conjunction with the MIL-SOM* training technique, theFES-k-means* algorithm reduces the computation time and produces quality clusters. In particular, the robust FES-k-means* method opens doors to (1) faster cluster production than conventional clustering methods, (2) scalability allowing application in other platforms, and its ability to handle small and large datasets, compact or scattered, and (3) efficient geospatial data analysis of large datasets. All of the above makes FES-k-means* live up to defending its well-deserved name—Fast, Efficient, and Scalable k-means (FES-k-means*). The findings of this study are vital to the relatively new and expanding subfield of geospatial data management.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 129.00; Price excludes VAT (USA)

Softcover Book: USD 169.99; Price excludes VAT (USA)

Hardcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Alsabti, K., S. Ranka, V. Singh. (1998). An efficient k-means clustering algorithm. Proceedings in IPPS: 11th International Parallel Processing Symposium Workshop on High Performance Data Mining, IEEE, Computer Society Press.
Google Scholar
Bacć ´o, F., Lobo, V., and M. Painho. (2004) Geo-Self-Organizing Map (Geo-SOM) for Building and Exploring Homogeneous Regions. In Egenhofer, M.J., Freskes, C. and Miller, H.J., (eds.) Geographical Information Science. Proceedings of Third International Conference, GIScience. Adelphi, MD, USA. October 20–23, 2004. Springer-Verlay Berlin Heidelberg.
Google Scholar
Ball, G. H., and D. J. Hall. (1967) A clustering technique for summarizing multivariate data. Behavioral Sciences. 12(2):153–55.
Article Google Scholar
Birant, D. and A. Kut. (2007) ST-DBSCAN: An algorithm for clustering spatial-temporal data. Data & Knowledge Engineering. Elsevier. 601(1): 208–221.
Google Scholar
Chinrungrueng C., and C. H. Sequin. (1995) Optimal adaptive k-means algorithm with dynamic adjustment of learning rate. IEEE Transactions on neural networks. 6(1): 157–69.
Article Google Scholar
Ester, M., H.P. Kriegel, J. Sander, and X. Xu. (1996). A density based algorithm for discovering clusters in large geospatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, Oregon.
Google Scholar
Guha, S., R. Rastogi, and K. Shim. (1998) Cure: An efficient clustering algorithm for large databases. In Proceedings of the ACM SIGMOD Conference on Management of Data (SIGMOD, Seattle, WA, June). ACM Press, New York, NY.
Google Scholar
Jain, A.K., M.N. Murty, and P.J. Flynn. (1999). Data clustering: a review. ACM Computing Surveys. 31(3): 264–323.
Article Google Scholar
Kanungo, T., D.M. Mount, N. S. Netanyahu, et al. (2002). An efficient k-means clustering algorithm: analysis and implementation. IEEE Transactions on pattern analysis and machine intelligence. 24(7): 881–92.
Article Google Scholar
Karypis, G., E.H Han, V. Kumar. (1999) Chameleon: Hierarchical clustering using dynamic modeling. IEEE Computer Special Issue on Data Analysis and Mining. 32(8): 68–75.
Google Scholar
Kim, K. and E. Yamashita (2007)Using a k-means clustering algorithm to examine patterns of pedestrian involved crashes in Honolulu, Hawaii. Journal of Advanced Transportation. 41(1): 60–89.
Article Google Scholar
Kohonen, T. (1990) The self-organizing map. Proceedings of the IEEE. 78(9): 1464–80.p
Google Scholar
Krishna, K. and M. Narasimha. (1999) Genetic k-means algorithm. IEEE Transactions on Systems, Man and Cybernetics, Part B. 29(3): 433–439.
Article Google Scholar
Kruskal, J.B. (1964) Non-metric multidimensional scaling: a numerical method. Psychometrika 29: 115–29.
Article Google Scholar
Likas A., N. Vlassis, and J.J. Verbeek. (2003) The global k-means clustering algorithm. Pattern Recognition. 36: 451–61.
Article Google Scholar
MacQueen, J.B. (1967) Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Statistical Laboratory, University of California. 282–97.
Google Scholar
Mashor, M.Y. (1998) Improving the performance of k-means clustering algorithm to position the centers of RBF network. International Journal of the Computer, The Internet and Management. 6(2).
Google Scholar
Ng, R. and J. Han. (1994) Efficient and Effective Clustering methods for spatial data mining. In Proceedings of the 20th International Conference of ery Large Data Bases (VLDB), September 1994, Santiago, Chile. 144–55.
Google Scholar
Oyana TJ, Achenie LEK, Cuadros-Vargas E, Rivers PA, and Scott KE. (2006) A Mathematical Improvement of the Self-Organizing Map Algorithm. Chapter 8: ICT and Mathematical Modelling (pp 522–531). In Mwakali J.A. and Taban-Wani G. (eds.): Advance in Engineering and Technology, London: Elsevier Ltd. 847.
Google Scholar
Pham D.T., S.S. Dimov, C.D. Nguyen. (2004) A two-phase k-means algorithm for large datasets. Proc. Instn Mech. Engrs, Part C: Journal Mechanical Engineering Science. 218(C). 1269–73.
Google Scholar
Roussopoulos N., S. Kelley, F. Vincent. (1995) Nearest Neighbor Queries. In Proceedings of Special Interest Group on Management of data (SIGMOD), San Jose, CA, USA. 71–79.
Google Scholar
Sammon, J. W. (1969) A nonlinear mapping for data structure analysis. IEEE Transactions on Computers. 18(5): 401–09.
Article Google Scholar
Song, M. and S. Rajasekaran. (2005) Finding frequent item sets by transaction mapping. Symposium on Applied Computing. 488–492.
Google Scholar
Steinley, D. (2006) K-means clustering: A half-century synthesis. British Journal of Mathematical and Statistical Psychology. 59: 1–34.
Article Google Scholar
Su, M. and H. T. Chang. (2000) Fast self-organizing feature map algorithm. IEEE transactions on neural network. 11(3): 721–33.
Article Google Scholar
Theodoridis, S. and K. Koutroumbas. (2003) Pattern Recognition (2ed). Elsevier Science (USA). China machine press. 431–535.
Google Scholar
Tsai, C.F., C.W. Tsai, H.C. Wu, and T. Yang.(2004) ACODF: a novel data clustering approach for data mining in large databases. Journal of Systems and Software. 73(1): 133–145.
Article Google Scholar
Vesanto, J. and E. Alhoniemi. (2000) Clustering of the Self-Organizing Map. IEEE Transactions on Neural Networks. 11(3): 586–600.
Article Google Scholar
Vrahatis, M. N., B. Boutsinas, P. Alevizos, and G. Pavlides. (2002) The new k-windows algorithm for improving the k-means clustering algorithm. Journal of Complexity. 18. 375–91.
Article Google Scholar
Wang, L., L. Bo, L. Jiao. (2006) A modified k-means clustering with a density-sensitive distance metric. Lecture notes in Computer Science. Springer Berlin , Heidelber. 4062: 544–551.
Google Scholar
Xu, R and D. Wunsch II. (2005) Survey of Clustering Algorithms. IEEE Transactions on Neural Networks. 16(3): 645–78.
Article Google Scholar
Zhang T., R. Ramakrishnan, M. Livny. (1996) BIRCH: An efficient data clustering method for very large databases. Proc. ACM SIGMOD Conf. Management of Data. 103–14.
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Geography and Environmental Resources, Southern Illinois University, 1000 Faner Drive, MC 4514, Carbondale, IL 62901-4514, USA
Tonny J. Oyana & Kara E. Scott

Authors

Tonny J. Oyana
View author publications
You can also search for this author in PubMed Google Scholar
Kara E. Scott
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Technische Universität Dresden, Helmholtzstrasse 10, 01069 Dresden, Germany
Lars Bernard (Prof) (Prof)
European Commission Joint Research Centre, Institute for Environment and Sustainability, TP 262 21027 Ispra(VA), Italy
Anders Friis-Christensen
University of Applied Studies and Research Harz, Friedrich Str. 57-59, 38855 Wernigerode, Germany
Hardy Pundt (Prof) (Prof)

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Oyana, T.J., Scott, K.E. (2008). A Geospatial Implementation of a Novel Delineation Clustering Algorithm Employing the K-means . In: Bernard, L., Friis-Christensen, A., Pundt, H. (eds) The European Information Society. Lecture Notes in Geoinformation and Cartography. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78946-8_8

Download citation

DOI: https://doi.org/10.1007/978-3-540-78946-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78945-1
Online ISBN: 978-3-540-78946-8
eBook Packages: Earth and Environmental ScienceEarth and Environmental Science (R0)

Publish with us

Policies and ethics

Buying options