Abstract
Spectral clustering is one of the most popular modern clustering techniques for conventional data. However, applying the general spectral clustering method to geostatistical data poses a double challenge. Firstly, it produces clusters that are spatially non-contiguous, which is undesirable for many geoscience applications. Secondly, its high computational complexity limits its applicability to large-scale problems. This paper presents a spectral clustering method dedicated to large-scale geostatistical datasets in which spatial dependence plays an important role. It extends previous work to large-scale geostatistical datasets by computing the similarity matrix only at a reduced set of locations over the study domain, referred to as anchor locations. All data are used when computing the similarity matrix at the anchor locations, so no data are sacrificed. The spectral clustering algorithm can then be performed efficiently on this similarity matrix at the anchor locations rather than at all data locations. Given the resulting cluster labels of the anchor locations, a weighted k-nearest-neighbour classifier is trained using their geographical coordinates as covariates and their cluster labels as the response. Cluster membership is then assigned to all data locations by applying the trained classifier. The effectiveness of the proposed method in discovering spatially contiguous and meaningful clusters in large-scale geostatistical datasets is illustrated using the US National Geochemical Survey database.
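The final step described in the abstract, propagating anchor cluster labels to all data locations with a weighted k-nearest-neighbour classifier on geographical coordinates, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `weighted_knn_label`, the inverse-distance weighting, the value of `k`, and the toy coordinates are all assumptions made for illustration, and the anchor labels are taken as given rather than produced by spectral clustering.

```python
import math
from collections import defaultdict


def weighted_knn_label(point, anchors, anchor_labels, k=3, eps=1e-12):
    """Label `point` by an inverse-distance-weighted vote among its
    k nearest anchor locations (weighting scheme is an assumption)."""
    # Euclidean distance from the query location to every anchor location.
    dists = [(math.dist(point, a), lab) for a, lab in zip(anchors, anchor_labels)]
    # Sort by distance only, so labels are never compared with each other.
    dists.sort(key=lambda pair: pair[0])

    # Accumulate one weighted vote per neighbour; closer anchors count more.
    votes = defaultdict(float)
    for d, lab in dists[:k]:
        votes[lab] += 1.0 / (d + eps)  # eps guards against a zero distance
    return max(votes, key=votes.get)


# Hypothetical anchor locations (x, y) with cluster labels assumed to come
# from a spectral clustering run on the anchor similarity matrix.
anchors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
           (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
anchor_labels = [0, 0, 0, 1, 1, 1]

# Hypothetical data locations that still need a cluster membership.
data_locations = [(0.5, 0.5), (10.5, 10.5), (2.0, 2.0), (9.0, 9.0)]

labels = [weighted_knn_label(p, anchors, anchor_labels) for p in data_locations]
print(labels)
```

Because the classifier uses only coordinates as covariates, each data location inherits the label that dominates among nearby anchors, which is what favours spatially contiguous clusters in this scheme.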
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Fouedjio, F. (2017). A Spectral Clustering Method for Large-Scale Geostatistical Datasets. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2017. Lecture Notes in Computer Science, vol 10358. Springer, Cham. https://doi.org/10.1007/978-3-319-62416-7_18
DOI: https://doi.org/10.1007/978-3-319-62416-7_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62415-0
Online ISBN: 978-3-319-62416-7
eBook Packages: Computer Science, Computer Science (R0)