Abstract
In modern applications, clustering algorithms have emerged as a learning aid for organizing and analyzing huge volumes of data. The primary objective of clustering is to group data of the same type into the same cluster, where similarity is judged according to specific metrics. In many applications, clustering is one of the techniques used to classify and analyze large amounts of data. However, applying clustering algorithms to big data raises issues that cause uncertainty among practitioners: there is no consensus on the definition of the algorithms' properties, and a proper classification of the algorithms is lacking. In this paper, we study various existing clustering methods that are suitable for large, semi-structured, and unstructured data, and how the same algorithms can be applied in a distributed environment such as Hadoop.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Patibandla, R.S.M.L., Veeranjaneyulu, N. (2018). Survey on Clustering Algorithms for Unstructured Data. In: Bhateja, V., Coello Coello, C., Satapathy, S., Pattnaik, P. (eds) Intelligent Engineering Informatics. Advances in Intelligent Systems and Computing, vol 695. Springer, Singapore. https://doi.org/10.1007/978-981-10-7566-7_41
DOI: https://doi.org/10.1007/978-981-10-7566-7_41
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7565-0
Online ISBN: 978-981-10-7566-7
eBook Packages: Engineering (R0)