Survey on Clustering Algorithms for Unstructured Data

Patibandla, R. S. M. Lakshmi; Veeranjaneyulu, N.

doi:10.1007/978-981-10-7566-7_41


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 695)


Abstract

In modern applications, clustering algorithms have emerged as a learning aid for processing and analyzing huge volumes of data. The primary objective of clustering is to place data of the same type in the same cluster when the items are similar according to a specific metric. In many applications, clustering is one of the techniques used to classify and analyze large amounts of data. However, the main issues in applying clustering algorithms to big data, which cause confusion among practitioners, are the lack of consensus on the definition of their properties and the lack of a proper classification. In this paper, we study various existing clustering methods that are suitable for large, semi-structured, and unstructured data, and how the same algorithms can be applied in a distributed environment such as Hadoop.
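
As a rough illustration of the clustering objective described in the abstract, the sketch below groups points by similarity under a chosen metric and splits each iteration into a per-point "map" step and a per-cluster "reduce" step, the general shape a distributed formulation could take. It is not taken from the chapter: the choice of k-means, the Euclidean metric, the function names, and the sample points are all assumptions made for illustration, and the code runs in plain Python rather than on Hadoop.

```python
from collections import defaultdict
from math import dist  # Euclidean distance (assumed metric), Python 3.8+


def assign(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def recompute(assignments, centroids):
    """Reduce step: average the points assigned to each centroid."""
    groups = defaultdict(list)
    for index, point in assignments:
        groups[index].append(point)
    updated = list(centroids)  # a cluster with no points keeps its old centroid
    for index, members in groups.items():
        updated[index] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return updated


def kmeans(points, centroids, max_iterations=100):
    """Repeat the map and reduce steps until the centroids stop moving."""
    for _ in range(max_iterations):
        updated = recompute((assign(p, centroids) for p in points), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids, [assign(p, centroids)[0] for p in points]


# Hypothetical sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Because `assign` looks at one point at a time and `recompute` needs only the points that share a centroid index, the two steps could in principle be run as independent tasks over partitions of the data; how that is best done on a real cluster is outside the scope of this sketch.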



Author information

Authors and Affiliations

Department of CSE, Vignan’s Foundation for Science, Technology & Research, Vadlamudi, Andhra Pradesh, India
R. S. M. Lakshmi Patibandla
Department of IT, VFSTR University, Vadlamudi, Andhra Pradesh, India
N. Veeranjaneyulu

Corresponding author

Correspondence to R. S. M. Lakshmi Patibandla.

Editor information

Editors and Affiliations

Department of Electronics and Communication Engineering, SRMGPC, Lucknow, Uttar Pradesh, India
Vikrant Bhateja
Departamento de Computación, CINVESTAV-IPN, Mexico City, Mexico
Carlos A. Coello Coello
Department of Computer Science and Engineering, PVP Siddhartha Institute of Technology, Vijayawada, Andhra Pradesh, India
Suresh Chandra Satapathy
School of Computer Engineering, KIIT University, Bhubaneswar, Odisha, India
Prasant Kumar Pattnaik

About this paper

Cite this paper

Patibandla, R.S.M.L., Veeranjaneyulu, N. (2018). Survey on Clustering Algorithms for Unstructured Data. In: Bhateja, V., Coello Coello, C., Satapathy, S., Pattnaik, P. (eds) Intelligent Engineering Informatics. Advances in Intelligent Systems and Computing, vol 695. Springer, Singapore. https://doi.org/10.1007/978-981-10-7566-7_41

DOI: https://doi.org/10.1007/978-981-10-7566-7_41
Published: 11 April 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7565-0
Online ISBN: 978-981-10-7566-7
eBook Packages: Engineering, Engineering (R0)
