
Survey on Clustering Algorithms for Unstructured Data

  • Conference paper
Intelligent Engineering Informatics

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 695)

Abstract

In modern applications, clustering algorithms have emerged as learning aids for analyzing huge volumes of data. The foremost objective of clustering is to group data so that items placed in the same cluster are similar according to a precise metric. Across many applications, clustering is one of the main techniques for organizing and analyzing large amounts of data. However, applying clustering algorithms to big data raises issues that create uncertainty among practitioners: there is no consensus on the definition of the algorithms' properties, and a proper classification of them is lacking. In this paper, we study various existing clustering methods that are suitable for large, semi-structured, and unstructured data, and how the same algorithms can be applied in a distributed environment such as Hadoop.
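The abstract combines two ideas: grouping items by a similarity metric, and running the same algorithm in a distributed environment such as Hadoop. A minimal sketch of how these fit together, assuming k-means with Euclidean distance as the clustering method, is shown below. It simulates one MapReduce job per iteration in a single Python process. It is not the paper's implementation. The function names (`map_phase`, `reduce_phase`, `kmeans_mapreduce`) are hypothetical, and plain lists of points stand in for HDFS input splits.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    # Index of the centroid with the smallest Euclidean distance to the point
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def map_phase(split, centroids):
    # Mapper: for each point in one input split, emit (cluster_id, (point, 1))
    for p in split:
        yield nearest(p, centroids), (p, 1)


def reduce_phase(pairs):
    # Shuffle + reducer (simulated): group by cluster_id, sum coordinates
    # and counts, then divide to get each cluster's new centroid
    sums = {}
    counts = defaultdict(int)
    for cid, (p, n) in pairs:
        if cid not in sums:
            sums[cid] = [0.0] * len(p)
        for d, x in enumerate(p):
            sums[cid][d] += x
        counts[cid] += n
    return {cid: tuple(s / counts[cid] for s in sums[cid]) for cid in sums}


def kmeans_mapreduce(splits, centroids, max_iter=20):
    # Each loop iteration corresponds to one MapReduce job over all splits
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        pairs = [kv for split in splits for kv in map_phase(split, centroids)]
        new = reduce_phase(pairs)
        # A cluster that received no points keeps its previous centroid
        updated = [new.get(i, c) for i, c in enumerate(centroids)]
        if updated == centroids:  # converged: assignments no longer change
            break
        centroids = updated
    return centroids


splits = [
    [(1.0, 1.0), (1.5, 2.0)],
    [(8.0, 8.0), (9.0, 8.5)],
    [(1.0, 0.5), (8.5, 9.0)],
]
result = kmeans_mapreduce(splits, [(0.0, 0.0), (10.0, 10.0)])
print(result)
```

Each mapper needs only its own split plus the current centroids, and the reducer needs only per-cluster partial sums. This independence is what lets the iteration scale out on Hadoop. In practice, a combiner would pre-aggregate sums on each node to reduce shuffle traffic.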

Author information

Correspondence to R. S. M. Lakshmi Patibandla.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Patibandla, R.S.M.L., Veeranjaneyulu, N. (2018). Survey on Clustering Algorithms for Unstructured Data. In: Bhateja, V., Coello Coello, C., Satapathy, S., Pattnaik, P. (eds) Intelligent Engineering Informatics. Advances in Intelligent Systems and Computing, vol 695. Springer, Singapore. https://doi.org/10.1007/978-981-10-7566-7_41

  • DOI: https://doi.org/10.1007/978-981-10-7566-7_41

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-7565-0

  • Online ISBN: 978-981-10-7566-7

  • eBook Packages: Engineering, Engineering (R0)
