
A Distributed Approach Towards Density Based Clustering: D-TDCT

  • Hrishav Bakul Barua
Original Contribution

Abstract

Data clustering is a long-established topic in computing, and in data mining in particular. Much recent research in data mining has applied clustering techniques under a variety of paradigms. Clustering is the task of finding patterns in densely populated data sets, and well-designed algorithms already exist for it. The main concern now is how to achieve speedup and scaleup in such clustering tasks through parallelism and distribution. This paper examines a density-based clustering technique in a distributed environment built on a shared-nothing architecture. The data set is divided into a number of subsets, which are fed to the nodes of the distributed setup connected over a network. The experimental results are reported to establish the superiority of this technique over existing approaches.
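As a rough illustration of the data flow described above (divide the data set into subsets, let each node work on its own subset, then combine the results), the following minimal sketch may help. It is not the D-TDCT algorithm: the grid-cell density count, the function names, and the parameters `cell_size` and `min_points` are all assumptions made for this example, and the "nodes" are simulated by plain function calls that share no state.

```python
from collections import Counter


def partition_by_x(points, num_nodes):
    """Split 2-D points into num_nodes contiguous strips along the x-axis."""
    ordered = sorted(points)
    size, extra = divmod(len(ordered), num_nodes)
    subsets, start = [], 0
    for i in range(num_nodes):
        # The first `extra` subsets receive one additional point.
        end = start + size + (1 if i < extra else 0)
        subsets.append(ordered[start:end])
        start = end
    return subsets


def local_cell_counts(subset, cell_size):
    """Work done by one node: count its own points per grid cell.

    The node sees only its subset (shared-nothing); the grid-cell
    density measure is a hypothetical stand-in for the local step.
    """
    return Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in subset
    )


def dense_cells_distributed(points, num_nodes, cell_size, min_points):
    """Coordinator: partition, collect local counts, merge, threshold."""
    subsets = partition_by_x(points, num_nodes)
    local_results = [local_cell_counts(s, cell_size) for s in subsets]
    total = Counter()
    for counts in local_results:
        total.update(counts)  # adds per-cell counts from each node
    return {cell for cell, n in total.items() if n >= min_points}


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.4), (5.0, 5.0), (0.9, 0.8)]
    print(dense_cells_distributed(pts, num_nodes=2, cell_size=1.0, min_points=3))
```

Merging raw per-cell counts at the coordinator, rather than thresholding on each node, keeps the result independent of where the partition boundaries fall. In a real deployment each call to `local_cell_counts` would run on a separate machine and send its counts back over the network.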

Keywords

Clustering, Density-based, Distributed clustering, Parallelism


Copyright information

© The Institution of Engineers (India) 2018

Authors and Affiliations

  1. Tata Research & Innovation Labs (Research & Development), Tata Consultancy Services, Kolkata, India
