Case Study I: Data Clustering using Scalding and Spark

  • K G Srinivasa
  • Anil Kumar Muppalla
Part of the Computer Communications and Networks book series (CCN)


Data mining is the process of discovering insightful, interesting, and novel patterns, as well as descriptive, understandable, and predictive models from large-scale data.


Data Mining, Cluster Algorithm, Cluster Center, Data Mining Technique, Cluster Centroid
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. M.S. Ramaiah Institute of Technology, Bangalore, India
