© 2015

Guide to High Performance Distributed Computing

Case Studies with Hadoop, Scalding and Spark

  • Provides a guide to the distributed computing technologies of Hadoop and Spark, from the perspective of industry practitioners

  • Supports the theory with case studies taken from a range of disciplines, including data mining, machine learning, graph processing and image processing

  • Supplies working source code to aid understanding through step-by-step implementation


Part of the Computer Communications and Networks book series (CCN)

Table of contents

  1. Front Matter
    Pages i-xvii
  2. Programming Fundamentals of High Performance Distributed Computing

    1. Front Matter
      Page 1
    2. K. G. Srinivasa, Anil Kumar Muppalla
      Pages 3-31
    3. K. G. Srinivasa, Anil Kumar Muppalla
      Pages 33-72
    4. K. G. Srinivasa, Anil Kumar Muppalla
      Pages 73-99
    5. K. G. Srinivasa, Anil Kumar Muppalla
      Pages 101-154
  3. Case Studies Using Hadoop, Scalding and Spark

    1. Front Matter
      Page 155
    2. K. G. Srinivasa, Anil Kumar Muppalla
      Pages 157-183
    3. K. G. Srinivasa, Anil Kumar Muppalla
      Pages 185-217
    4. K. G. Srinivasa, Anil Kumar Muppalla
      Pages 219-259
    5. K. G. Srinivasa, Anil Kumar Muppalla
      Pages 261-301
  4. Back Matter
    Pages 303-304

About this book


This timely text/reference describes the development and implementation of large-scale distributed processing systems using open source tools and technologies such as Hadoop, Scalding and Spark.

Comprehensive in scope, the book presents state-of-the-art material on building high performance distributed computing systems, providing practical guidance and best practices as well as describing theoretical software frameworks.

Topics and features:

  • Describes the fundamentals of building scalable software systems for large-scale data processing in the new paradigm of high performance distributed computing
  • Presents an overview of the Hadoop ecosystem, followed by step-by-step instruction on its installation, programming and execution
  • Reviews the basics of Spark, including resilient distributed datasets, and examines Hadoop streaming and working with Scalding
  • Provides detailed case studies on approaches to clustering, data classification and regression analysis
  • Explains the process of creating a working recommender system using Scalding and Spark
  • Supplies a complete list of supplementary source code and datasets at an associated website

Fulfilling the need for both introductory material for undergraduate students of computer science and detailed discussions for software engineering professionals, this book helps a broad audience understand the esoteric aspects of practical high performance computing through solved problems, research case studies and working source code.

K.G. Srinivasa is Professor and Head of the Department of Computer Science and Engineering at M.S. Ramaiah Institute of Technology (MSRIT), Bangalore, India. His other publications include the Springer title Soft Computing for Data Mining Applications. Anil Kumar Muppalla is also a researcher at MSRIT.


Keywords: Algorithms, Case Studies, Hadoop, High Performance Computing, Spark

Authors and affiliations

  1. M.S. Ramaiah Institute of Technology, Bangalore, India
  2. M.S. Ramaiah Institute of Technology, Bangalore, India
