  • Textbook
  • © 2016

Algorithms for Data Science

  • Unites theory, algorithm design, and practical data analysis for simplicity and clarity of content
  • Contains more than twenty detailed and carefully crafted Python tutorials
  • Each chapter includes exercises of varying levels of difficulty
  • Uses publicly available data sets throughout the book
  • Includes supplementary material: sn.pub/extras
  • Request lecturer material: sn.pub/lecturer-material

Buy it now

Buying options

eBook USD 54.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book USD 69.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book USD 99.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout


Table of contents (12 chapters)

  1. Front Matter

    Pages i-xxiii
  2. Introduction

    • Brian Steele, John Chandler, Swarna Reddy
    Pages 1-16
  3. Data Reduction

    1. Front Matter

      Pages 17-17
    2. Data Mapping and Data Dictionaries

      • Brian Steele, John Chandler, Swarna Reddy
      Pages 19-50
    3. Scalable Algorithms and Associative Statistics

      • Brian Steele, John Chandler, Swarna Reddy
      Pages 51-104
    4. Hadoop and MapReduce

      • Brian Steele, John Chandler, Swarna Reddy
      Pages 105-129
  4. Extracting Information from Data

    1. Front Matter

      Pages 131-131
    2. Data Visualization

      • Brian Steele, John Chandler, Swarna Reddy
      Pages 133-159
    3. Linear Regression Methods

      • Brian Steele, John Chandler, Swarna Reddy
      Pages 161-215
    4. Healthcare Analytics

      • Brian Steele, John Chandler, Swarna Reddy
      Pages 217-251
    5. Cluster Analysis

      • Brian Steele, John Chandler, Swarna Reddy
      Pages 253-275
  5. Predictive Analytics

    1. Front Matter

      Pages 277-277
    2. k-Nearest Neighbor Prediction Functions

      • Brian Steele, John Chandler, Swarna Reddy
      Pages 279-312
    3. The Multinomial Naïve Bayes Prediction Function

      • Brian Steele, John Chandler, Swarna Reddy
      Pages 313-342
    4. Forecasting

      • Brian Steele, John Chandler, Swarna Reddy
      Pages 343-379
    5. Real-time Analytics

      • Brian Steele, John Chandler, Swarna Reddy
      Pages 381-401
  6. Back Matter

    Pages 403-430

About this book

This textbook on practical data analytics unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. But practical data analytics requires more than just the foundations. Problems and data are enormously variable, and only the most elementary of algorithms can be used without modification. Programming fluency and experience with real and challenging data are indispensable, so the reader is immersed in Python and R and in real data analysis. By the end of the book, the reader will have gained the ability to adapt algorithms to new problems and carry out innovative analyses.


This book has three parts:
(a) Data Reduction: Begins with the concepts of data reduction, data maps, and information extraction. The second chapter introduces associative statistics, the mathematical foundation of scalable algorithms and distributed computing. Practical aspects of distributed computing are the subject of the Hadoop and MapReduce chapter.
(b) Extracting Information from Data: Linear regression and data visualization are the principal topics of Part II. The authors dedicate a chapter to the critical domain of Healthcare Analytics for an extended example of practical data analytics. The algorithms and analytics will be of particular interest to practitioners who work with the large and unwieldy data sets of the Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System.
(c) Predictive Analytics: Two foundational and widely used algorithms, k-nearest neighbors and naive Bayes, are developed in detail. A chapter is dedicated to forecasting. The last chapter focuses on streaming data and uses publicly accessible data streams originating from the Twitter API and the NASDAQ stock market in the tutorials.
This book is intended for a one- or two-semester course in data analytics for upper-division undergraduate and graduate students in mathematics, statistics, and computer science. The prerequisites are kept low, and students with one or two courses in probability or statistics, an exposure to vectors and matrices, and a programming course will have no difficulty. The core material of every chapter is accessible to all with these prerequisites. The chapters often expand at the close with innovations of interest to practitioners of data science. Each chapter includes exercises of varying levels of difficulty. The text is eminently suitable for self-study and an exceptional resource for practitioners.
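
As a rough illustration of the associative statistics mentioned under Part I, the sketch below is not taken from the book. It assumes that a statistic is associative when its value on the union of disjoint data blocks equals the sum of its values on the separate blocks. The function names (block_stats, combine, mean_and_variance) and the sample data are hypothetical and chosen only for this example.

```python
from functools import reduce

def block_stats(block):
    """Return (count, sum, sum of squares) for one block of numbers."""
    return (len(block), sum(block), sum(x * x for x in block))

def combine(a, b):
    """Merge the statistics of two disjoint blocks by elementwise addition."""
    return tuple(u + v for u, v in zip(a, b))

def mean_and_variance(stats):
    """Derive the sample mean and sample variance from the combined statistics."""
    n, s, ss = stats
    mean = s / n
    variance = (ss - n * mean * mean) / (n - 1)
    return mean, variance

# Hypothetical data split into blocks, as if stored on separate machines.
blocks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

# Each block is summarized independently; only the small tuples are combined.
total = reduce(combine, (block_stats(b) for b in blocks))
mean, variance = mean_and_variance(total)
print(total, mean, variance)
```

Because the per-block tuples simply add, each block can be processed on its own and in any order, which is the property that makes this kind of computation a natural fit for a MapReduce-style workflow.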




Reviews

“This 430-page book contains an excellent collection of information on the subject of practical algorithms used in data science. The discussion of each algorithm starts with some basic concepts, followed by a tutorial with real datasets and detailed code examples in Python or R. Each chapter has a set of exercise problems so readers can practice the concepts learned in the chapter. … a good reference for practitioners, or a good textbook for graduate or upper-class undergraduate students.” (Xiannong Meng, Computing Reviews, September, 2017)


“This textbook on practical data analytics unites fundamental principles, algorithms, and data. … this book is devoted to upper-division undergraduate and graduate students in mathematics, statistics, and computer science. It is intended for a one- or two-semester course in data analytics and reflects the authors’ research experience in data science concepts and the teaching skills in various areas. … The text is eminently suitable for self-study and an exceptional resource for practitioners.” (Krzysztof J. Szajowski, zbMATH 1367.62005, 2017)  

Authors and Affiliations

  • University of Montana, Missoula, USA

    Brian Steele

  • School of Business Administration, University of Montana, Missoula, USA

    John Chandler

  • SoftMath Consultants, LLC, Missoula, USA

    Swarna Reddy

About the authors

Brian Steele is a full professor of Mathematics at the University of Montana and a Senior Data Scientist for SoftMath Consultants, LLC. Dr. Steele has published on the EM algorithm, exact bagging, the bootstrap, and numerous statistical applications. He teaches data analytics and statistics and consults on a wide variety of subjects related to data science and statistics.


John Chandler has worked at the forefront of marketing and data analysis since 1999. He has worked with Fortune 100 advertisers and scores of agencies, measuring the effectiveness of advertising and improving performance. Dr. Chandler joined the faculty at the University of Montana School of Business Administration as a Clinical Professor of Marketing in 2015 and teaches classes in advanced marketing analytics and data science. He is one of the founders and Chief Data Scientist for Ars Quanta, a Seattle-based data science consultancy.


Dr. Swarna Reddy is the founder, CEO, and a Senior Data Scientist for SoftMath Consultants, LLC and serves as a faculty affiliate with the Department of Mathematical Sciences at the University of Montana. Her area of expertise is computational mathematics and operations research. She is a published researcher and has developed computational solutions across a wide variety of areas spanning bioinformatics, cybersecurity, and business analytics.



