Algorithms for Data Science

  • Brian Steele
  • John Chandler
  • Swarna Reddy

Table of contents

  1. Front Matter
    Pages i-xxiii
  2. Brian Steele, John Chandler, Swarna Reddy
    Pages 1-16
  3. Data Reduction

    1. Front Matter
      Pages 17-17
    2. Brian Steele, John Chandler, Swarna Reddy
      Pages 19-50
    3. Brian Steele, John Chandler, Swarna Reddy
      Pages 51-104
    4. Brian Steele, John Chandler, Swarna Reddy
      Pages 105-129
  4. Extracting Information from Data

    1. Front Matter
      Pages 131-131
    2. Brian Steele, John Chandler, Swarna Reddy
      Pages 133-159
    3. Brian Steele, John Chandler, Swarna Reddy
      Pages 161-215
    4. Brian Steele, John Chandler, Swarna Reddy
      Pages 217-251
    5. Brian Steele, John Chandler, Swarna Reddy
      Pages 253-275
  5. Predictive Analytics

    1. Front Matter
      Pages 277-277
    2. Brian Steele, John Chandler, Swarna Reddy
      Pages 279-312
    3. Brian Steele, John Chandler, Swarna Reddy
      Pages 313-342
    4. Brian Steele, John Chandler, Swarna Reddy
      Pages 343-379
    5. Brian Steele, John Chandler, Swarna Reddy
      Pages 381-401
  6. Back Matter
    Pages 403-430

About this book

Introduction

This textbook on practical data analytics unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. But practical data analytics requires more than just the foundations. Problems and data are enormously variable, and only the most elementary of algorithms can be used without modification. Programming fluency and experience with real and challenging data are indispensable, and so the reader is immersed in Python and R and real data analysis. By the end of the book, the reader will have gained the ability to adapt algorithms to new problems and carry out innovative analyses.

This book has three parts:
(a) Data Reduction: This part begins with the concepts of data reduction, data maps, and information extraction. The second chapter introduces associative statistics, the mathematical foundation of scalable algorithms and distributed computing. Practical aspects of distributed computing are the subject of the Hadoop and MapReduce chapter.
(b) Extracting Information from Data: Linear regression and data visualization are the principal topics of Part II. The authors dedicate a chapter to the critical domain of Healthcare Analytics for an extended example of practical data analytics. The algorithms and analytics will be of much interest to practitioners interested in using the large and unwieldy data sets of the Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System.
(c) Predictive Analytics: Two foundational and widely used algorithms, k-nearest neighbors and naive Bayes, are developed in detail. A chapter is dedicated to forecasting. The last chapter focuses on streaming data and uses publicly accessible data streams originating from the Twitter API and the NASDAQ stock market in the tutorials.
This book is intended for a one- or two-semester course in data analytics for upper-division undergraduate and graduate students in mathematics, statistics, and computer science. The prerequisites are kept low, and students with one or two courses in probability or statistics, an exposure to vectors and matrices, and a programming course will have no difficulty. The core material of every chapter is accessible to all with these prerequisites. The chapters often expand at the close with innovations of interest to practitioners of data science. Each chapter includes exercises of varying levels of difficulty. The text is eminently suitable for self-study and an exceptional resource for practitioners.


Keywords

Data science, Data analytics, Big data, Predictive analytics, Hadoop, Python, Clustering, Streaming data, Healthcare analytics, Forecasting, k-nearest neighbors, MapReduce, Data visualization, Real-time analytics, R programming, Classification

Authors and affiliations

  • Brian Steele (1)
  • John Chandler (2)
  • Swarna Reddy (3)
  1. University of Montana, Missoula, USA
  2. School of Business Administration, University of Montana, Missoula, USA
  3. SoftMath Consultants, LLC, Missoula, USA

Bibliographic information

  • DOI https://doi.org/10.1007/978-3-319-45797-0
  • Copyright Information Springer International Publishing Switzerland 2016
  • Publisher Name Springer, Cham
  • eBook Packages Computer Science
  • Print ISBN 978-3-319-45795-6
  • Online ISBN 978-3-319-45797-0