
Machine Learning at Scale

Pro Spark Streaming

Abstract

Data by itself is a static, lifeless entity. You need analytics to breathe life into it and make it talk or even sing. The most sophisticated and popular class of such analytics revolves around nowcasting, forecasting, and recommendations, more generally known as machine learning and data mining. Machine-learning algorithms learn patterns in data and can then be used to make predictions, whereas data mining helps extract structure from unstructured data. Machine learning at scale is the key to practical predictions and recommendations, which are essential to meeting the needs of consumers, whether commercial, academic, or scientific. This chapter uses MLlib to enable such applications.


Notes

  1. Plamen Nedeltchev, “The Internet of Everything Is the New Economy,” Cisco, September 29, 2015, www.cisco.com/c/en/us/solutions/collateral/enterprise/cisco-on-cisco/Cisco_IT_Trends_IoE_Is_the_New_Economy.html.

  2. Heather Clancy, “How GE Generates $1 Billion from Data,” Fortune, October 10, 2014, http://fortune.com/2014/10/10/ge-data-robotics-sensors/.

  3. PAMAP2 Physical Activity Monitoring Data Set, UC Irvine Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/PAMAP2+Physical+Activity+Monitoring.

  4. This sensor was not calibrated properly while taking measurements, so the use of the 16 g attribute is recommended for any analytics.

  5. Make sure you add libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.4.0" to your build specification file.

  6. Jeremy Freeman, “Introducing Streaming k-means in Spark 1.2,” Databricks, January 28, 2015, https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html.

  7. “Getting the Current Filename with Spark and HDFS,” The Modern Life, September 28, 2014, http://themodernlife.github.io/scala/spark/hadoop/hdfs/2014/09/28/spark-input-filename/.

  8. https://issues.apache.org/jira/browse/SPARK-6407.
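
As a rough illustration of note 5, the dependency line might sit in an sbt build definition along the following lines. This is only a sketch: the project name, the project version, the Scala version, and the extra spark-streaming dependency are assumptions made for the example and are not taken from the chapter. Only the spark-mllib line comes from the note itself.

```scala
// build.sbt: a minimal sketch, not the book's actual build file.

// Hypothetical project name and version, chosen only for illustration.
name := "mllib-streaming-example"
version := "0.1.0"

// Assumption: a Scala 2.10.x release, since %% appends the Scala binary
// version to the artifact name (giving spark-mllib_2.10 here).
scalaVersion := "2.10.5"

// The dependency named in note 5.
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.4.0"

// Assumption: a streaming application would usually also declare
// spark-streaming explicitly, at the same Spark version.
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.4.0"
```

The `%%` operator is what ties the artifact to the project's Scala binary version, so the `scalaVersion` setting and the Spark artifacts need to agree.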


Copyright information

© 2016 Zubair Nabi

About this chapter

Cite this chapter

Nabi, Z. (2016). Machine Learning at Scale. In: Pro Spark Streaming. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-1479-4_9
