Machine Learning at Scale

Nabi, Zubair

doi:10.1007/978-1-4842-1479-4_9

Abstract

Data by itself is a static, lifeless entity. You need analytics to breathe life into it and make it talk or even sing. The most sophisticated and popular class of such analytics revolves around nowcasting, forecasting, and recommendations, more generally known as machine learning and data mining. Machine-learning algorithms learn patterns in data and can then be used to make predictions, whereas data mining helps extract structure from unstructured data. Machine learning at scale is the key to practical predictions and recommendations, which are essential to meeting the needs of consumers, whether commercial, academic, or scientific. This chapter uses MLlib to enable such applications.

Notes

1. Plamen Nedeltchev, “The Internet of Everything Is the New Economy,” Cisco, September 29, 2015, www.cisco.com/c/en/us/solutions/collateral/enterprise/cisco-on-cisco/Cisco_IT_Trends_IoE_Is_the_New_Economy.html
2. Heather Clancy, “How GE Generates $1 Billion from Data,” Fortune, October 10, 2014, http://fortune.com/2014/10/10/ge-data-robotics-sensors/
3. PAMAP2 Physical Activity Monitoring Data Set, UC Irvine Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/PAMAP2+Physical+Activity+Monitoring
4. This sensor was not calibrated properly while taking measurements, so the use of the 16 g attribute is recommended for any analytics.
5. Make sure you add libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.4.0" to your build specification file.
6. Jeremy Freeman, “Introducing Streaming k-means in Spark 1.2,” Databricks, January 28, 2015, https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html
7. “Getting the Current Filename with Spark and HDFS,” The Modern Life, September 28, 2014, http://themodernlife.github.io/scala/spark/hadoop/hdfs/2014/09/28/spark-input-filename/
8. https://issues.apache.org/jira/browse/SPARK-6407

Author information

Authors and Affiliations

Lahore, Pakistan
Zubair Nabi

About this chapter

Cite this chapter

Nabi, Z. (2016). Machine Learning at Scale. In: Pro Spark Streaming. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-1479-4_9

DOI: https://doi.org/10.1007/978-1-4842-1479-4_9
Published: 14 June 2016
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-1480-0
Online ISBN: 978-1-4842-1479-4
eBook Packages: Professional and Applied Computing; Apress Access Books; Professional and Applied Computing (R0)