Abstract
Data by itself is a static, lifeless entity. You need analytics to breathe life into it and make it talk or even sing. The most sophisticated and popular class of such analytics revolves around nowcasting, forecasting, and recommendations, more generally known as machine learning and data mining. Machine-learning algorithms learn patterns in data and can then be used to make predictions, whereas data mining helps extract structure from unstructured data. Machine learning at scale is the key to practical predictions and recommendations, which are essential to meeting the needs of consumers, whether commercial, academic, or scientific. This chapter uses MLlib to enable such applications.
Notes
1. Plamen Nedeltchev, “The Internet of Everything Is the New Economy,” Cisco, September 29, 2015, www.cisco.com/c/en/us/solutions/collateral/enterprise/cisco-on-cisco/Cisco_IT_Trends_IoE_Is_the_New_Economy.html.
2. Heather Clancy, “How GE Generates $1 Billion from Data,” Fortune, October 10, 2014, http://fortune.com/2014/10/10/ge-data-robotics-sensors/.
3. PAMAP2 Physical Activity Monitoring Data Set, UC Irvine Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/PAMAP2+Physical+Activity+Monitoring.
4. This sensor was not calibrated properly while taking measurements, so the use of the 16 g attribute is recommended for any analytics.
5. Make sure you add libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.4.0" to your build specification file.
6. Jeremy Freeman, “Introducing Streaming k-means in Spark 1.2,” Databricks, January 28, 2015, https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html.
7. “Getting the Current Filename with Spark and HDFS,” The Modern Life, September 28, 2014, http://themodernlife.github.io/scala/spark/hadoop/hdfs/2014/09/28/spark-input-filename/.
Copyright information
© 2016 Zubair Nabi
About this chapter
Cite this chapter
Nabi, Z. (2016). Machine Learning at Scale. In: Pro Spark Streaming. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-1479-4_9
DOI: https://doi.org/10.1007/978-1-4842-1479-4_9
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-1480-0
Online ISBN: 978-1-4842-1479-4
eBook Packages: Professional and Applied Computing; Apress Access Books; Professional and Applied Computing (R0)