Abstract
Google Cloud Dataflow provides a serverless, parallel, and distributed infrastructure for running batch and stream data-processing jobs. One of Dataflow's core strengths is its ability to switch almost seamlessly from processing historical batch data to streaming datasets, while elegantly handling stream-specific concerns such as windowing. Dataflow is a major component of the data/ML pipeline on GCP. Typically, it is used to transform very large datasets from sources such as Cloud Pub/Sub or Apache Kafka into a sink such as BigQuery or Google Cloud Storage.
Copyright information
© 2019 Ekaba Bisong
Cite this chapter
Bisong, E. (2019). Google Cloud Dataflow. In: Building Machine Learning and Deep Learning Models on Google Cloud Platform. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4470-8_40
DOI: https://doi.org/10.1007/978-1-4842-4470-8_40
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-4469-2
Online ISBN: 978-1-4842-4470-8
eBook Packages: Professional and Applied Computing; Apress Access Books; Professional and Applied Computing (R0)