Abstract

Google Cloud Dataflow provides a serverless, parallel, and distributed infrastructure for running batch and streaming data processing jobs. One of Dataflow's core strengths is that the same pipeline can move almost seamlessly from processing historical batch data to processing streaming datasets, while taking advantage of stream-processing features such as windowing. Dataflow is a major component of the data/ML pipeline on GCP. Typically, Dataflow is used to transform massive datasets from sources such as Cloud Pub/Sub or Apache Kafka into a sink such as BigQuery or Google Cloud Storage.
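To make the windowing idea concrete, the sketch below groups timestamped events into fixed (tumbling) windows in plain Python. This is only an illustration of the concept behind Dataflow's fixed windowing; the function name and data shapes are invented for this example and are not the Apache Beam API.

```python
from collections import defaultdict

def assign_fixed_windows(events, window_size):
    """Group (timestamp, value) events into fixed windows of window_size seconds.

    Illustrative only: mirrors the idea of Dataflow's fixed (tumbling)
    windows, where each event belongs to exactly one window determined
    by its event timestamp.
    """
    windows = defaultdict(list)
    for ts, value in events:
        # An event at time ts falls into the window starting at
        # floor(ts / window_size) * window_size.
        window_start = (ts // window_size) * window_size
        windows[window_start].append(value)
    return dict(windows)

events = [(0, "a"), (3, "b"), (65, "c"), (119, "d"), (120, "e")]
print(assign_fixed_windows(events, 60))
# {0: ['a', 'b'], 60: ['c', 'd'], 120: ['e']}
```

In a real Dataflow pipeline, this assignment (plus handling of late data via watermarks and triggers) is performed by the runner, so the pipeline author only declares the windowing strategy.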



Copyright information

© 2019 Ekaba Bisong

About this chapter

Cite this chapter

Bisong, E. (2019). Google Cloud Dataflow. In: Building Machine Learning and Deep Learning Models on Google Cloud Platform. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4470-8_40
