Apache Flume

  • Deepak Vohra


Apache Flume is a framework based on streaming data flows for collecting, aggregating, and transferring large quantities of data. Flume is an efficient and reliable distributed service. A unit of data flow in Flume is called an event. The main components in Flume architecture are Flume source, Flume channel, and Flume sink, all of which are hosted by a Flume agent. A Flume source consumes events from an external source such as a log file or a web server. A Flume source stores the events it receives in a passive data store called a Flume channel. Examples of Flume channel types are a JDBC channel, a file channel, and a memory channel. The Flume sink component removes the events from the Flume channel and puts them in an external storage such as HDFS. A Flume sink can also forward events to another Flume source to be processed by another Flume agent. The Flume architecture for a single-hop data flow is shown in Figure 6-1.


Configuration File Streaming Data File Channel External Storage Memory Channel 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Deepak Vohra 2016

Authors and Affiliations

  • Deepak Vohra
    • 1
  1. 1.Apt 105White RockCanada

Personalised recommendations