Scheduling and Workflow

  • Michael Frampton


When you’re working with big data in a distributed, parallel processing environment like Hadoop, job scheduling and workflow management are vital for efficient operation. Schedulers enable you to share resources at a job level within Hadoop; in the first half of this chapter, I use practical examples to guide you in installing, configuring, and using the Fair and Capacity schedulers for Hadoop V1 and V2. Additionally, at a higher level, workflow tools enable you to manage the relationships between jobs. For instance, a workflow might include jobs that source, clean, process, and output a data source. Each job runs in sequence, with the output from one forming the input for the next. So, in the second half of this chapter, I demonstrate how workflow tools like Oozie offer the ability to manage these relationships.


Fuel Consumption Configuration File Control Node Fair Scheduler Capacity Scheduler 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Michael Frampton 2015

Authors and Affiliations

  • Michael Frampton
    • 1
  1. 1.ParaparaumuNew Zealand

Personalised recommendations