Building an Efficient Hadoop Workflow Engine Using BPEL

  • Jie Liu
  • Qiyuan Li
  • Feng Zhu
  • Jun Wei
  • Dan Ye
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8295)


Big data processing and analysis techniques can guide enterprises toward correct decisions and will play an important role in enterprise business processes. The Hadoop platform has become the basis of big data processing and analysis. To meet enterprises' need to develop data-intensive workflows on Hadoop and integrate them into existing business processes, we build a Hadoop workflow engine named Pony based on the BPEL model. We present a method for mapping a Hadoop workflow to a BPEL process at three levels: the semantic model, the deployment model, and the execution model. Pony uses a mature and stable BPEL engine to orchestrate Hadoop services, and implements a Hadoop job scheduler that collaborates with the BPEL engine to schedule multiple workflows online at runtime. This paper describes the design and implementation of Pony, and the experimental results demonstrate that Pony can provide improved performance.
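The abstract does not spell out the scheduling policy, so the following is only a minimal sketch of what scheduling multiple workflows online can look like, not Pony's actual algorithm. It assumes each workflow is a DAG of jobs, that ready jobs from all workflows share one FIFO queue, and that the cluster runs a fixed number of jobs per round. The names `Workflow`, `schedule_online`, and `slots` are hypothetical and chosen for illustration.

```python
from collections import deque


class Workflow:
    """A workflow as a DAG of named jobs (hypothetical model, not Pony's).

    `deps` maps each job name to the jobs that must finish before it.
    """

    def __init__(self, name, deps):
        self.name = name
        self.deps = {job: set(pre) for job, pre in deps.items()}
        self.done = set()
        self.submitted = set()

    def ready_jobs(self):
        """Jobs not yet submitted whose prerequisites have all finished."""
        return sorted(
            job for job, pre in self.deps.items()
            if job not in self.submitted and pre <= self.done
        )


def schedule_online(workflows, slots):
    """Simulate round-based scheduling across several workflows.

    Each round, newly ready jobs from every workflow join a shared FIFO
    queue, and at most `slots` queued jobs run and complete.
    Returns the list of rounds, each a list of "workflow:job" labels.
    """
    queue = deque()
    rounds = []
    while any(len(wf.done) < len(wf.deps) for wf in workflows):
        for wf in workflows:
            for job in wf.ready_jobs():
                wf.submitted.add(job)
                queue.append((wf, job))
        if not queue:
            # Nothing can run although work remains: the dependencies
            # are cyclic or refer to an unknown job.
            raise ValueError("cyclic or unsatisfiable dependencies")
        batch = [queue.popleft() for _ in range(min(slots, len(queue)))]
        for wf, job in batch:
            wf.done.add(job)
        rounds.append([f"{wf.name}:{job}" for wf, job in batch])
    return rounds


etl = Workflow("etl", {"extract": [], "clean": ["extract"], "load": ["clean"]})
report = Workflow("report", {"scan": [], "aggregate": ["scan"]})
for number, batch in enumerate(schedule_online([etl, report], slots=2), 1):
    print(number, batch)
```

With two slots, jobs from both workflows are interleaved in each round instead of one workflow waiting for the other to finish. A real scheduler would replace the FIFO queue with a priority rule and would react to job completion events rather than fixed rounds.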


Keywords: MapReduce; Hadoop workflow; BPEL; Data intensive computing; Service oriented architecture



Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Jie Liu (1)
  • Qiyuan Li (1)
  • Feng Zhu (1)
  • Jun Wei (1)
  • Dan Ye (1)
  1. Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China
