Abstract
Big data processing and analysis techniques can guide enterprises toward sound decisions and will play an important role in enterprise business processes. The Hadoop platform has become the basis of big data processing and analysis. To meet the need of enterprises to develop data-intensive workflows on Hadoop and integrate them into existing business processes, we build a Hadoop workflow engine named Pony, based on the BPEL model. We present a method for mapping a Hadoop workflow to a BPEL process at three levels: the semantic model, the deployment model, and the execution model. Pony uses a mature and stable BPEL engine to orchestrate Hadoop services, and implements a Hadoop job scheduler that collaborates with the BPEL engine to schedule multiple workflows online at runtime. This paper describes the design and implementation of Pony, and the experimental results demonstrate that Pony can provide improved performance.
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Liu, J., Li, Q., Zhu, F., Wei, J., Ye, D. (2013). Building an Efficient Hadoop Workflow Engine Using BPEL. In: Sheng, Q.Z., Kjeldskov, J. (eds) Current Trends in Web Engineering. ICWE 2013. Lecture Notes in Computer Science, vol 8295. Springer, Cham. https://doi.org/10.1007/978-3-319-04244-2_25
DOI: https://doi.org/10.1007/978-3-319-04244-2_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04243-5
Online ISBN: 978-3-319-04244-2
eBook Packages: Computer Science, Computer Science (R0)