Abstract
The advent of next-generation computation-intensive applications in various science fields is pushing computing demands far beyond the capability of traditional computing solutions based on standalone PCs. The availability of today’s largest clusters, grids, and supercomputers expedites the development of robust problem-solving environments that marshal those high-performance computing and networking resources, and presents a great opportunity to manage and execute large-scale computing workflows for collaborative scientific research. Supporting such scientific workflows and optimizing their end-to-end performance in wide-area networks is crucial to the success of large-scale distributed scientific applications. We consider a special type of pipeline workflow composed of a set of linearly arranged modules, and categorize pipeline mapping problems into six classes defined by two optimization objectives (minimum end-to-end delay and maximum frame rate) and three network constraints (no node reuse, contiguous node reuse, and arbitrary node reuse). We design an optimal dynamic programming-based solution to the problem of minimum end-to-end delay with arbitrary node reuse and prove the NP-completeness of the remaining five problems, for each of which we propose a heuristic algorithm based on a similar optimization procedure. These heuristics are implemented and tested on a large set of simulated networks of various scales, and extensive simulation results illustrate their superior performance in comparison with existing methods.
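To make the dynamic programming idea concrete, the following is a minimal sketch of how a minimum end-to-end delay mapping with arbitrary node reuse could be computed. It is not the chapter's algorithm; the abstract does not state the cost model, so everything here is an assumption made for illustration: each module has a workload and an output data size, each node has a processing power, each directed link has a bandwidth and a latency, computing time is workload divided by power, transfer time is data size divided by bandwidth plus latency, consecutive modules run either on the same node or on directly linked nodes, and the first and last modules are pinned to a source and a destination node. The function and parameter names (`min_end_to_end_delay`, `modules`, `power`, `links`) are hypothetical.

```python
import math


def min_end_to_end_delay(modules, power, links, src, dst):
    """Sketch of a DP for minimum end-to-end delay (assumed cost model).

    modules: list of (workload, output_size) tuples in pipeline order
    power:   dict mapping node -> processing power
    links:   dict mapping (u, v) -> (bandwidth, latency), directed
    src/dst: nodes hosting the first and the last module (assumption)
    """
    nodes = list(power)

    # best[v] = minimum delay to finish modules[0..j] with module j on node v.
    best = {v: math.inf for v in nodes}
    best[src] = modules[0][0] / power[src]

    for j in range(1, len(modules)):
        workload = modules[j][0]
        data_in = modules[j - 1][1]  # output of the previous module
        nxt = {}
        for v in nodes:
            # Option 1: keep module j on the node that ran module j-1.
            cand = best[v]
            # Option 2: ship the data over one link from a neighbour u.
            # Nodes may be revisited, so no "already used" set is tracked.
            for u in nodes:
                if (u, v) in links and best[u] < math.inf:
                    bandwidth, latency = links[(u, v)]
                    cand = min(cand, best[u] + data_in / bandwidth + latency)
            nxt[v] = cand + workload / power[v]
        best = nxt

    return best[dst]


if __name__ == "__main__":
    # Hypothetical three-node network and three-module pipeline.
    power = {"A": 1.0, "B": 2.0, "C": 1.0}
    links = {("A", "B"): (1.0, 0.0), ("B", "C"): (1.0, 0.0), ("A", "C"): (0.5, 0.0)}
    modules = [(1.0, 1.0), (4.0, 1.0), (1.0, 0.0)]
    print(min_end_to_end_delay(modules, power, links, "A", "C"))
```

Because arbitrary reuse removes the need to remember which nodes a partial mapping has already visited, the table only needs one entry per (module, node) pair under these assumptions. The no-reuse and contiguous-reuse variants require that history, which is consistent with the abstract's statement that they are NP-complete and are handled by heuristics.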
© 2011 Springer-Verlag London Limited
About this chapter
Cite this chapter
Wu, Q., Gu, Y. (2011). Performance Analysis and Optimization of Linear Workflows in Heterogeneous Network Environments. In: Preve, N. (eds) Grid Computing. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-0-85729-676-4_4
DOI: https://doi.org/10.1007/978-0-85729-676-4_4
Publisher Name: Springer, London
Print ISBN: 978-0-85729-675-7
Online ISBN: 978-0-85729-676-4
eBook Packages: Computer Science (R0)