Performance Analysis and Optimization of Linear Workflows in Heterogeneous Network Environments

Wu, Qishi; Gu, Yi

doi:10.1007/978-0-85729-676-4_4

Part of the book series: Computer Communications and Networks (CCN)

Abstract

The advent of next-generation computation-intensive applications in various scientific fields is pushing computing demands far beyond the capability of traditional solutions based on standalone PCs. The availability of today’s largest clusters, grids, and supercomputers expedites the development of robust problem-solving environments that marshal these high-performance computing and networking resources, and presents a great opportunity to manage and execute large-scale computing workflows for collaborative scientific research. Supporting such scientific workflows and optimizing their end-to-end performance in wide-area networks is crucial to the success of large-scale distributed scientific applications. We consider a special type of pipeline workflow composed of a set of linearly arranged modules, and formulate and categorize the pipeline mapping problems into six classes, defined by two optimization objectives (minimum end-to-end delay and maximum frame rate) and three network constraints (no, contiguous, and arbitrary node reuse). We design a dynamic programming-based optimal solution to the problem of minimum end-to-end delay with arbitrary node reuse and prove the NP-completeness of the remaining five problems, for each of which we propose a heuristic algorithm based on a similar optimization procedure. These heuristics are implemented and tested on a large set of simulated networks of various scales, and extensive simulation results illustrate their superior performance in comparison with existing methods.
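To make the dynamic-programming idea for minimum end-to-end delay with arbitrary node reuse concrete, here is a minimal sketch. The cost model is an assumption made for illustration, not the chapter's exact formulation: each module j has a hypothetical computational requirement `work[j]` and sends `data[j]` units to module j+1; each node v has processing power `power[v]`; each directed link (u, v) has a bandwidth and a fixed latency; the first module is pinned to the source node and the last to the destination. All function and parameter names (`min_delay_mapping`, `frame_rate`, `work`, `data`, `power`, `links`) are hypothetical. The second function only evaluates the frame rate of a given mapping; it does not optimize it.

```python
import math

# Sketch only: the cost model (work / power for computing, data / bandwidth + latency
# for transfers) is an illustrative assumption, not the chapter's exact formulation.

def min_delay_mapping(work, data, power, links, src, dst):
    """DP over (module, node) for minimum end-to-end delay with arbitrary node reuse.

    work[j]  : hypothetical computational requirement of module j
    data[j]  : size of the data module j sends to module j + 1 (len(work) - 1 entries)
    power[v] : processing power of node v
    links    : {(u, v): (bandwidth, latency)} for each directed link u -> v
    Returns (delay, mapping), where mapping[j] is the node running module j.
    """
    n = len(work)
    incoming = {v: [] for v in power}
    for (u, v), (bw, lat) in links.items():
        incoming[v].append((u, bw, lat))

    # T[j][v]: least time at which module j can finish on node v (module 0 pinned to src).
    T = [{v: math.inf for v in power} for _ in range(n)]
    parent = [{v: None for v in power} for _ in range(n)]
    T[0][src] = work[0] / power[src]

    for j in range(1, n):
        for v in power:
            # Option 1: module j-1 also ran on v, so no transfer is needed.
            best, arg = T[j - 1][v], v
            # Option 2: module j-1 ran on a neighbor u and ships data[j-1] over (u, v).
            for u, bw, lat in incoming[v]:
                cand = T[j - 1][u] + data[j - 1] / bw + lat
                if cand < best:
                    best, arg = cand, u
            if best < math.inf:
                T[j][v] = best + work[j] / power[v]
                parent[j][v] = arg

    if T[n - 1][dst] == math.inf:
        return math.inf, None
    # Walk the parent pointers back from the destination to recover the mapping.
    mapping = [dst]
    for j in range(n - 1, 0, -1):
        mapping.append(parent[j][mapping[-1]])
    mapping.reverse()
    return T[n - 1][dst], mapping


def frame_rate(mapping, work, data, power, links):
    """Inverse bottleneck time of a given mapping under the same assumed model.

    Modules sharing a node are assumed to share its processing power, and transfers
    sharing a link to share its bandwidth; latency is ignored because it does not
    limit steady-state pipeline throughput (also an assumption).
    """
    load = {}
    for j, v in enumerate(mapping):
        load[v] = load.get(v, 0.0) + work[j] / power[v]
    for j in range(len(mapping) - 1):
        u, v = mapping[j], mapping[j + 1]
        if u != v:
            bw, _ = links[(u, v)]
            load[(u, v)] = load.get((u, v), 0.0) + data[j] / bw
    bottleneck = max(load.values())
    return math.inf if bottleneck == 0 else 1.0 / bottleneck


# Toy network: two source-to-destination routes, s-a-d and s-b-d (links in both directions).
power = {"s": 1.0, "a": 4.0, "b": 2.0, "d": 1.0}
links = {}
for u, v, bw in [("s", "a", 1.0), ("a", "d", 1.0), ("s", "b", 2.0), ("b", "d", 2.0)]:
    links[(u, v)] = links[(v, u)] = (bw, 0.0)
work = [0.0, 8.0, 4.0, 0.0]   # four-module pipeline
data = [2.0, 1.0, 1.0]

delay, mapping = min_delay_mapping(work, data, power, links, "s", "d")
print(delay, mapping)                                  # 6.0 ['s', 'a', 'a', 'd']
print(frame_rate(mapping, work, data, power, links))   # 0.3333333333333333 (node a is the bottleneck)
```

Under this assumed model, end-to-end delay is a plain sum of per-module and per-transfer terms, so revisiting a node adds no coupling between steps and a table indexed by (module, node) suffices; the sketch runs in O(n(|V| + |E|)) time for n modules. The frame rate, by contrast, depends on the total load accumulated on each shared node and link, which this table does not track.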

Author information

Authors and Affiliations

Department of Computer Science, University of Memphis, Memphis, TN, 38152, USA
Qishi Wu & Yi Gu

Corresponding author

Correspondence to Qishi Wu.

Editor information

Editors and Affiliations

School of Electrical and Computer Eng., National Technical University of Athens, Iroon Polytechniou str. 9, Athens, 157 80, Greece
Nikolaos P. Preve

About this chapter

Cite this chapter

Wu, Q., Gu, Y. (2011). Performance Analysis and Optimization of Linear Workflows in Heterogeneous Network Environments. In: Preve, N. (ed.) Grid Computing. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-0-85729-676-4_4

DOI: https://doi.org/10.1007/978-0-85729-676-4_4
Published: 30 May 2011
Publisher Name: Springer, London
Print ISBN: 978-0-85729-675-7
Online ISBN: 978-0-85729-676-4
eBook Packages: Computer Science, Computer Science (R0)