Performance Analysis and Optimization of Linear Workflows in Heterogeneous Network Environments

Grid Computing

Part of the book series: Computer Communications and Networks (CCN)

Abstract

The advent of next-generation computation-intensive applications in various scientific fields is pushing computing demands far beyond the capability of traditional computing solutions based on standalone PCs. The availability of today’s largest clusters, grids, and supercomputers expedites the development of robust problem-solving environments that marshal these high-performance computing and networking resources, and presents a great opportunity to manage and execute large-scale computing workflows for collaborative scientific research. Supporting such scientific workflows and optimizing their end-to-end performance in wide-area networks is crucial to the success of large-scale distributed scientific applications. We consider a special type of pipeline workflow composed of a set of linearly arranged modules, and formulate and categorize pipeline mapping problems into six classes defined by two optimization objectives, i.e., minimum end-to-end delay and maximum frame rate, and three network constraints, i.e., no, contiguous, and arbitrary node reuse. We design an optimal dynamic programming-based solution to the problem of minimum end-to-end delay with arbitrary node reuse and prove the NP-completeness of the remaining five problems, for each of which we propose a heuristic algorithm based on a similar optimization procedure. These heuristics are implemented and tested on a large set of simulated networks of various scales, and extensive simulation results illustrate their performance advantage over existing methods.
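
To make the dynamic programming idea concrete, the following is a minimal sketch of a delay-minimizing pipeline mapping with arbitrary node reuse. It is not the chapter's algorithm: the cost model is an assumption chosen for illustration (compute time is `work[j] / power[v]`, transfer time is `out_size[j] / bandwidth + latency`, consecutive modules sit on the same node or on two directly linked nodes, and the first and last modules are pinned to hypothetical `src` and `dst` nodes). All function and parameter names are hypothetical.

```python
import math


def min_end_to_end_delay(work, out_size, power, links, src, dst):
    """Minimum end-to-end delay of a linear pipeline mapped onto a network.

    Assumed (illustrative) inputs:
      work[j]      compute cost of module j
      out_size[j]  size of the data module j sends to module j + 1
      power[v]     processing speed of node v
      links        dict mapping a directed link (u, v) to (bandwidth, latency)
      src, dst     nodes that must host the first and last module
    Returns math.inf when no feasible mapping exists.
    """
    n = len(power)
    # best[v]: least delay to finish modules 0..j with module j placed on node v.
    best = [math.inf] * n
    best[src] = work[0] / power[src]

    for j in range(1, len(work)):
        # Staying on the same node costs no transfer, so start from best itself.
        arrive = list(best)
        # Otherwise the output of module j - 1 crosses exactly one link.
        for (u, v), (bandwidth, latency) in links.items():
            t = best[u] + out_size[j - 1] / bandwidth + latency
            if t < arrive[v]:
                arrive[v] = t
        # Add the time to run module j on each candidate node.
        best = [a + work[j] / power[v] for v, a in enumerate(arrive)]

    return best[dst]


if __name__ == "__main__":
    # Three nodes in a chain 0 -> 1 -> 2, three modules.
    delay = min_end_to_end_delay(
        work=[2, 4, 2],
        out_size=[10, 10],
        power=[1, 2, 1],
        links={(0, 1): (10, 0), (1, 2): (10, 0)},
        src=0,
        dst=2,
    )
    print(delay)
```

Under these assumptions each stage relaxes every link once, so the running time is proportional to the number of modules times the number of nodes plus links. Because a node may host any number of modules, the table needs no record of which nodes were already used, which is what keeps this variant tractable; forbidding or restricting reuse removes that property.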

Author information

Corresponding author

Correspondence to Qishi Wu.

Copyright information

© 2011 Springer-Verlag London Limited

About this chapter

Cite this chapter

Wu, Q., Gu, Y. (2011). Performance Analysis and Optimization of Linear Workflows in Heterogeneous Network Environments. In: Preve, N. (eds) Grid Computing. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-0-85729-676-4_4

  • DOI: https://doi.org/10.1007/978-0-85729-676-4_4

  • Publisher Name: Springer, London

  • Print ISBN: 978-0-85729-675-7

  • Online ISBN: 978-0-85729-676-4

  • eBook Packages: Computer Science (R0)