
Availability and Network-Aware MapReduce Task Scheduling over the Internet

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9528)


Abstract

MapReduce offers an easy-to-use programming paradigm for processing large datasets. In our previous work, we designed a MapReduce framework called BitDew-MapReduce for desktop grid and volunteer computing environments, which allows nonexpert users to run data-intensive MapReduce jobs on top of volunteer resources over the Internet. However, network distance and resource availability have a great impact on MapReduce applications running over the Internet. To address this, we propose an availability- and network-aware MapReduce framework over the Internet. Simulation results show that the MapReduce job response time could be decreased by 27.15%, thanks to Naive Bayes classifier-based availability prediction and landmark-based network estimation.
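The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the paper's feature set, landmark configuration, or scheduling policy, so everything below is an assumption made for illustration. The names `landmark_bin`, `AvailabilityNB`, and `pick_node`, the features (time of day, day type), and the rule "prefer the data's landmark bin, then the highest predicted availability" are all hypothetical. Landmark binning follows the general idea of ordering a fixed set of landmarks by measured round-trip time, and availability is predicted with a categorical Naive Bayes classifier using Laplace smoothing.

```python
import math
from collections import Counter, defaultdict


def landmark_bin(rtts_ms):
    """Return landmark indices ordered by increasing RTT.

    Nodes that see the landmarks in the same order fall in the same bin
    and are treated as close to each other in the network.
    """
    return tuple(sorted(range(len(rtts_ms)), key=lambda i: rtts_ms[i]))


class AvailabilityNB:
    """Categorical Naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
        self.feature_values = defaultdict(set)      # feature index -> values seen in training

    def fit(self, samples, labels):
        for x, y in zip(samples, labels):
            self.class_counts[y] += 1
            for i, value in enumerate(x):
                self.feature_counts[(i, y)][value] += 1
                self.feature_values[i].add(value)
        return self

    def predict_proba(self, x):
        total = sum(self.class_counts.values())
        log_scores = {}
        for y, n_y in self.class_counts.items():
            score = math.log(n_y / total)  # log prior
            for i, value in enumerate(x):
                count = self.feature_counts[(i, y)][value]
                k = len(self.feature_values[i])
                score += math.log((count + self.alpha) / (n_y + self.alpha * k))
            log_scores[y] = score
        # Normalise in log space to avoid underflow.
        top = max(log_scores.values())
        unnorm = {y: math.exp(s - top) for y, s in log_scores.items()}
        z = sum(unnorm.values())
        return {y: u / z for y, u in unnorm.items()}


def pick_node(candidates, data_bin, model):
    """Hypothetical policy: same landmark bin as the data first,
    then the highest predicted probability of being "up"."""
    def key(node):
        same_bin = landmark_bin(node["rtts_ms"]) == data_bin
        p_up = model.predict_proba(node["features"]).get("up", 0.0)
        return (same_bin, p_up)
    return max(candidates, key=key)["name"]


# Toy availability history: (time of day, day type) -> observed state.
history = [("night", "weekday"), ("night", "weekday"), ("day", "weekday"),
           ("day", "weekend"), ("night", "weekend"), ("day", "weekday")]
states = ["up", "up", "down", "down", "up", "down"]
model = AvailabilityNB().fit(history, states)

candidates = [
    {"name": "A", "rtts_ms": [30.0, 10.0, 20.0], "features": ("day", "weekday")},
    {"name": "B", "rtts_ms": [10.0, 20.0, 30.0], "features": ("night", "weekday")},
    {"name": "C", "rtts_ms": [25.0, 5.0, 15.0], "features": ("night", "weekday")},
]
chosen = pick_node(candidates, data_bin=(1, 2, 0), model=model)
```

In this toy run, nodes A and C share the data's landmark bin, and C is chosen because its predicted availability is higher. A real scheduler would presumably train on measured traces and weigh network distance against availability rather than applying a strict two-level ordering.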


Notes

  1. http://bitdew.gforge.inria.fr.

  2. SETI@home is a global scientific experiment that uses Internet-connected computers in the Search for Extraterrestrial Intelligence. SETI@home traces can be downloaded from the Failure Trace Archive (FTA), http://fta.scem.uws.edu.au/.


Acknowledgement

This work is supported by the “100 Talents Project” of the Computer Network Information Center of the Chinese Academy of Sciences under grant no. 1101002001, the Natural Science Foundation of Hunan Province under grant no. 2015JJ3071, and the Scientific Research Fund of Hunan Provincial Education Department under grant nos. 12C0121, 11C0689 and 11C0535.

Author information


Corresponding author

Correspondence to Bing Tang.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Tang, B., Xie, Q., He, H., Fedak, G. (2015). Availability and Network-Aware MapReduce Task Scheduling over the Internet. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9528. Springer, Cham. https://doi.org/10.1007/978-3-319-27119-4_15

  • DOI: https://doi.org/10.1007/978-3-319-27119-4_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27118-7

  • Online ISBN: 978-3-319-27119-4

  • eBook Packages: Computer Science, Computer Science (R0)
