Research on Distributed File System with Hadoop

  • Conference paper
Network Computing and Information Security (NCIS 2012)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 345))

Abstract

This paper describes research on using Hadoop to develop applications. It introduces the structure of Hadoop and describes the implementation of algorithms in our library. Hadoop is a top-level Apache project, written in the Java programming language, that is built and used by a global community of contributors. Yahoo! has been the largest contributor to the project and uses Hadoop extensively across its businesses.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xu, J., Liang, J. (2012). Research on Distributed File System with Hadoop. In: Lei, J., Wang, F.L., Li, M., Luo, Y. (eds) Network Computing and Information Security. NCIS 2012. Communications in Computer and Information Science, vol 345. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35211-9_19

  • DOI: https://doi.org/10.1007/978-3-642-35211-9_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35210-2

  • Online ISBN: 978-3-642-35211-9

  • eBook Packages: Computer Science, Computer Science (R0)
