Distributed File System

Part of the book series: Integrated Series in Information Systems (ISIS, volume 36)

Abstract

The main objective of this chapter is to provide information and guidance for building a Hadoop distributed file system to address the big data classification problem. Such a system can be used to implement, test, and evaluate the various machine-learning techniques presented in this book for learning purposes. The chapter explains the Hadoop framework and the Hadoop system in detail, presents Internet resources that can help you build a virtual machine-based Hadoop distributed file system with the R programming platform, and gives easy-to-follow, step-by-step instructions for building RevolutionAnalytics’ RHadoop system for your big data computing environment. It also presents simple examples for testing the system to ensure that it works, and briefly discusses setting up a multi-node Hadoop system.

References

  1. T. White. “Hadoop: the definitive guide.” O’Reilly Inc, 2009.

  2. http://en.wikipedia.org/wiki/Apache_Hadoop

  3. D. Borthakur. “The Hadoop distributed file system: Architecture and design.” Hadoop Project Website 11: 21, 2007.

  4. K. Shvachko, H. Kuang, S. Radia, and R. Chansler. “The Hadoop distributed file system.” In Proceedings of the IEEE Symposium on Mass Storage Systems and Technologies, pp. 1–10, 2010.

  5. J. Dean and S. Ghemawat. “MapReduce: simplified data processing on large clusters.” Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.

  6. J. Dean and S. Ghemawat. “MapReduce: a flexible data processing tool.” Communications of the ACM, vol. 53, no. 1, pp. 72–77, 2010.

  7. https://www.virtualbox.org/wiki/Downloads

  8. http://www.cloudera.com

  9. https://github.com/RevolutionAnalytics/RHadoop/wiki/Downloads

  10. http://www.rstudio.com/products/rstudio/download

  11. http://www.mathworks.com/products/matlab/index-b.html

  12. http://cran.r-project.org/bin/linux/ubuntu/README

  13. http://www.vmware.com/products/player

  14. http://www.ubuntu.com/download/desktop

  15. http://wiki.apache.org/hadoop/Hadoop2OnWindows

  16. https://github.com/RevolutionAnalytics/rmr2/tree/master/build

  17. https://github.com/RevolutionAnalytics/rhdfs/tree/master/build

  18. https://www.youtube.com/watch?v=hK-oggHEetc

  19. http://www.meetup.com/Learning-Machine-Learning-by-Example/pages/Installing_R_and_RHadoop/

  20. http://bighadoop.wordpress.com/2013/02/25/r-and-hadoop-data-analysis-rhadoop/

  21. http://hortonworks.com/blog/using-r-and-other-non-java-languages-in-mapreduce-and-hive/

Acknowledgements

I would like to thank my graduate student Sumanth Reddy Yanala for helping to produce the drawing in Fig. 4.1. The information and discussions on “wrapletters” available at http://www.latex-community.org/forum/viewtopic.php?f=44&t=3798 helped with formatting several long continuous text strings, such as Uniform Resource Locators (URLs), in this book.

Copyright information

© 2016 Springer Science+Business Media New York

About this chapter

Cite this chapter

Suthaharan, S. (2016). Distributed File System. In: Machine Learning Models and Algorithms for Big Data Classification. Integrated Series in Information Systems, vol 36. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7641-3_4
