
File Systems and Access Technologies for the Large Scale Data Facility

  • Conference paper
  • In: Remote Instrumentation for eScience and Related Aspects

Abstract

Research projects produce huge amounts of data, which have to be stored and analyzed immediately after acquisition. Storing and analyzing such high data rates is normally not possible within the detectors themselves, and the problem worsens when several detectors with similar data rates are used within one project. To be available for analysis, the data therefore have to be transferred to an appropriate infrastructure, where they are accessible at any time and from different clients. The Large Scale Data Facility (LSDF), currently under development at KIT, is designed to fulfill the requirements of such data-intensive scientific experiments and applications. At present the LSDF consists of a testbed installation for evaluating different technologies. From a user's point of view, the LSDF is a huge data sink, providing 6 PB of storage in its initial state and accessible via a number of interfaces. Since users are not interested in learning dozens of APIs for accessing their data, a generic API, the ADALAPI, has been designed; it provides uniform interfaces for transparent access to the LSDF over different underlying technologies. The present contribution evaluates technologies usable for the development of the LSDF to meet the requirements of various scientific projects. In addition, the ADALAPI and the first GUI based on it are introduced.
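The abstract's core idea — one uniform interface dispatching to protocol-specific transfer backends — can be illustrated with a minimal sketch. Note that all class and method names below (`DataTransferClient`, `GridFtpClient`, `clientFor`, and the host `lsdf.kit.edu` paths) are illustrative assumptions for this sketch, not the actual ADALAPI classes:

```java
import java.net.URI;

// Hypothetical uniform facade in the spirit of the ADALAPI: callers work
// against one interface, and the protocol choice is hidden behind it.
interface DataTransferClient {
    String protocol();                      // URI scheme this backend handles
    void copy(URI source, URI destination); // protocol-specific transfer
}

// Stub backends standing in for real GridFTP and SFTP implementations.
class GridFtpClient implements DataTransferClient {
    public String protocol() { return "gsiftp"; }
    public void copy(URI src, URI dst) {
        System.out.println("GridFTP copy " + src + " -> " + dst);
    }
}

class SftpClient implements DataTransferClient {
    public String protocol() { return "sftp"; }
    public void copy(URI src, URI dst) {
        System.out.println("SFTP copy " + src + " -> " + dst);
    }
}

public class AdalapiSketch {
    // Select a backend from the URI scheme, so client code never touches a
    // protocol-specific API directly.
    static DataTransferClient clientFor(URI uri) {
        for (DataTransferClient c : new DataTransferClient[] {
                new GridFtpClient(), new SftpClient() }) {
            if (c.protocol().equals(uri.getScheme())) return c;
        }
        throw new IllegalArgumentException("unsupported scheme: " + uri.getScheme());
    }

    public static void main(String[] args) {
        URI src = URI.create("sftp://lsdf.kit.edu/data/run42.dat");
        URI dst = URI.create("gsiftp://lsdf.kit.edu/archive/run42.dat");
        clientFor(src).copy(src, dst); // dispatches to the SFTP stub
    }
}
```

Adding support for a further technology (e.g. GridFTP pipelining or an SRM backend) would then only require a new `DataTransferClient` implementation, leaving client code and GUIs built on the facade unchanged.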



Author information


Correspondence to M. Sutter.


Copyright information

© 2012 Springer Science+Business Media, LLC

About this paper

Cite this paper

Sutter, M. et al. (2012). File Systems and Access Technologies for the Large Scale Data Facility. In: Davoli, F., Lawenda, M., Meyer, N., Pugliese, R., Węglarz, J., Zappatore, S. (eds) Remote Instrumentation for eScience and Related Aspects. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-0508-5_16


  • DOI: https://doi.org/10.1007/978-1-4614-0508-5_16

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-0507-8

  • Online ISBN: 978-1-4614-0508-5

  • eBook Packages: Engineering, Engineering (R0)
