Using Hadoop for High Energy Physics Data Analysis

  • Conference paper
Big Scientific Data Management (BigSDM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11473)

Abstract

The new generation of High Energy Physics (HEP) experiments is generating huge amounts of data. Efficient parallel algorithms and frameworks, together with high I/O throughput, are key to meeting the scalability and performance requirements of HEP offline data analysis. Although Hadoop has attracted considerable attention from the scientific community for its scalability and its parallel computing framework for large data sets, it is still difficult to run HEP data processing tasks directly on Hadoop. In this paper we investigate how to apply Hadoop so that HEP jobs run on it transparently. In particular, we discuss a new mechanism that lets HEP software access data in HDFS randomly. HDFS is a streaming data store that supports only sequential writes and appends, so it cannot meet the random-access needs of HEP jobs. The new mechanism allows Map/Reduce tasks to perform random reads and writes on the local file system of the data nodes instead of going through the Hadoop data streaming interface, which makes it possible to run HEP jobs on Hadoop. We also develop several MapReduce models for HEP jobs such as Corsika simulation, ARGO detector simulation and Medea++ reconstruction, and a toolkit that lets users submit, query and remove jobs. In addition, we provide cluster monitoring and an accounting system to improve system availability. This work has been in production use for HEP since September 2016, delivering about 40,000 CPU hours per month.
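
The contrast between streamed access and random access on a data node's local file system can be illustrated with a small sketch. This is not the authors' implementation: the fixed-size record layout, the function names and the idea of handing a map task a local path are all assumptions made for illustration. It only shows why a seekable local file supports reads and in-place writes at arbitrary offsets, which a sequential, append-only stream does not.

```python
import struct

# Hypothetical fixed-size event record: 4-byte event id + 8-byte energy.
RECORD = struct.Struct("<id")


def write_events(path, events):
    """Write records one after another, as an append-only store would."""
    with open(path, "wb") as f:
        for event_id, energy in events:
            f.write(RECORD.pack(event_id, energy))


def read_event(path, index):
    """Random read: seek directly to record `index` in a local file."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))


def update_event(path, index, energy):
    """Random write: overwrite one record in place (needs a seekable file)."""
    event_id, _ = read_event(path, index)
    with open(path, "r+b") as f:
        f.seek(index * RECORD.size)
        f.write(RECORD.pack(event_id, energy))


def map_task(local_path, wanted):
    """Hypothetical map task that receives a local path, not a record stream."""
    return [read_event(local_path, i) for i in wanted]
```

In this sketch the map task receives only a path and decides for itself which offsets to touch. That is the access pattern the abstract describes HEP software as needing.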

Acknowledgment

This work was supported by the National Key Research Program of China “Scientific Big Data Management System” (No. 2016YFB1000605) and by the National Natural Science Foundation of China (NSFC) project “Research on the Key Technologies of Cloud Federation for High Energy Physics Experiments” under Contract No. 11875283.

Author information

Corresponding author

Correspondence to Qiulan Huang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Huang, Q., Wei, Z., Sun, G., Cheng, Y., Cheng, Z., Hu, Q. (2019). Using Hadoop for High Energy Physics Data Analysis. In: Li, J., Meng, X., Zhang, Y., Cui, W., Du, Z. (eds) Big Scientific Data Management. BigSDM 2018. Lecture Notes in Computer Science, vol 11473. Springer, Cham. https://doi.org/10.1007/978-3-030-28061-1_16

  • DOI: https://doi.org/10.1007/978-3-030-28061-1_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28060-4

  • Online ISBN: 978-3-030-28061-1

  • eBook Packages: Computer Science, Computer Science (R0)
