Abstract
With the development of a new generation of High Energy Physics (HEP) experiments, huge amounts of data are being generated. Efficient parallel algorithms and frameworks, together with high I/O throughput, are key to meeting the scalability and performance requirements of HEP offline data analysis. Although Hadoop has gained a lot of attention from the scientific community for its scalability and its parallel computing framework for large data sets, it is still difficult to run HEP data processing tasks directly on Hadoop. In this paper we investigate how to apply Hadoop so that HEP jobs run on it transparently. In particular, we discuss a new mechanism that lets HEP software access data in HDFS randomly. Because HDFS is designed for streaming data and supports only sequential write and append, it cannot satisfy the random data access that HEP jobs require. The new mechanism allows Map/Reduce tasks to perform random reads and writes on the local file system of the data nodes instead of using the Hadoop data streaming interface, which makes it possible to run HEP jobs on Hadoop. We also develop several MapReduce models for HEP jobs such as Corsika simulation, ARGO detector simulation and Medea++ reconstruction, and a toolkit for users to submit, query and remove jobs. In addition, we provide cluster monitoring and an accounting system to improve system availability. This work has been in production for HEP experiments since September 2016, providing about 40,000 CPU hours per month.
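The access pattern at issue can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use any Hadoop API; the fixed-size record layout, the file name, and the helper functions `write_events`, `read_event`, `update_event` and `demo` are all hypothetical. It only shows the kind of seek-based read and in-place write that a task can perform on a data node's local file system, and that a sequential-write, append-only store does not offer.

```python
import os
import struct
import tempfile

# Hypothetical event record: little-endian int32 id followed by a float64 energy.
RECORD = struct.Struct("<id")


def write_events(path, n):
    """Write n fixed-size records one after another (sequential write)."""
    with open(path, "wb") as f:
        for i in range(n):
            f.write(RECORD.pack(i, i * 0.5))


def read_event(path, index):
    """Random read: seek directly to record `index` in a local file."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))


def update_event(path, index, energy):
    """Random write: overwrite one record in place without rewriting the file."""
    with open(path, "r+b") as f:
        f.seek(index * RECORD.size)
        f.write(RECORD.pack(index, energy))


def demo():
    """Create a scratch file, then read and modify a record in the middle of it."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "events.bin")
        write_events(path, 100)
        before = read_event(path, 42)
        update_event(path, 42, 99.0)
        after = read_event(path, 42)
        return before, after
```

Calling `demo()` returns the record at index 42 before and after the in-place update. HEP software that opens its input files this way expects `seek` and in-place writes to work, which is why the abstract moves the I/O to the local file system.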
Acknowledgment
This work was supported by the National Key Research Program of China “Scientific Big Data Management System” (No. 2016YFB1000605) and by the National Natural Science Foundation of China (NSFC) project “Research on the Key Technologies of Cloud Federation for High Energy Physics Experiments” under Contract No. 11875283.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Huang, Q., Wei, Z., Sun, G., Cheng, Y., Cheng, Z., Hu, Q. (2019). Using Hadoop for High Energy Physics Data Analysis. In: Li, J., Meng, X., Zhang, Y., Cui, W., Du, Z. (eds) Big Scientific Data Management. BigSDM 2018. Lecture Notes in Computer Science, vol 11473. Springer, Cham. https://doi.org/10.1007/978-3-030-28061-1_16
DOI: https://doi.org/10.1007/978-3-030-28061-1_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28060-4
Online ISBN: 978-3-030-28061-1
eBook Packages: Computer Science, Computer Science (R0)