The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC

  • Valentin Kuznetsov
  • Nils Leif Fischer
  • Yuyi Guo
Original Article


The CMS experiment at the CERN LHC developed the workflow management archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as the central HDFS and the Hadoop Spark cluster. The system leverages modern technologies such as a document-oriented database and the Hadoop ecosystem to provide the flexibility needed to reliably process, store, and aggregate \(\mathcal{O}(1\,\mathrm{M})\) documents daily. We describe the data transformation, the short- and long-term storage layers, the query language, and the aggregation pipeline developed to visualize performance metrics that help CMS data operators assess the performance of the CMS computing system.


BigData, LHC, Data management



We would like to thank our colleagues Seangchan Ryu (FNAL) and Alan Malta (Univ. of Nebraska) for their extensive feedback and guidance on various details of the CMS Workflow Management System. Special thanks go to Eric Vaandering (FNAL) for initiating the idea of the WMArchive system in CMS and for his constant support throughout the development cycle. We would also like to thank Luca Menichetti from CERN IT, who provided support for the development, maintenance, and deployment of our scripts on the Spark platform.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Cornell University, Ithaca, USA
  2. Heidelberg University, Heidelberg, Germany
  3. Fermilab National Laboratory, Winfield Township, USA
