A High-Throughput Kalman Filter for Modern SIMD Architectures

  • Daniel Hugo Cámpora Pérez
  • Omar Awile
  • Cédric Potterat
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10659)

Abstract

The Kalman filter is a critical component of the reconstruction of subatomic particle collisions in high-energy physics detectors. At the LHCb detector at the Large Hadron Collider, this reconstruction must be performed at an average rate of 30 million times per second. As a consequence of the ever-increasing collision rate and upcoming detector upgrades, the data rate that must be processed in real time is expected to increase by a factor of 40 in the next five years. To keep pace, processing and filtering software must take advantage of the latest developments in hardware technology.

In this paper we present a cross-architecture SIMD parallel algorithm and implementation of a low-rank Kalman filter. We integrate our implementation into production code and validate the numerical results in the context of physics reconstruction. We also compare its throughput across modern multi- and many-core architectures.

Using our Kalman filter implementation, we achieve a sustained throughput of 75 million particle hit reconstructions per second on an Intel Xeon Phi Knights Landing platform, a factor of 6.81 over the current production implementation running on a two-socket Haswell system. Additionally, we show that, under the constraints of our Kalman filter formulation, we use the available hardware resources efficiently.

Our implementation will allow us to better sustain the required throughput of the detector in the coming years and to scale to future hardware architectures. Additionally, our work enables the evaluation of other computing platforms for future hardware upgrades.
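The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind a SIMD-friendly Kalman filter, not the authors' code. It assumes a scalar state per track, an identity transition model, and a structure-of-arrays layout in which element i of every list belongs to track i. Plain Python lists stand in for vector registers here. The names TrackBatch, predict, and update are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrackBatch:
    """Structure-of-arrays batch: element i of each list belongs to track i."""
    x: List[float]  # state estimate per track (scalar state is an assumption)
    p: List[float]  # variance of the state estimate per track


def predict(batch: TrackBatch, q: float) -> None:
    """Prediction step with an identity transition: variance grows by q."""
    batch.p = [p + q for p in batch.p]


def update(batch: TrackBatch, z: List[float], r: List[float]) -> None:
    """Measurement update, applying the same arithmetic to every lane.

    z[i] is the measurement for track i and r[i] its variance. Because
    every lane runs identical operations with no cross-lane dependency,
    this loop is the kind of code a SIMD unit can process in lockstep.
    """
    for i in range(len(batch.x)):
        k = batch.p[i] / (batch.p[i] + r[i])   # Kalman gain
        batch.x[i] += k * (z[i] - batch.x[i])  # corrected state
        batch.p[i] *= 1.0 - k                  # corrected variance


# Two independent tracks filtered side by side.
batch = TrackBatch(x=[0.0, 2.0], p=[0.5, 0.5])
predict(batch, q=0.5)
update(batch, z=[1.0, 2.0], r=[1.0, 1.0])
```

The layout is the point of the sketch: keeping each quantity of all tracks in one contiguous array lets a single vector instruction advance many tracks at once, whereas an array of per-track objects would interleave unrelated fields in memory.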

Keywords

Kalman filter, Data-intensive parallel algorithms, Numerical methods

Notes

Acknowledgements

The authors would like to thank the High-Throughput Computing Collaboration at CERN openlab for fruitful discussions throughout the process of designing and writing the presented software, and for early access to Intel hardware. Thanks to F. Lemaitre for his contribution of the vectorized transposition code, and to O. Bouizi and S. Harald for the low-level code discussions and for providing early results and insight on the Xeon Phi architecture. In addition, thanks to W. Hulsbergen and R. Aaij for the mathematical discussions and data structure design. Finally, thanks to N. Neufeld and A. Riscos Núñez for their guidance and support.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. CERN, Geneva, Switzerland
  2. Universidad de Sevilla, Sevilla, Spain
  3. Universidade Federal do Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil
