Extension of the Identity Management System Mainzelliste to Reduce Runtimes for Patient Registration in Large Datasets

  • Norman Zerbe
  • Christopher Hampf
  • Peter Hufnagl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12090)


Identity management is a central component of medical research, of the management of medical samples and related biomaterial data, and of meeting data protection requirements. For daily use, an identity management system must be able to manage large datasets with several million records within a feasible time. The Central Biomaterial Bank Charité (ZeBanC) aimed to use Mainzelliste to externalize the identity management of a running biobank system. The evaluation showed that it was not possible to register new patients into a database with several hundred thousand records in feasible runtimes.

The aims of this project were to evaluate and optimize performance as the dataset grows and to reduce the runtimes of patient registration in the Mainzelliste environment without a negative impact on the accuracy of record linkage. The steps with the longest runtimes were identified and optimized so that only the data required during registration are loaded. To speed up record linkage, parts of the algorithm were optimized. The initial record linkage, which had an extensive runtime, also compared patients who are completely different and definitely not duplicates. Moreover, a pre-matcher was included that compares two patients based on their hashes before a detailed comparison of every attribute is started.
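The pre-matching idea can be illustrated with a minimal sketch. It is not the Mainzelliste implementation: it assumes a SimHash-style signature over character trigrams as a stand-in for a locality-sensitive hash such as Nilsimsa, and all names (`signature`, `find_matches`), field names and thresholds are hypothetical. A stored record is passed to the expensive attribute-wise comparison only if its precomputed signature is close enough to that of the new record.

```python
import hashlib
from difflib import SequenceMatcher

BITS = 64  # signature length in bits (assumption for this sketch)


def signature(record: dict) -> int:
    """SimHash-style signature over character trigrams (stand-in, not Nilsimsa)."""
    text = "|".join(str(record[k]).lower() for k in sorted(record))
    counts = [0] * BITS
    for i in range(len(text) - 2):
        digest = hashlib.md5(text[i:i + 3].encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for b in range(BITS):
            counts[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(BITS) if counts[b] > 0)


def signature_similarity(a: int, b: int) -> float:
    """Fraction of equal bits between two signatures (1.0 means identical)."""
    return 1 - bin(a ^ b).count("1") / BITS


def detailed_similarity(r1: dict, r2: dict) -> float:
    """Expensive comparison: mean string similarity over every attribute of r1."""
    scores = [
        SequenceMatcher(None, str(r1[k]).lower(), str(r2.get(k, "")).lower()).ratio()
        for k in r1
    ]
    return sum(scores) / len(scores)


def find_matches(new_record, stored, prematch_threshold=0.7, match_threshold=0.9):
    """stored is a list of (record, precomputed_signature) pairs."""
    new_sig = signature(new_record)
    matches = []
    for idx, (rec, sig) in enumerate(stored):
        # Cheap pre-match: skip records whose signatures differ too much.
        if signature_similarity(new_sig, sig) < prematch_threshold:
            continue
        score = detailed_similarity(new_record, rec)
        if score >= match_threshold:
            matches.append((idx, score))
    return matches


patients = [
    {"first": "Anna", "last": "Schmidt", "birth": "1980-02-17"},
    {"first": "Bernd", "last": "Okonkwo", "birth": "1955-11-03"},
]
stored = [(p, signature(p)) for p in patients]
print(find_matches(dict(patients[0]), stored))
```

Because the signatures are computed once at registration time, the pre-match costs one XOR and a bit count per stored record. The pre-match threshold in this sketch is arbitrary; in practice it would have to be chosen so that no true duplicates are filtered out, since the stated goal is to leave linkage accuracy unchanged.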

All implemented optimizations have a positive impact on runtimes without decreasing the accuracy. The optimizations described in this paper have been integrated into the official repository of Mainzelliste and are available to the community.


Biobank, Identity management, Pseudonymization, Performance optimization, Nilsimsa, Patient matching



We would like to thank Mr. Andreas Borg for his input regarding certain implementation details. We also thank Dr. Martin Lablans for supporting this optimization.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Norman Zerbe (1, 2)
  • Christopher Hampf (2)
  • Peter Hufnagl (1, 2, 3)

  1. Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Pathology, Berlin, Germany
  2. Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Central Biobank Charité, Berlin, Germany
  3. HTW University of Applied Sciences Berlin, Center for Biomedical Image and Information Processing (CBMI), Berlin, Germany
