Abstract
The paper introduces an approach for scalable data management in the context of Big Data. The main objective of the study is to design and implement a metadata model and a data catalog solution based on emerging Big Data technologies. The solution is scalable and integrates the following components: (1) the data sources; (2) a file scanner; (3) the metadata storage and processing component; and (4) a visualization component. The approach and its underlying metadata model are demonstrated with a toy use case from the medical domain, and can be easily adapted and extended to other use cases and requirements.
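As an illustration of component (2), the following is a minimal sketch of a file scanner that walks a directory and emits one metadata record per file. It is not the authors' implementation: the abstract does not specify the metadata model, so the record fields, the `FileMetadata` class, and the `scan_directory` function are hypothetical names chosen for this example, and only the Python standard library is used.

```python
import hashlib
import os
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FileMetadata:
    """Hypothetical metadata record; the fields are assumptions, not the paper's model."""
    path: str
    name: str
    extension: str
    size_bytes: int
    modified_utc: str
    sha256: str


def _sha256_of(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_directory(root):
    """Walk `root` recursively and return a FileMetadata record for every regular file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            full_path = os.path.join(dirpath, filename)
            stat = os.stat(full_path)
            modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            records.append(
                FileMetadata(
                    path=full_path,
                    name=filename,
                    extension=os.path.splitext(filename)[1].lower(),
                    size_bytes=stat.st_size,
                    modified_utc=modified.isoformat(),
                    sha256=_sha256_of(full_path),
                )
            )
    return records


if __name__ == "__main__":
    # Scan the current directory and print a short summary per file.
    for record in scan_directory("."):
        print(record.name, record.size_bytes, record.sha256[:12])
```

In a pipeline such as the one outlined in the abstract, records of this kind would be handed to the metadata storage and processing component; converting each record with `dataclasses.asdict` gives a plain dictionary that a document store could accept.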
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Sulea, V., Ciuciu, I. (2020). Big Data Management: A Case Study on Medical Data. In: Debruyne, C., et al. On the Move to Meaningful Internet Systems: OTM 2019 Workshops. OTM 2019. Lecture Notes in Computer Science, vol 11878. Springer, Cham. https://doi.org/10.1007/978-3-030-40907-4_20
DOI: https://doi.org/10.1007/978-3-030-40907-4_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-40906-7
Online ISBN: 978-3-030-40907-4
eBook Packages: Computer Science, Computer Science (R0)