Big Data Management: A Case Study on Medical Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11878)

Abstract

The paper introduces an approach for scalable data management in the context of Big Data. The main objective of the study is to design and implement a metadata model and a data catalog solution based on emerging Big Data technologies. The solution is scalable and integrates the following components: (1) the data sources; (2) a file scanner; (3) the metadata storage and processing component; and (4) a visualization component. The approach and its underlying metadata model are demonstrated with a toy use case from the medical domain, and can be easily adapted and extended to other use cases and requirements.
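
As a rough illustration of how the four components named in the abstract could fit together, the sketch below treats a temporary directory as the data source, walks it with a small file scanner, keeps the resulting records in an in-memory list that stands in for the metadata store, and produces a per-extension count that stands in for the visualization step. All names (`FileMetadata`, `scan_directory`, `summarize_by_extension`, `build_demo_catalog`), the sample files, and the chosen metadata fields are assumptions made for illustration; the paper's actual metadata model and technology stack are not reproduced here.

```python
import os
import tempfile
from collections import Counter
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class FileMetadata:
    """Hypothetical file-level metadata record (fields are illustrative)."""
    path: str
    name: str
    extension: str
    size_bytes: int
    modified_ts: float


def scan_directory(root):
    """File scanner: walk `root` and return one metadata record per file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            full = Path(dirpath) / filename
            stat = full.stat()
            records.append(FileMetadata(
                path=str(full),
                name=filename,
                extension=full.suffix.lower(),
                size_bytes=stat.st_size,
                modified_ts=stat.st_mtime,
            ))
    return records


def summarize_by_extension(records):
    """Stand-in for the visualization step: file counts per extension."""
    return dict(Counter(r.extension for r in records))


def build_demo_catalog():
    """Create a throwaway data source, scan it, return (catalog, summary)."""
    with tempfile.TemporaryDirectory() as root:
        # Made-up sample files; the contents carry no real medical data.
        (Path(root) / "patients.csv").write_text("id,age\n1,42\n")
        (Path(root) / "notes.csv").write_text("id,note\n1,ok\n")
        (Path(root) / "scan_001.dcm").write_bytes(b"\x00" * 16)
        records = scan_directory(root)
    # The list of dicts plays the role of the metadata store in this sketch.
    catalog = [asdict(r) for r in records]
    return catalog, summarize_by_extension(records)


if __name__ == "__main__":
    catalog, summary = build_demo_catalog()
    for entry in catalog:
        print(entry["name"], entry["extension"], entry["size_bytes"])
    print(summary)
```

In a fuller setup, the in-memory list would presumably be replaced by a persistent, scalable store and the summary by a dashboard; the sketch only shows the flow of metadata from scanner to catalog to summary.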

Author information

Corresponding author

Correspondence to Ioana Ciuciu.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Sulea, V., Ciuciu, I. (2020). Big Data Management: A Case Study on Medical Data. In: Debruyne, C., et al. On the Move to Meaningful Internet Systems: OTM 2019 Workshops. OTM 2019. Lecture Notes in Computer Science, vol 11878. Springer, Cham. https://doi.org/10.1007/978-3-030-40907-4_20

  • DOI: https://doi.org/10.1007/978-3-030-40907-4_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-40906-7

  • Online ISBN: 978-3-030-40907-4

  • eBook Packages: Computer Science, Computer Science (R0)
