Provenance Supporting Hyperparameter Analysis in Deep Neural Networks

  • Conference paper
Provenance and Annotation of Data and Processes (IPAW 2020, IPAW 2021)

Abstract

The duration of the deep neural network (DNN) life cycle depends on the data configuration decisions that lead to a successful model. Analyzing hyperparameters as the network’s execution evolves allows the data to be adapted. Provenance data derivation traces help in fine-tuning parameters by providing a global picture of the data with clear dependencies. Provenance can also contribute to the interpretation of the models that result from the DNN life cycle. However, collecting hyperparameters and modeling the relationships among the data involved in the DNN life cycle to build a provenance database remain challenging. Current approaches adopt different notions of provenance in their representations and require the DNN to run under a specific software framework, which limits interoperability and flexibility in choosing the DNN execution environment. This work presents a provenance-based approach to address these challenges, proposing a collection mechanism that is flexible in the choice and representation of the data to be analyzed. Experiments with the approach, using a convolutional neural network for image recognition, provide evidence of its flexibility, of the efficiency of data collection, and of its support for the analysis and validation of network data.
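As a rough illustration of the idea of framework-independent hyperparameter provenance, the following minimal Python sketch records, for each training epoch, which hyperparameters were used and which metrics were generated, and then traces a metrics record back to its hyperparameters. It is not the collection mechanism described in the paper: the `ProvenanceStore` class, the `record_epoch` helper, the identifiers, and the numeric values are all hypothetical, and only the relation names (`used`, `wasGeneratedBy`) are borrowed from W3C PROV.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class ProvenanceStore:
    """Hypothetical in-memory store; a stand-in for a real provenance database."""
    entities: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    activities: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def entity(self, eid: str, **attrs: Any) -> str:
        self.entities[eid] = attrs
        return eid

    def activity(self, aid: str, **attrs: Any) -> str:
        self.activities[aid] = attrs
        return aid

    def used(self, aid: str, eid: str) -> None:
        # PROV-style relation: the activity consumed the entity.
        self.relations.append(("used", aid, eid))

    def was_generated_by(self, eid: str, aid: str) -> None:
        # PROV-style relation: the entity was produced by the activity.
        self.relations.append(("wasGeneratedBy", eid, aid))

    def hyperparameters_behind(self, metrics_id: str) -> Dict[str, Any]:
        """Trace a metrics entity back to the hyperparameters its epoch used."""
        producers = [a for kind, e, a in self.relations
                     if kind == "wasGeneratedBy" and e == metrics_id]
        result: Dict[str, Any] = {}
        for kind, a, e in self.relations:
            if kind == "used" and a in producers:
                result.update(self.entities[e])
        return result


def record_epoch(store: ProvenanceStore, epoch: int,
                 hyperparameters: Dict[str, Any],
                 metrics: Dict[str, Any]) -> str:
    """Record one epoch as an activity with its input and output entities."""
    act = store.activity(f"train_epoch_{epoch}", epoch=epoch)
    hp = store.entity(f"hyperparameters_{epoch}", **hyperparameters)
    out = store.entity(f"metrics_{epoch}", **metrics)
    store.used(act, hp)
    store.was_generated_by(out, act)
    return out


store = ProvenanceStore()
# Made-up values, for illustration only.
record_epoch(store, 1, {"learning_rate": 0.01, "batch_size": 32}, {"loss": 0.9})
last = record_epoch(store, 2, {"learning_rate": 0.005, "batch_size": 32}, {"loss": 0.7})
print(store.hyperparameters_behind(last))
```

Because the recorder only receives plain dictionaries, it could in principle be called from any training loop, whatever framework runs the network; a real deployment would persist the records in a queryable database rather than in memory.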

This work is funded by CNPq, FAPERJ, and Inria (HPDaSc associated team). D. Pina and L. Kunstmann are supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Notes

  1. https://prov.readthedocs.io/en/latest/prov.html

  2. http://www.graphviz.org/

  3. https://keras.io/

  4. https://github.com/keras-team/keras

  5. https://www.elastic.co/kibana

Author information

Corresponding author

Correspondence to Débora Pina.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Pina, D., Kunstmann, L., de Oliveira, D., Valduriez, P., Mattoso, M. (2021). Provenance Supporting Hyperparameter Analysis in Deep Neural Networks. In: Glavic, B., Braganholo, V., Koop, D. (eds) Provenance and Annotation of Data and Processes. IPAW 2020, IPAW 2021. Lecture Notes in Computer Science, vol 12839. Springer, Cham. https://doi.org/10.1007/978-3-030-80960-7_2

  • DOI: https://doi.org/10.1007/978-3-030-80960-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-80959-1

  • Online ISBN: 978-3-030-80960-7

  • eBook Packages: Computer Science, Computer Science (R0)
