Provenance Supporting Hyperparameter Analysis in Deep Neural Networks

Pina, Débora; Kunstmann, Liliane; de Oliveira, Daniel; Valduriez, Patrick; Mattoso, Marta

doi:10.1007/978-3-030-80960-7_2


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12839)


Abstract

The duration of the life cycle of deep neural networks (DNNs) depends on the data configuration decisions that lead to successful models. Analyzing hyperparameters as the network's execution evolves allows the data to be adapted. Derivation traces in provenance data help fine-tune parameters by providing a global picture of the data with clear dependencies. Provenance can also contribute to the interpretation of models resulting from the DNN life cycle. However, there are challenges in collecting hyperparameters and in modeling the relationships between the data involved in the DNN life cycle to build a provenance database. Current approaches adopt different notions of provenance in their representation and require the DNN to be executed under a specific software framework, which limits interoperability and flexibility when choosing the DNN execution environment. This work presents a provenance data-based approach to address these challenges, proposing a collection mechanism with flexibility in the choice and representation of the data to be analyzed. Experiments with the approach, using a convolutional neural network focused on image recognition, provide evidence of its flexibility, the efficiency of data collection, and the analysis and validation of network data.
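As a rough illustration of the idea of linking hyperparameters to training results through derivation traces, the following minimal sketch records one hyperparameter set and per-epoch metrics without depending on any DNN framework. It is not the collection mechanism proposed in the paper: the class `ProvenanceStore`, the function `train`, the identifiers such as `hyperparameters/1`, and the synthetic loss are all hypothetical, and only the relation names (entity, activity, used, wasGeneratedBy) are borrowed from the W3C PROV vocabulary.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class ProvenanceStore:
    """Hypothetical in-memory store of PROV-style records (illustration only)."""
    entities: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    activities: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    used: List[Tuple[str, str]] = field(default_factory=list)       # (activity, entity)
    generated: List[Tuple[str, str]] = field(default_factory=list)  # (entity, activity)

    def entity(self, entity_id: str, **attrs: Any) -> str:
        self.entities[entity_id] = attrs
        return entity_id

    def activity(self, activity_id: str, **attrs: Any) -> str:
        self.activities[activity_id] = attrs
        return activity_id

    def record_use(self, activity_id: str, entity_id: str) -> None:
        self.used.append((activity_id, entity_id))

    def record_generation(self, entity_id: str, activity_id: str) -> None:
        self.generated.append((entity_id, activity_id))

    def metrics_for(self, hyper_id: str) -> List[Dict[str, Any]]:
        """Follow used, then wasGeneratedBy, from a hyperparameter set to its metrics."""
        acts = {a for a, e in self.used if e == hyper_id}
        return [self.entities[e] for e, a in self.generated if a in acts]


def train(store: ProvenanceStore, hyperparams: Dict[str, Any], epochs: int) -> str:
    """Stand-in training loop that only records provenance; no real model is trained."""
    hp_id = store.entity("hyperparameters/1", **hyperparams)
    for epoch in range(1, epochs + 1):
        act = store.activity(f"training/epoch-{epoch}", epoch=epoch)
        store.record_use(act, hp_id)
        loss = 1.0 / epoch  # synthetic value standing in for a measured loss
        metric = store.entity(f"metrics/epoch-{epoch}", epoch=epoch, loss=loss)
        store.record_generation(metric, act)
    return hp_id
```

Calling `train(store, {"learning_rate": 0.01, "batch_size": 32}, epochs=3)` and then `store.metrics_for(...)` with the returned identifier would list the metrics derived from that hyperparameter set. Because the store only receives plain values, the same calls could in principle be placed in a training script written for any framework.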

This work is funded by CNPq, FAPERJ, and Inria (HPDaSc associated team). D. Pina and L. Kunstmann are supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.



Author information

Authors and Affiliations

Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
Débora Pina, Liliane Kunstmann & Marta Mattoso
Fluminense Federal University, Niterói, Rio de Janeiro, Brazil
Daniel de Oliveira
Inria, University of Montpellier, CNRS, LIRMM, Montpellier, France
Patrick Valduriez


Corresponding author

Correspondence to Débora Pina.

Editor information

Editors and Affiliations

Illinois Institute of Technology, Chicago, IL, USA
Boris Glavic
Fluminense Federal University, Niterói, Brazil
Vanessa Braganholo
Northern Illinois University, DeKalb, IL, USA
David Koop


About this paper

Cite this paper

Pina, D., Kunstmann, L., de Oliveira, D., Valduriez, P., Mattoso, M. (2021). Provenance Supporting Hyperparameter Analysis in Deep Neural Networks. In: Glavic, B., Braganholo, V., Koop, D. (eds) Provenance and Annotation of Data and Processes. IPAW 2020, IPAW 2021. Lecture Notes in Computer Science, vol 12839. Springer, Cham. https://doi.org/10.1007/978-3-030-80960-7_2


DOI: https://doi.org/10.1007/978-3-030-80960-7_2
Published: 09 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80959-1
Online ISBN: 978-3-030-80960-7
eBook Packages: Computer Science, Computer Science (R0)
