
PEGASEF: A Provenance-Based Big Data Service Framework for Efficient Simulation Execution on Shared Computing Clusters

  • Conference paper
Big Data Applications and Services 2017 (BIGDAS 2017)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 770)


Abstract

Over the past years, high-performance computing (HPC) simulation programs have been extensively employed to solve complex problems in a variety of computational science and engineering disciplines. When such programs are shared on an online platform, many users can easily run their simulations as long as they are connected to the web. However, repetitive simulations submitted by users have placed a significant burden on the platform’s limited computing and storage resources. To address this inefficiency in simulation execution, we propose a big data service framework based on past simulation records. Such records are called provenances, which capture various properties of a simulation. By utilizing the provenances, the platform can execute simulations more efficiently via duplicate elimination and assist users with enhanced simulation services such as result prediction, execution-time estimation, and input-parameter clustering.
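The duplicate-elimination idea in the abstract can be illustrated with a minimal sketch. It is not the PEGASEF implementation: the names `provenance_key`, `ProvenanceStore`, and `toy_simulation` are hypothetical, the in-memory dictionary merely stands in for whatever provenance storage the framework uses, and the sketch assumes that a simulation is fully determined by its program name and input parameters.

```python
import hashlib
import json


def provenance_key(program, params):
    """Build a canonical hash from the program name and its input parameters.

    Sorting the keys makes the hash independent of parameter order.
    """
    canonical = json.dumps({"program": program, "params": params}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ProvenanceStore:
    """Hypothetical in-memory store of past simulation records (provenances)."""

    def __init__(self):
        self._records = {}
        self.executions = 0  # number of simulations actually executed

    def run(self, program, params, simulate):
        key = provenance_key(program, params)
        record = self._records.get(key)
        if record is not None:
            # Duplicate request: reuse the stored result instead of re-running.
            return record["result"]
        result = simulate(**params)
        self.executions += 1
        self._records[key] = {"program": program, "params": params, "result": result}
        return result


def toy_simulation(length, voltage):
    """Stand-in for an expensive HPC simulation (assumption for illustration)."""
    return length * voltage


store = ProvenanceStore()
first = store.run("toy", {"length": 2, "voltage": 3}, toy_simulation)
# Same inputs in a different order: served from the provenance record.
second = store.run("toy", {"voltage": 3, "length": 2}, toy_simulation)
```

In this sketch the second request does not trigger a new execution, so `store.executions` stays at 1. The stored records are also the kind of data from which services such as execution-time estimation could be derived, although the sketch does not attempt that.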



Acknowledgement

This research was supported by Kyungpook National University Research Fund, 2017.

Corresponding author

Correspondence to Ki Yong Lee.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Suh, YK., Lee, K.Y., Baek, N. (2019). PEGASEF: A Provenance-Based Big Data Service Framework for Efficient Simulation Execution on Shared Computing Clusters. In: Lee, W., Leung, C. (eds) Big Data Applications and Services 2017. BIGDAS 2017. Advances in Intelligent Systems and Computing, vol 770. Springer, Singapore. https://doi.org/10.1007/978-981-13-0695-2_17
