Anomaly Detection in MapReduce Using Transformation Provenance

  • Conference paper
Advances in Big Data and Cloud Computing

Abstract

Data provenance is metadata that records where data originated and how it was manipulated and updated over time. It is significant for big data applications because it provides a mechanism for verifying results. This paper discusses an approach to detecting anomalies in a Hadoop cluster or MapReduce job by reviewing the transformation provenance captured by mining the MapReduce logs. A rule-based framework identifies the patterns used to extract provenance information. The derived provenance information is converted into a provenance profile, which is used to detect anomalies in cluster and job execution.
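As a rough illustration of the pipeline the abstract outlines (pattern rules over logs, a provenance profile, then anomaly checks), the following minimal sketch may help. It is not the authors' framework: the log line format, the rule names in `RULES`, the profile fields, and the `slow_factor` threshold are all assumptions made up for this example, and real Hadoop logs are laid out differently.

```python
import re
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical rules: each maps an event name to a regex over a simplified,
# made-up log line format (NOT the real Hadoop log layout).
RULES = {
    "task_start": re.compile(
        r"^(?P<ts>\S+ \S+) INFO Task (?P<task>\S+) started on (?P<node>\S+)$"),
    "task_end": re.compile(r"^(?P<ts>\S+ \S+) INFO Task (?P<task>\S+) finished$"),
    "task_fail": re.compile(r"^(?P<ts>\S+ \S+) WARN Task (?P<task>\S+) failed$"),
}

# Invented sample input for illustration only.
SAMPLE_LOG = [
    "2018-01-01 10:00:00 INFO Task m_0 started on node1",
    "2018-01-01 10:00:00 INFO Task m_1 started on node2",
    "2018-01-01 10:00:00 INFO Task m_2 started on node3",
    "2018-01-01 10:00:00 INFO Task m_3 started on node1",
    "2018-01-01 10:00:10 INFO Task m_0 finished",
    "2018-01-01 10:00:12 INFO Task m_1 finished",
    "2018-01-01 10:00:15 WARN Task m_3 failed",
    "2018-01-01 10:01:00 INFO Task m_2 finished",
]


def extract_events(lines):
    """Yield (event, fields) for every line matched by one of the rules."""
    for line in lines:
        for event, pattern in RULES.items():
            match = pattern.match(line.strip())
            if match:
                yield event, match.groupdict()
                break


def build_profile(lines):
    """Fold the extracted events into a per-task provenance profile."""
    profile = defaultdict(
        lambda: {"node": None, "start": None, "duration": None, "failed": False})
    for event, fields in extract_events(lines):
        ts = datetime.strptime(fields["ts"], "%Y-%m-%d %H:%M:%S")
        entry = profile[fields["task"]]
        if event == "task_start":
            entry["node"], entry["start"] = fields["node"], ts
        elif event == "task_end" and entry["start"] is not None:
            entry["duration"] = (ts - entry["start"]).total_seconds()
        elif event == "task_fail":
            entry["failed"] = True
    return dict(profile)


def find_anomalies(profile, slow_factor=3.0):
    """Flag failed tasks and tasks far slower than the median duration."""
    durations = [e["duration"] for e in profile.values() if e["duration"] is not None]
    typical = median(durations) if durations else None
    anomalies = []
    for task, entry in sorted(profile.items()):
        if entry["failed"]:
            anomalies.append((task, "failed"))
        elif typical and entry["duration"] is not None \
                and entry["duration"] > slow_factor * typical:
            anomalies.append((task, "slow"))
    return anomalies


if __name__ == "__main__":
    print(find_anomalies(build_profile(SAMPLE_LOG)))
```

The median-based check is only one simple choice of rule; the actual profile contents and anomaly criteria are specified in the paper itself, which this page does not reproduce.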

Author information

Corresponding author

Correspondence to Anu Mary Chacko.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chacko, A.M., Medicherla, J.S., Madhu Kumar, S.D. (2018). Anomaly Detection in MapReduce Using Transformation Provenance. In: Rajsingh, E., Veerasamy, J., Alavi, A., Peter, J. (eds) Advances in Big Data and Cloud Computing. Advances in Intelligent Systems and Computing, vol 645. Springer, Singapore. https://doi.org/10.1007/978-981-10-7200-0_8

  • DOI: https://doi.org/10.1007/978-981-10-7200-0_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-7199-7

  • Online ISBN: 978-981-10-7200-0

  • eBook Packages: Engineering, Engineering (R0)
