Abstract
Data provenance is metadata that captures where data originated and how it was manipulated and updated over time. It is significant for big data applications because it provides a mechanism for verifying results. This paper discusses an approach to detecting anomalies in a Hadoop cluster or MapReduce job by reviewing transformation provenance captured by mining the MapReduce logs. A rule-based framework identifies the patterns used to extract provenance information. The derived provenance information is converted into a provenance profile, which is used to detect anomalies in cluster and job execution.
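The pipeline described above (rules over log lines, a provenance profile, then anomaly checks) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the simplified `START`/`END` log format, the `RULES` table, the function names, and the use of a z-score on task duration as the anomaly test. Real Hadoop logs are formatted differently, and the paper's actual rules and detection criteria may differ.

```python
import re
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical rules: each maps an event name to a regex over a
# simplified, made-up log line format (not the real Hadoop format).
RULES = {
    "task_start": re.compile(r"^(?P<ts>\d+) START (?P<task>\S+) node=(?P<node>\S+)$"),
    "task_end": re.compile(r"^(?P<ts>\d+) END (?P<task>\S+) status=(?P<status>\S+)$"),
}


def extract_events(lines):
    """Apply each rule to each line; keep the first rule that matches."""
    events = []
    for line in lines:
        for name, pattern in RULES.items():
            match = pattern.match(line.strip())
            if match:
                events.append((name, match.groupdict()))
                break
    return events


def build_profile(events):
    """Fold the extracted events into one provenance record per task."""
    profile = defaultdict(dict)
    for name, fields in events:
        record = profile[fields["task"]]
        if name == "task_start":
            record["start"] = int(fields["ts"])
            record["node"] = fields["node"]
        else:
            record["end"] = int(fields["ts"])
            record["status"] = fields["status"]
    for record in profile.values():
        if "start" in record and "end" in record:
            record["duration"] = record["end"] - record["start"]
    return dict(profile)


def find_anomalies(profile, z_threshold=2.0):
    """Flag tasks that never ended, did not succeed, or ran unusually long.

    The z-score test is an illustrative stand-in, not the paper's method.
    """
    durations = [r["duration"] for r in profile.values() if "duration" in r]
    mu = mean(durations) if durations else 0.0
    sigma = pstdev(durations) if durations else 0.0
    anomalies = []
    for task, record in sorted(profile.items()):
        if "end" not in record:
            anomalies.append((task, "incomplete"))
        elif record["status"] != "SUCCEEDED":
            anomalies.append((task, "failed"))
        elif sigma > 0 and (record["duration"] - mu) / sigma > z_threshold:
            anomalies.append((task, "slow"))
    return anomalies
```

Keeping the rules in a table separate from the profile and detection steps means that new log patterns can be supported by adding a regex, without touching the rest of the sketch.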
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chacko, A.M., Medicherla, J.S., Madhu Kumar, S.D. (2018). Anomaly Detection in MapReduce Using Transformation Provenance. In: Rajsingh, E., Veerasamy, J., Alavi, A., Peter, J. (eds) Advances in Big Data and Cloud Computing. Advances in Intelligent Systems and Computing, vol 645. Springer, Singapore. https://doi.org/10.1007/978-981-10-7200-0_8
DOI: https://doi.org/10.1007/978-981-10-7200-0_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7199-7
Online ISBN: 978-981-10-7200-0
eBook Packages: Engineering, Engineering (R0)