Map Reduce Autoscaling over the Cloud with Process Mining Monitoring
In recent years, the long-standing need for fast and reliable processing solutions has been further exacerbated by the growth of data volumes produced by mobile devices, sensors and almost ubiquitous internet availability. These big data must be analyzed to extract further knowledge.
Distributed programming models, such as Map Reduce, provide a technical answer to this challenge. Furthermore, when relying on cloud infrastructures, Map Reduce platforms can easily be provisioned with additional computing nodes at runtime (e.g., the system administrator can scale the infrastructure to meet temporal deadlines). Nevertheless, the execution of distributed programming models on the cloud still lacks automated mechanisms to guarantee the Quality of Service (i.e., autonomous scale-up/-down behavior).
In this paper, we focus on two steps: monitoring Map Reduce applications (to detect situations where the temporal deadline will be exceeded) and performing recovery actions on the cluster (by automatically providing additional resources to boost the computation). To this end, we exploit techniques and tools developed in the research field of Business Process Management: in particular, we focus on declarative languages and tools for monitoring the execution of business processes. We introduce a distributed architecture in which a logic-based monitor detects possible delays and triggers recovery actions, such as the dynamic provisioning of a congruent number of resources.
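The detect-then-provision idea described above can be illustrated with a minimal sketch. It is not the logic-based monitor introduced in the paper: it swaps in a plain linear extrapolation of job progress, and it assumes that throughput scales linearly with the number of nodes. The function names (`projected_finish`, `extra_nodes_needed`) and all parameters are hypothetical, chosen only for illustration.

```python
import math


def projected_finish(elapsed_s: float, progress: float) -> float:
    """Estimate total job duration by linear extrapolation (an assumption)."""
    if not 0.0 < progress <= 1.0:
        raise ValueError("progress must be in (0, 1]")
    return elapsed_s / progress


def extra_nodes_needed(elapsed_s: float, progress: float,
                       deadline_s: float, current_nodes: int) -> int:
    """Return how many nodes to add so the job is expected to meet the deadline.

    Assumes every node contributes the same throughput (hypothetical model).
    """
    if projected_finish(elapsed_s, progress) <= deadline_s:
        return 0  # the job is on track; no recovery action needed
    time_left = deadline_s - elapsed_s
    if time_left <= 0:
        raise ValueError("deadline already passed")
    remaining = 1.0 - progress
    # Observed per-node rate is progress / (elapsed_s * current_nodes);
    # the node count must cover the remaining work within time_left.
    required = math.ceil(
        remaining * elapsed_s * current_nodes / (progress * time_left)
    )
    return max(0, required - current_nodes)


if __name__ == "__main__":
    # 25% done after 600 s on 4 nodes, with an 1800 s deadline.
    print(extra_nodes_needed(600, 0.25, 1800, 4))
```

In a real deployment the progress figure would come from the Map Reduce platform's own monitoring data, and the scaling decision would be handed to the cloud provider's provisioning interface; both are outside the scope of this sketch.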
Keywords: Business Process Management, Map Reduce, Cloud computing, Autonomic system