Abstract
This chapter provides an overview of the tasks involved in continuously monitoring the quality of cloud databases as their content changes over time. In the Software as a Service context, this process must be guided by data quality service level agreements, which specify customers’ requirements for data quality monitoring. In practice, factors such as Big Data scale, lack of data structure, strict service level agreement requirements, and the velocity of data changes make this process difficult to carry out effectively. In this context, we present a high-level architecture of a cloud service that employs cloud computing capabilities to tackle these challenges, and we discuss the technical and research problems that may be further explored to enable an effective deployment of the presented service.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Nascimento, D.C., Pires, C.E., Mestre, D. (2015). Data Quality Monitoring of Cloud Databases Based on Data Quality SLAs. In: Trovati, M., Hill, R., Anjum, A., Zhu, S., Liu, L. (eds) Big-Data Analytics and Cloud Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-25313-8_1
DOI: https://doi.org/10.1007/978-3-319-25313-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25311-4
Online ISBN: 978-3-319-25313-8
eBook Packages: Computer Science (R0)