Abstract
Poor data quality has become an increasingly pervasive problem for organizations, leading to operational inefficiency, increased costs, and missed opportunities. Because high-quality data is a prerequisite for trusted data analysis, we propose a framework that improves data quality by improving the underlying data model. In particular, we show how changes to the data design can achieve key data quality properties. We conduct a case study that applies the framework to a customer relationship management (CRM) problem. Our evaluation shows that a set of CRM queries runs efficiently over datasets of up to 10 million records, and that organizations can glean new insights about customer preferences and activity.
Notes
1. We refer to data quality rules and (integrity) constraints interchangeably.
2. The name AirWave is used to protect the organization’s identity.
3. The CRM queries can be found at: www.cas.mcmaster.ca/~sitaras/casestudy/.
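Note 1 treats data quality rules and integrity constraints as interchangeable. One common kind of such rule is a functional dependency (FD), for example "records with the same postal code must have the same city". The following is a minimal illustrative sketch, not the authors' framework: it assumes records are plain Python dictionaries, and the function `fd_violations` and the customer fields and values are hypothetical. It groups records by the left-hand-side attributes and reports every group whose records disagree on the right-hand-side attribute.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical record shape: attribute name -> value.
Record = Dict[str, str]


def fd_violations(
    records: Sequence[Record], lhs: Tuple[str, ...], rhs: str
) -> Dict[Tuple[str, ...], List[str]]:
    """Return the LHS value groups whose records disagree on the RHS attribute.

    The FD lhs -> rhs holds when every pair of records that agrees on all
    lhs attributes also agrees on rhs; an empty result means no violations.
    """
    rhs_values: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for rec in records:
        key = tuple(rec[attr] for attr in lhs)  # group key built from the LHS attributes
        rhs_values[key].add(rec[rhs])           # collect distinct RHS values per group
    # A group with more than one distinct RHS value violates the dependency.
    return {key: sorted(vals) for key, vals in rhs_values.items() if len(vals) > 1}


# Hypothetical customer records; the second one contains a misspelled city.
customers = [
    {"customer_id": "c1", "postal_code": "12345", "city": "Springfield"},
    {"customer_id": "c2", "postal_code": "12345", "city": "Sprngfield"},
    {"customer_id": "c3", "postal_code": "67890", "city": "Shelbyville"},
]

print(fd_violations(customers, ("postal_code",), "city"))
```

Here the check flags postal code 12345 because it maps to two different city spellings. Detecting violations this way is only one option; enforcing the rule in the data design itself (for example, storing city once per postal code) would prevent such violations from arising.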
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Chiang, F., Sitaramachandran, S. (2015). A Data Quality Framework for Customer Relationship Analytics. In: Wang, J., et al. Web Information Systems Engineering – WISE 2015. WISE 2015. Lecture Notes in Computer Science, vol 9419. Springer, Cham. https://doi.org/10.1007/978-3-319-26187-4_35
DOI: https://doi.org/10.1007/978-3-319-26187-4_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26186-7
Online ISBN: 978-3-319-26187-4
eBook Packages: Computer Science, Computer Science (R0)