A Data Quality Framework for Customer Relationship Analytics

Chiang, Fei; Sitaramachandran, Siddharth

doi:10.1007/978-3-319-26187-4_35

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9419)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Poor data quality has become an increasingly pervasive problem for organizations, leading to operational inefficiency, increased costs, and missed opportunities. Because high-quality data is a prerequisite for trusted data analysis, we propose a framework that improves data quality by improving the data model. In particular, we show how changes to the underlying data design can achieve key data quality properties. We present a case study that applies the framework to a customer relationship management (CRM) problem. Our evaluation shows that a set of CRM queries runs efficiently over data sets of up to 10 million records, and that organizations can glean new insights about customer preferences and activity.

Notes

1. We refer to data quality rules and (integrity) constraints interchangeably.
2. The name AirWave is used to protect the organization’s identity.
3. The CRM queries can be found at: www.cas.mcmaster.ca/~sitaras/casestudy/.

Author information

Authors and Affiliations

McMaster University, Hamilton, Canada
Fei Chiang & Siddharth Sitaramachandran

Corresponding author

Correspondence to Fei Chiang.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Jianyong Wang
Poznan University of Economics, Poznan, Poland
Wojciech Cellary
Florida Atlantic University, Boca Raton, Florida, USA
Dingding Wang
Victoria University, Melbourne, Victoria, Australia
Hua Wang
Florida International University, Miami, Florida, USA
Shu-Ching Chen
Florida International University, Miami, Florida, USA
Tao Li
Victoria University, Melbourne, Victoria, Australia
Yanchun Zhang

About this paper

Cite this paper

Chiang, F., Sitaramachandran, S. (2015). A Data Quality Framework for Customer Relationship Analytics. In: Wang, J., et al. (eds.) Web Information Systems Engineering – WISE 2015. Lecture Notes in Computer Science, vol 9419. Springer, Cham. https://doi.org/10.1007/978-3-319-26187-4_35

DOI: https://doi.org/10.1007/978-3-319-26187-4_35
Published: 18 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26186-7
Online ISBN: 978-3-319-26187-4
eBook Packages: Computer Science, Computer Science (R0)
