Data Glitches: Monsters in Your Data

Dasu, Tamraparni

doi:10.1007/978-3-642-36257-6_8

Chapter
First Online: 01 January 2013

Abstract

Data types and data structures are becoming increasingly complex as they keep pace with evolving technologies and applications. The result is an increase in the number and complexity of data quality problems. Data glitches, a common name for data quality problems, can be simple and stand alone, or highly complex with spatial and temporal correlations. In this chapter, we provide an overview of a comprehensive and measurable data quality process. To begin, we define and classify complex glitch types, and describe detection and cleaning techniques. We present metrics for assessing data quality and for choosing cleaning strategies subject to a variety of considerations. The process culminates in a “clean” data set that is acceptable to the end user. We conclude with an overview of significant literature in this area, and a discussion of opportunities for practice, application, and further research.

Author information

Authors and Affiliations

AT&T Labs Research, 180 Park Avenue, Florham Park, NJ, 07932, USA
Tamraparni Dasu

Corresponding author

Correspondence to Tamraparni Dasu.

Editor information

Editors and Affiliations

University of Queensland, Brisbane, Australia
Shazia Sadiq

About this chapter

Cite this chapter

Dasu, T. (2013). Data Glitches: Monsters in Your Data. In: Sadiq, S. (eds) Handbook of Data Quality. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36257-6_8

DOI: https://doi.org/10.1007/978-3-642-36257-6_8
Published: 13 February 2013
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36256-9
Online ISBN: 978-3-642-36257-6
eBook Packages: Computer Science (R0)