Notes
1. Normal forms beyond first normal form (second normal form, etc.) are often less desirable for analysis purposes: one might wish to denormalize data (e.g., by joining relations with primary-foreign key relationships) in order to more conveniently perform analysis over a single table.
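As a minimal sketch of the denormalization described in this note, the snippet below joins two normalized relations on a primary-foreign key relationship to produce a single wide table. The `customers` and `orders` tables, their columns, the sample rows, and the `denormalize` helper are all hypothetical and chosen only for illustration. It uses Python's standard `sqlite3` module with an in-memory database.

```python
import sqlite3


def denormalize(conn):
    """Join orders to customers on the foreign key, returning one wide table."""
    return conn.execute(
        """
        SELECT o.order_id, o.amount, c.customer_id, c.name, c.region
        FROM orders AS o
        JOIN customers AS c ON o.customer_id = c.customer_id
        ORDER BY o.order_id
        """
    ).fetchall()


# Hypothetical normalized schema: customer attributes are stored once and
# referenced from orders through the customer_id foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        region TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada', 'EU'), (2, 'Grace', 'US');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
    """
)

# Each order row now carries its customer's attributes, so an analysis can
# read a single table at the cost of repeating customer data per order.
wide_rows = denormalize(conn)
for row in wide_rows:
    print(row)
```

The trade-off is the usual one: the joined table repeats customer attributes on every order row, which is redundant for storage but convenient when an analysis tool expects a single table.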
© 2019 Springer Nature Switzerland AG
About this entry
Cite this entry
Heer, J., Hellerstein, J.M., Kandel, S. (2019). Data Wrangling. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_9
DOI: https://doi.org/10.1007/978-3-319-77525-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77524-1
Online ISBN: 978-3-319-77525-8
eBook Packages: Computer Science; Reference Module Computer Science and Engineering