CrowdCleaner: A Data Cleaning System Based on Crowdsourcing

Ye, Chen; Wang, Hongzhi; Li, Keli; Chen, Qian; Chen, Jianhua; Song, Jiangduo; Yuan, Weidong

doi:10.1007/978-3-319-11116-2_64

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8709)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

Real-world data is often dirty, so data cleaning is a natural way to improve data quality. However, because they lack human knowledge, existing automatic data cleaning systems cannot find the proper values for dirty data. We therefore propose CrowdCleaner, an online data cleaning system based on crowdsourcing. CrowdCleaner provides a user-friendly interface for handling different kinds of data quality problems. In this demonstration, we present the architecture of CrowdCleaner, highlight a few of its key features, and show how it cleans data.

Author information

Authors and Affiliations

School of Computer Science, Harbin Institute of Technology, Harbin, 150000, China
Chen Ye, Hongzhi Wang, Keli Li, Qian Chen, Jianhua Chen, Jiangduo Song & Weidong Yuan

Editor information

Editors and Affiliations

Beijing Institute of Spacecraft System Engineering, Beijing, China
Lei Chen
School of Computer Science, National University of Defense Technology, 410073, Changsha, Hunan, China
Yan Jia
RMIT University, Melbourne, Australia
Timos Sellis
School of Computer Science and Technology, Soochow University, 215006, Suzhou, China
Guanfeng Liu

About this paper

Cite this paper

Ye, C. et al. (2014). CrowdCleaner: A Data Cleaning System Based on Crowdsourcing. In: Chen, L., Jia, Y., Sellis, T., Liu, G. (eds) Web Technologies and Applications. APWeb 2014. Lecture Notes in Computer Science, vol 8709. Springer, Cham. https://doi.org/10.1007/978-3-319-11116-2_64

DOI: https://doi.org/10.1007/978-3-319-11116-2_64
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11115-5
Online ISBN: 978-3-319-11116-2
eBook Packages: Computer Science, Computer Science (R0)