Assessing Web Article Quality by Harnessing Collective Intelligence

Han, Jingyu; Chen, Xueping; Chen, Kejia; Jiang, Dawei

doi:10.1007/978-3-642-29038-1_31

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7238)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Existing approaches assess the quality of web articles mainly on the basis of syntax; little work has addressed how to quantify quality on the basis of semantics. In this paper we propose a novel Semantic Quality Assessment (SQA) approach that automatically determines data quality in terms of the two most important quality dimensions, namely accuracy and completeness. First, an alternative context for the source article is built by collecting alternative web articles. Second, each alternative article is transformed into a semantic corpus, and dimension baselines are synthetically generated from these semantic corpora. Finally, each quality dimension of the source article is determined by comparing its semantic corpus with the corresponding dimension baseline. Our approach is a promising way to assess web article quality by exploiting available collective knowledge. Experiments show that our approach performs well.

This work is supported by the National Natural Science Foundation of China under grants No. 61003040, 61100135, 60970030, 60903181 and 40801149.
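
The three steps named in the abstract (collect alternative articles, derive a baseline from their semantic corpora, compare the source article against that baseline) can be illustrated with a small sketch. This is not the authors' SQA implementation, which is not available in this preview. Everything below is an assumption made for illustration: a "semantic corpus" is reduced to a set of lowercase word tokens, the baseline is the set of terms that appear in at least a fraction `min_support` of the alternative articles, completeness is the share of baseline terms the source covers, and accuracy is the share of source terms found in any alternative. The function names `semantic_corpus`, `dimension_baseline` and `assess` are hypothetical.

```python
import re
from collections import Counter


def semantic_corpus(text):
    """Assumed stand-in for a semantic corpus: the set of lowercase word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def dimension_baseline(alternative_corpora, min_support=0.5):
    """Terms occurring in at least `min_support` of the alternative articles (assumed rule)."""
    counts = Counter(term for corpus in alternative_corpora for term in corpus)
    threshold = min_support * len(alternative_corpora)
    return {term for term, n in counts.items() if n >= threshold}


def assess(source_text, alternative_texts, min_support=0.5):
    """Score a source article against alternative articles on two dimensions."""
    source = semantic_corpus(source_text)
    alternatives = [semantic_corpus(t) for t in alternative_texts]
    baseline = dimension_baseline(alternatives, min_support)
    # Every term that at least one alternative article mentions.
    supported = set().union(*alternatives)
    # Completeness: how much of the baseline the source covers.
    completeness = len(source & baseline) / len(baseline) if baseline else 0.0
    # Accuracy: how much of the source is backed by some alternative article.
    accuracy = len(source & supported) / len(source) if source else 0.0
    return {"accuracy": accuracy, "completeness": completeness}


if __name__ == "__main__":
    alternatives = [
        "Paris is the capital of France and its largest city",
        "The capital of France is Paris",
        "France has Paris as capital",
    ]
    print(assess("Paris is the capital of Germany", alternatives))
```

Token overlap is only a crude proxy for semantics. A closer approximation would swap `semantic_corpus` for a topic or concept representation while keeping the same baseline-and-compare structure.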

Author information

Authors and Affiliations

School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, 210003, P.R.China
Jingyu Han, Xueping Chen & Kejia Chen
School of Computing, National University of Singapore, Singapore, 119077
Dawei Jiang

Editor information

Editors and Affiliations

School of Computer Science and Engineering, Seoul National University, Gwanak-ro, Gwanak-gu, 151747, Seoul, South Korea
Sang-goo Lee
Computer School, Wuhan University, Luo-jia-shan, Wuchang, 430081, Wuhan, Hubei Province, China
Zhiyong Peng
School of Information Technology and Electrical Engineering, University of Queensland, QLD 4072, Brisbane, Australia
Xiaofang Zhou
Department of Computer Science, Kangwon National University, 192-1, Hyoja2-Dong, Chuncheon, 200701, Kangwon, South Korea
Yang-Sae Moon
Institute for Computer Science and Business Information, University of Duisburg-Essen, Schützenbahn 70, 45117, Essen, Germany
Rainer Unland
School of Information and Communication Engineering, Chungbuk National University, 52 Naesudong-ro, Heungdeok-gu, Cheongju, 4072, Chungbuk, South Korea
Jaesoo Yoo

About this paper

Cite this paper

Han, J., Chen, X., Chen, K., Jiang, D. (2012). Assessing Web Article Quality by Harnessing Collective Intelligence. In: Lee, Sg., Peng, Z., Zhou, X., Moon, YS., Unland, R., Yoo, J. (eds) Database Systems for Advanced Applications. DASFAA 2012. Lecture Notes in Computer Science, vol 7238. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29038-1_31

DOI: https://doi.org/10.1007/978-3-642-29038-1_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29037-4
Online ISBN: 978-3-642-29038-1
eBook Packages: Computer Science, Computer Science (R0)