Structure-Based Features for Predicting the Quality of Articles in Wikipedia

Prediction and Inference from Social Networks and Social Media

Abstract

Wikipedia's success is largely due to the free availability of high-quality articles across many areas of expertise. Although most of these collaborations between authoritative users may constitute citable sources, Wikipedia is not immune to well-identified problems with article quality, e.g., the reputability of third-party sources and vandalism. Given the huge number of articles and the high edit rate, manually evaluating the content quality of every article is not feasible. In this paper, we tackle the problem of modeling and predicting the quality of articles in collaborative platforms. We propose a quality model that integrates both temporal and structural features captured from the implicit peer review process enabled by Wikipedia. We develop a generic HITS-like framework that captures both the quality of the content and the authority of the associated authors. Notably, a mutual reinforcement principle between article quality and author authority is exploited to take advantage of the collaborative graph generated by the users. Experiments conducted on a representative set of Wikipedia data show the effectiveness of the computed indicators in both unsupervised and supervised scenarios.
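The mutual reinforcement principle mentioned in the abstract can be illustrated with a minimal sketch. It is not the chapter's actual model, which also integrates temporal and structural features that the abstract does not detail. It is plain HITS-style power iteration on a bipartite article-author graph, where the function name `mutual_reinforcement` and the contribution matrix `contrib` are hypothetical names chosen for illustration.

```python
import numpy as np


def mutual_reinforcement(contrib, n_iter=100, tol=1e-9):
    """HITS-style scores on a bipartite article-author graph.

    contrib[i, j] >= 0 is an assumed weight for how much author j
    contributed to article i (for example, a count of surviving edits).
    Returns (quality, authority), each scaled to unit Euclidean norm.
    """
    contrib = np.asarray(contrib, dtype=float)
    if contrib.ndim != 2 or not (contrib > 0).any():
        raise ValueError("contrib must be a 2-D matrix with a positive entry")

    n_articles, n_authors = contrib.shape
    quality = np.full(n_articles, 1.0 / np.sqrt(n_articles))
    authority = np.full(n_authors, 1.0 / np.sqrt(n_authors))

    for _ in range(n_iter):
        # An article is good if authoritative authors contributed to it.
        new_quality = contrib @ authority
        new_quality /= np.linalg.norm(new_quality)

        # An author is authoritative if they contributed to good articles.
        new_authority = contrib.T @ new_quality
        new_authority /= np.linalg.norm(new_authority)

        delta = (np.abs(new_quality - quality).sum()
                 + np.abs(new_authority - authority).sum())
        quality, authority = new_quality, new_authority
        if delta < tol:
            break

    return quality, authority


# Toy example: 3 articles (rows) and 3 authors (columns), made-up weights.
toy = [[3, 1, 0],
       [0, 1, 0],
       [0, 2, 4]]
quality, authority = mutual_reinforcement(toy)
print("article quality:", np.round(quality, 3))
print("author authority:", np.round(authority, 3))
```

Under these assumptions the two score vectors converge to the leading left and right singular vectors of `contrib`, which is the standard fixed point of HITS. Such scores could be used directly as an unsupervised ranking or passed as features to a supervised classifier.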

Notes

  1. http://en.wikipedia.org/wiki/Wikipedia:Statistics

  2. As stated in Sect. 5.1, the Wikipedia Editorial Team Assessment has manually labelled 30K articles, which makes it possible to compute statistics on these categories.

  3. User preferences over the Wikipedia articles are given in Sect. 5.

  4. https://en.wikipedia.org/wiki/Andreas_Hillgruber

  5. https://en.wikipedia.org/wiki/Japanese_aircraft_carrier_Kaga

Author information

Corresponding author

Correspondence to Baptiste de La Robertie.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

de La Robertie, B., Pitarch, Y., Teste, O. (2017). Structure-Based Features for Predicting the Quality of Articles in Wikipedia. In: Kawash, J., Agarwal, N., Özyer, T. (eds) Prediction and Inference from Social Networks and Social Media. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-51049-1_6

  • DOI: https://doi.org/10.1007/978-3-319-51049-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-51048-4

  • Online ISBN: 978-3-319-51049-1

  • eBook Packages: Computer Science, Computer Science (R0)
