
Words Worth Attention: Predicting Words of the Week on the Russian Wiktionary

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 468)

Abstract

Collaborative lexicography projects such as Wiktionary are becoming strong competitors to traditional semantic resources, just as Wikipedia has already become a competitor to expert-built knowledge bases. Keeping crowdsourced data at good quality is a challenging problem because of the fuzzy nature of crowdsourcing. The presented study focuses on predicting the word of the week articles on the Russian Wiktionary by treating this problem as a binary classification task. The best proposed model is based on the Naïve Bayes classifier and achieves weighted average precision, recall, and F1-measure values of 87% on the provided dataset.
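To make the setup concrete, here is a minimal sketch of a Naïve Bayes binary classifier together with the support-weighted averaging of precision, recall, and F1 mentioned above. The abstract does not say which Naïve Bayes variant or which article features were used. This sketch therefore assumes a Gaussian variant, and its two features (number of senses and number of translations) are hypothetical stand-ins. The toy data is invented for illustration and is not the paper's dataset.

```python
import math


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature Gaussian (mean, variance) pairs.

    Assumption: a Gaussian Naive Bayes variant; the paper does not specify one.
    """
    model = {}
    n = len(y)
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        prior = len(rows) / n
        params = []
        for j in range(len(X[0])):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            # A tiny constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            params.append((mean, var))
        model[label] = (prior, params)
    return model


def predict(model, x):
    """Return the label with the highest log-posterior for feature vector x."""
    best, best_score = None, -math.inf
    for label, (prior, params) in model.items():
        score = math.log(prior)
        for v, (mean, var) in zip(x, params):
            # Log of the Gaussian density; features are assumed independent.
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = label, score
    return best


def weighted_prf(y_true, y_pred):
    """Per-class precision, recall, F1, averaged with weights = class support."""
    total = len(y_true)
    p_sum = r_sum = f_sum = 0.0
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = sum(1 for t in y_true if t == c) / total
        p_sum += weight * precision
        r_sum += weight * recall
        f_sum += weight * f1
    return p_sum, r_sum, f_sum


if __name__ == "__main__":
    # Hypothetical features: [number of senses, number of translations].
    # Label 1 = word of the week, 0 = ordinary article (toy data only).
    X = [[5, 40], [6, 35], [7, 50], [1, 2], [2, 5], [1, 3]]
    y = [1, 1, 1, 0, 0, 0]
    model = fit_gaussian_nb(X, y)
    preds = [predict(model, x) for x in X]
    print(weighted_prf(y, preds))  # (1.0, 1.0, 1.0)
```

Weighting each class by its support keeps the averaged scores from being dominated by the rarer class. This matters here because word-of-the-week articles are presumably a small minority of all entries.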




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ustalov, D. (2014). Words Worth Attention: Predicting Words of the Week on the Russian Wiktionary. In: Klinov, P., Mouromtsev, D. (eds) Knowledge Engineering and the Semantic Web. KESW 2014. Communications in Computer and Information Science, vol 468. Springer, Cham. https://doi.org/10.1007/978-3-319-11716-4_17


  • DOI: https://doi.org/10.1007/978-3-319-11716-4_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11715-7

  • Online ISBN: 978-3-319-11716-4

  • eBook Packages: Computer Science, Computer Science (R0)
