A Latent Space Analysis of Editor Lifecycles in Wikipedia

  • Conference paper
Big Data Analytics in the Social and Ubiquitous Context (SENSEML 2015, MUSE 2014, MSM 2014)

Abstract

Collaborations such as Wikipedia are a key part of the value of the modern Internet. At the same time, there is concern that these collaborations are threatened by high levels of member withdrawal. In this paper we borrow ideas from topic analysis to study editor activity on Wikipedia over time using latent space analysis, which offers insight into the evolving patterns of editor behaviour. This latent space representation reveals a number of different categories of editor (e.g. Technical Experts, Social Networkers), and we show that it provides a signal that predicts an editor’s departure from the community. We also show that long-term editors generally have more diversified editing preferences and experience relatively gradual evolution in their editor profiles, while short-term editors generally distribute their contributions at random among the namespaces and categories of articles and experience considerable fluctuation in the evolution of their editor profiles.
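The abstract contrasts long-term editors, whose preferences are more diversified and whose profiles evolve gradually, with short-term editors, whose contributions are spread at random and whose profiles fluctuate. The abstract does not define how these properties are measured, so the following Python sketch uses two illustrative measures chosen here as assumptions: the Shannon entropy of an editor's edit counts across namespaces or categories as a diversity score, and the mean cosine distance between consecutive per-period profile vectors as a fluctuation score. The function names `edit_entropy` and `profile_fluctuation` are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def edit_entropy(counts: Dict[str, int]) -> float:
    """Shannon entropy (bits) of an editor's edit counts.

    Illustrative diversity measure (an assumption, not the paper's definition):
    higher values mean edits are spread more evenly across namespaces/categories.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; an all-zero profile is treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def profile_fluctuation(profiles: List[Sequence[float]]) -> float:
    """Mean cosine distance between consecutive per-period profile vectors.

    Illustrative fluctuation measure (an assumption): 0 means the profile never
    changes direction, values near 1 mean it changes completely each period.
    """
    if len(profiles) < 2:
        return 0.0
    dists = [cosine_distance(a, b) for a, b in zip(profiles, profiles[1:])]
    return sum(dists) / len(dists)


if __name__ == "__main__":
    # Toy quarterly profiles (hypothetical data, not from the paper).
    steady = [[0.6, 0.3, 0.1], [0.6, 0.3, 0.1], [0.5, 0.35, 0.15]]
    erratic = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(profile_fluctuation(steady), profile_fluctuation(erratic))
    print(edit_entropy({"Main": 5, "Talk": 5, "User": 5, "Wikipedia": 5}))
```

Under these assumed measures, the "steady" toy editor scores close to 0 on fluctuation while the "erratic" one scores 1, which mirrors the qualitative contrast the abstract draws; the paper's own measures may differ.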

Notes

  1. Common work archetypes refer to the types of contribution that users make on online platforms, e.g. answering questions on Q&A sites and editing main pages in Wikipedia.

  2. http://en.wikipedia.org/wiki/Wikipedia:Namespace.

  3. At the time we collected data for this work, there were 22 macro (top-level) categories: http://en.wikipedia.org/wiki/Category:Main_topic_classifications.

  4. http://dbpedia.org.

  5. The category graph is directed because of the category-subcategory structure.

  6. http://strategy.wikimedia.org/wiki/Attracting_and_retaining_participants.

  7. Available at: http://www.cs.princeton.edu/~blei/topicmodeling.html.

  8. We experimented with different numbers of topics \(k \in [5, 45]\) in steps of 5 on the quarterly dataset using Non-negative Matrix Factorization (NMF), and then used the mean pairwise normalized pointwise mutual information (NPMI) and the mean pairwise Jaccard similarity (MPJ), as suggested by [11], to assess the coherence and generality of the topics for each k. To factorize the quarterly data matrix efficiently, we used the fast projected gradient method for alternating non-negative least squares NMF introduced by Lin [10]. To produce deterministic results and reduce the risk of converging to a poor local minimum, we used the Non-negative Double Singular Value Decomposition (NNDSVD) strategy [5] to choose the initial factors for NMF. Overall, we found that the run with 10 topics generated more coherent and general topics, and thus yielded more interpretable and expressive results, with less overlap between different topics.

  9. In Wikipedia, bots are generally programs or scripts that make repetitive automated or semi-automated edits without the necessity of human decision-making: http://en.wikipedia.org/wiki/Wikipedia:Bot_policy.

  10. Implementations of the Tukey HSD test for the R and Python environments are discussed at: http://jpktd.blogspot.ie/2013/03/multiple-comparison-and-tukey-hsd-or_25.html.
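Note 8 describes factorizing the quarterly editor-activity matrix with NMF initialised by NNDSVD [5]. The NumPy sketch below illustrates that pipeline under several assumptions: rows are editors and columns are activity features (for example edit counts per namespace or per top-level category); zeros left by NNDSVD are replaced with a small positive value (similar in spirit to the NNDSVDa variant) so that they can still be updated; and, for brevity, the classic Lee-Seung multiplicative updates are swapped in for the projected gradient method of Lin [10] that the note actually cites. The function names `nndsvd_init` and `nmf_mu` are hypothetical.

```python
import numpy as np


def nndsvd_init(A: np.ndarray, k: int, fill: float = 1e-3):
    """NNDSVD initialisation (Boutsidis & Gallopoulos) for A ~= W @ H.

    Zero entries are afterwards replaced by fill * mean(A) (assumption, in the
    spirit of NNDSVDa) because multiplicative updates cannot move exact zeros.
    """
    if (A < 0).any():
        raise ValueError("A must be non-negative")
    if k > min(A.shape):
        raise ValueError("k must not exceed min(A.shape)")
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    m, n = A.shape
    W = np.zeros((m, k))
    H = np.zeros((k, n))
    # The leading singular vectors of a non-negative matrix can be taken non-negative.
    W[:, 0] = np.sqrt(S[0]) * np.abs(U[:, 0])
    H[0, :] = np.sqrt(S[0]) * np.abs(Vt[0, :])
    for j in range(1, k):
        x, y = U[:, j], Vt[j, :]
        xp, xn = np.maximum(x, 0), np.maximum(-x, 0)
        yp, yn = np.maximum(y, 0), np.maximum(-y, 0)
        xp_n, yp_n = np.linalg.norm(xp), np.linalg.norm(yp)
        xn_n, yn_n = np.linalg.norm(xn), np.linalg.norm(yn)
        mp, mn = xp_n * yp_n, xn_n * yn_n
        # Keep whichever sign pattern (positive or negative parts) carries more mass.
        if mp >= mn and mp > 0:
            u, v, sigma = xp / xp_n, yp / yp_n, mp
        elif mn > 0:
            u, v, sigma = xn / xn_n, yn / yn_n, mn
        else:
            continue  # degenerate component: left at zero, filled below
        scale = np.sqrt(S[j] * sigma)
        W[:, j] = scale * u
        H[j, :] = scale * v
    fill_value = fill * A.mean()
    W[W == 0] = fill_value
    H[H == 0] = fill_value
    return W, H


def nmf_mu(A: np.ndarray, k: int, n_iter: int = 300, eps: float = 1e-10):
    """NMF via Lee-Seung multiplicative updates (Frobenius loss), NNDSVD-initialised."""
    W, H = nndsvd_init(A, k)
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


if __name__ == "__main__":
    # Toy editor x feature matrix (hypothetical): two groups of editors.
    A = np.array([[2, 1, 0, 0], [4, 2, 0, 0], [6, 3, 0, 0],
                  [0, 0, 1, 3], [0, 0, 2, 6], [0, 0, 3, 9]], dtype=float)
    W, H = nmf_mu(A, k=2)
    print(W.argmax(axis=1))
    print(np.linalg.norm(A - W @ H) / np.linalg.norm(A))
```

Each row of `W` can be read as an editor's position in the latent space, and `W.argmax(axis=1)` gives a crude "dominant latent category" per editor; repeating this per quarter yields a sequence of profiles whose evolution can be tracked. Because NNDSVD involves no random numbers, repeated runs give the same factors, which is the determinism property the note relies on.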
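Note 8 also chooses the number of topics using mean pairwise NPMI (coherence) and MPJ (overlap between topic descriptors), following [11]. The pure-Python sketch below assumes that each topic is summarised by its top-ranked terms (its descriptor) and that co-occurrence probabilities are estimated from a reference corpus represented as a list of term sets. The corpus, the convention of returning -1 for term pairs that never co-occur, and the function names are assumptions for illustration.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set


def npmi(w1: str, w2: str, docs: List[Set[str]]) -> float:
    """Normalized pointwise mutual information of two terms over a corpus of term sets."""
    if not docs:
        raise ValueError("reference corpus is empty")
    n = len(docs)
    p1 = sum(1 for d in docs if w1 in d) / n
    p2 = sum(1 for d in docs if w2 in d) / n
    p12 = sum(1 for d in docs if w1 in d and w2 in d) / n
    if p12 == 0:
        return -1.0  # never co-occur: minimum NPMI by convention
    if p12 == 1.0:
        return 1.0   # both terms occur in every document
    return math.log(p12 / (p1 * p2)) / -math.log(p12)


def topic_npmi(descriptor: Sequence[str], docs: List[Set[str]]) -> float:
    """Mean pairwise NPMI over the terms of one topic descriptor (coherence)."""
    pairs = list(combinations(descriptor, 2))
    if not pairs:
        raise ValueError("a descriptor needs at least two terms")
    return sum(npmi(a, b, docs) for a, b in pairs) / len(pairs)


def mean_pairwise_jaccard(descriptors: List[Sequence[str]]) -> float:
    """Mean Jaccard similarity between all pairs of topic descriptors (overlap)."""
    sets = [set(d) for d in descriptors]
    pairs = list(combinations(sets, 2))
    if not pairs:
        raise ValueError("need at least two descriptors")
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)
```

In a model-selection loop, one would compute the mean `topic_npmi` and the `mean_pairwise_jaccard` for each candidate k (5, 10, ..., 45 in note 8) and prefer runs with high coherence and low overlap.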
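Notes 3 and 5 indicate that article categories were related to the 22 top-level categories through the directed category graph, but the exact mapping procedure is not stated here. One plausible approach, sketched below purely as an assumption, walks upward from a category along subcategory-to-parent edges with a breadth-first search and keeps the top-level categories reached in the fewest hops. The function name and the toy graph are hypothetical and do not reproduce Wikipedia's actual hierarchy.

```python
from typing import Dict, List, Set


def nearest_top_categories(category: str,
                           parents: Dict[str, List[str]],
                           top: Set[str]) -> Set[str]:
    """Breadth-first search upward through a directed category graph.

    `parents` maps each category to its parent categories (edges point from
    subcategory to parent). Returns the top-level categories reachable in the
    fewest hops, keeping ties. The `seen` set makes the search safe on cycles.
    """
    if category in top:
        return {category}
    seen = {category}
    frontier = [category]
    while frontier:
        next_frontier: List[str] = []
        found: Set[str] = set()
        for node in frontier:
            for parent in parents.get(node, []):
                if parent in seen:
                    continue
                seen.add(parent)
                if parent in top:
                    found.add(parent)
                else:
                    next_frontier.append(parent)
        if found:
            return found  # stop at the shallowest level that reaches a top category
        frontier = next_frontier
    return set()  # no top-level category is reachable
```

Stopping at the shallowest level is a design choice: it assigns each category to its closest top-level ancestors rather than to every reachable one, which matters because Wikipedia's category graph is densely interlinked.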

References

  1. Ahmed, A., Low, Y., Aly, M., Josifovski, V., Smola, A.J.: Scalable distributed inference of dynamic user interests for behavioral targeting. In: Proceedings of KDD, pp. 114–122. ACM (2011)

  2. Ahmed, A., Xing, E.P.: Timeline: a dynamic hierarchical Dirichlet process model for recovering birth/death and evolution of topics in text stream. In: Proceedings of UAI, pp. 20–29 (2010)

  3. Blei, D.M., Lafferty, J.D.: Dynamic topic models. In: Proceedings of ICML, pp. 113–120 (2006)

  4. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  5. Boutsidis, C., Gallopoulos, E.: SVD based initialization: a head start for non-negative matrix factorization. In: Pattern Recognition (2008)

  6. Chan, J., Hayes, C., Daly, E.M.: Decomposing discussion forums using user roles. In: Proceedings of ICWSM, pp. 215–218 (2010)

  7. Danescu-Niculescu-Mizil, C., West, R., Jurafsky, D., Leskovec, J., Potts, C.: No country for old members: user lifecycle and linguistic change in online communities. In: Proceedings of WWW, pp. 307–318. Rio de Janeiro, Brazil (2013)

  8. Furtado, A., Andrade, N., Oliveira, N., Brasileiro, F.: Contributor profiles, their dynamics, and their importance in five Q&A sites. In: Proceedings of CSCW, pp. 1237–1252. Texas (2013)

  9. Hulpus, I., Hayes, C., Karnstedt, M., Greene, D.: Unsupervised graph-based topic labelling using DBpedia. In: Proceedings of WSDM, pp. 465–474. ACM (2013)

  10. Lin, C.: Projected gradient methods for non-negative matrix factorization. Neural Comput. 19(10), 2756–2779 (2007)

  11. O’Callaghan, D., Greene, D., Carthy, J., Cunningham, P.: An analysis of the coherence of descriptors in topic modeling. Expert Syst. Appl. 42(13), 5645–5657 (2015)

  12. Panciera, K., Halfaker, A., Terveen, L.: Wikipedians are born, not made: a study of power editors on Wikipedia. In: Proceedings of GROUP, pp. 51–60. ACM (2009)

  13. Jin, Y., Zhang, S., Zhao, Y., Chen, H., Sun, J., Zhang, Y., Chen, C.: Mining and information integration practice for Chinese bibliographic database of life sciences. In: Perner, P. (ed.) ICDM 2013. LNCS, vol. 7987, pp. 1–10. Springer, Heidelberg (2013)

  14. Tukey, J.W.: Comparing individual means in the analysis of variance. Biometrics 5(2), 99–114 (1949)

  15. Wang, X., McCallum, A.: Topics over time: a non-Markov continuous-time model of topical trends. In: Proceedings of KDD, pp. 424–433. ACM (2006)

  16. Wei, C.P., Chiu, I.T.: Turning telecommunications call details to churn prediction: a data mining approach. Expert Syst. Appl. 23(2), 103–112 (2002)

  17. Welser, H.T., Cosley, D., Kossinets, G., Lin, A., Dokshin, F., Gay, G., Smith, M.: Finding social roles in Wikipedia. In: Proceedings of iConference, pp. 122–129. ACM (2011)

Acknowledgements

This work is supported by Science Foundation Ireland (SFI) under Grant No. SFI/12/RC/2289 (Insight Centre for Data Analytics). Xiangju Qin is funded by University College Dublin and China Scholarship Council (UCD-CSC Joint Scholarship 2011).

Author information

Correspondence to Pádraig Cunningham.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Qin, X., Greene, D., Cunningham, P. (2016). A Latent Space Analysis of Editor Lifecycles in Wikipedia. In: Atzmueller, M., Chin, A., Janssen, F., Schweizer, I., Trattner, C. (eds) Big Data Analytics in the Social and Ubiquitous Context. SENSEML 2015, MUSE 2014, MSM 2014. Lecture Notes in Computer Science, vol 9546. Springer, Cham. https://doi.org/10.1007/978-3-319-29009-6_3

  • DOI: https://doi.org/10.1007/978-3-319-29009-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-29008-9

  • Online ISBN: 978-3-319-29009-6

  • eBook Packages: Computer Science, Computer Science (R0)
