Implicit Group Membership Detection in Online Text: Analysis and Applications

  • Conference paper
Social Computing, Behavioral - Cultural Modeling and Prediction (SBP 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7227)

Abstract

Our thesis is that members of the same group share tendencies and nuances in communication style and substance, particularly online. In this paper, we discuss some potential applications of accurate authorship affiliation technology. We also discuss related work in similar author identification efforts and the research issues that currently exist when trying to perform automated authorship affiliation. We provide quantitative results from our recent machine learning experimentation using Support Vector Machines as initial validation of our theory. We applied our work to the task of classifying website forum posts by the affiliation of their author. We discuss in detail the stylometric features we used to perform the automated classification, and we split the original features into individual groups to isolate their respective contributions and discriminating capability. Our results show promise towards automating group representation, an important first step in studying group formation.
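
As an illustration of the feature-grouping idea in the abstract, the following is a minimal sketch of how stylometric features might be organized into named groups so that each group can be evaluated on its own. The groups (`lexical`, `function_words`, `punctuation`), the word and punctuation lists, and every function name are assumptions made for this example; the abstract does not state the paper's actual feature set.

```python
import re
from collections import Counter

# Hypothetical lists chosen for illustration; not taken from the paper.
FUNCTION_WORDS = ("the", "of", "and", "to", "we", "you")
PUNCTUATION = ("!", "?", ",", ".")


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_features(text):
    """Mean word length and type-token ratio of a post."""
    words = tokenize(text)
    if not words:
        return [0.0, 0.0]
    mean_length = sum(len(w) for w in words) / len(words)
    type_token_ratio = len(set(words)) / len(words)
    return [mean_length, type_token_ratio]


def function_word_features(text):
    """Relative frequency of each function word among all words."""
    words = tokenize(text)
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]


def punctuation_features(text):
    """Occurrences of each punctuation mark per character of text."""
    total = len(text) or 1
    return [text.count(p) / total for p in PUNCTUATION]


# Each group maps to its own extractor so groups can be switched on or off.
FEATURE_GROUPS = {
    "lexical": lexical_features,
    "function_words": function_word_features,
    "punctuation": punctuation_features,
}


def extract(text, groups=None):
    """Concatenate the feature vectors of the selected groups (all by default)."""
    selected = groups if groups is not None else list(FEATURE_GROUPS)
    vector = []
    for name in selected:
        vector.extend(FEATURE_GROUPS[name](text))
    return vector


post = "We need to act now! Are you with us?"
full_vector = extract(post)                               # all groups
punctuation_only = extract(post, groups=["punctuation"])  # one group in isolation
```

Vectors produced this way could be passed to any SVM implementation, once with all groups and once per group, to compare how much each group contributes to classification accuracy.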



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ellen, J., Kaina, J., Parameswaran, S. (2012). Implicit Group Membership Detection in Online Text: Analysis and Applications. In: Yang, S.J., Greenberg, A.M., Endsley, M. (eds) Social Computing, Behavioral - Cultural Modeling and Prediction. SBP 2012. Lecture Notes in Computer Science, vol 7227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29047-3_27

  • DOI: https://doi.org/10.1007/978-3-642-29047-3_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29046-6

  • Online ISBN: 978-3-642-29047-3

  • eBook Packages: Computer Science, Computer Science (R0)