Abstract
Our thesis is that members of the same group share tendencies and nuances in communication style and substance, particularly online. In this paper, we discuss potential applications of accurate authorship affiliation technology, review related work in author identification, and outline the research issues that currently complicate automated authorship affiliation. As initial validation of our theory, we provide quantitative results from our recent machine learning experimentation using Support Vector Machines, applied to the task of classifying website forum posts by the affiliation of their author. We discuss in detail the stylometric features used for the automated classification and split the original features into individual groups to isolate their respective contributions and discriminating capability. Our results show promise towards automating group representation, an important first step in studying group formation.
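To make the idea of splitting stylometric features into groups concrete, the following is a minimal sketch, not the feature set used in the paper. The three groups, the function-word list, and all function names are assumptions chosen for illustration; the resulting feature dictionaries could then be passed to any classifier, such as a Support Vector Machine.

```python
import re

# Hypothetical function-word list; the abstract does not give the paper's actual features.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "that", "is", "it")


def _words(text):
    """Split a post into alphabetic word tokens."""
    return re.findall(r"[A-Za-z']+", text)


def lexical_features(text):
    """Word-level statistics: post length, word length, vocabulary richness."""
    words = _words(text)
    n = len(words)
    if n == 0:
        return {"word_count": 0, "avg_word_len": 0.0, "type_token_ratio": 0.0}
    return {
        "word_count": n,
        "avg_word_len": sum(len(w) for w in words) / n,
        "type_token_ratio": len({w.lower() for w in words}) / n,
    }


def punctuation_features(text):
    """Relative frequency of each punctuation mark per character."""
    total = max(len(text), 1)
    return {f"punct_{c}": text.count(c) / total for c in ".,!?;:"}


def function_word_features(text):
    """Relative frequency of each function word per word token."""
    words = [w.lower() for w in _words(text)]
    n = max(len(words), 1)
    return {f"fw_{fw}": words.count(fw) / n for fw in FUNCTION_WORDS}


# Each group can be switched on or off to measure its individual contribution.
FEATURE_GROUPS = {
    "lexical": lexical_features,
    "punctuation": punctuation_features,
    "function_words": function_word_features,
}


def extract(text, groups=None):
    """Build one feature dictionary from the selected groups (all by default)."""
    selected = groups if groups is not None else list(FEATURE_GROUPS)
    features = {}
    for name in selected:
        features.update(FEATURE_GROUPS[name](text))
    return features


if __name__ == "__main__":
    post = "The cat sat. The dog ran!"
    print(extract(post, groups=["lexical"]))
```

Training one classifier per group, for example on `extract(post, groups=["punctuation"])` alone, is one way to isolate how much discriminating capability each group carries.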
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ellen, J., Kaina, J., Parameswaran, S. (2012). Implicit Group Membership Detection in Online Text: Analysis and Applications. In: Yang, S.J., Greenberg, A.M., Endsley, M. (eds) Social Computing, Behavioral - Cultural Modeling and Prediction. SBP 2012. Lecture Notes in Computer Science, vol 7227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29047-3_27
DOI: https://doi.org/10.1007/978-3-642-29047-3_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29046-6
Online ISBN: 978-3-642-29047-3
eBook Packages: Computer Science (R0)