Women’s Forums on the Dark Web
With the recent advent of Web 2.0, more and more women participate in and exchange opinions through community-based social media on the Internet, which has raised questions about gender differences in online communication. In this study, we develop a feature-based text classification framework to examine gender differences between female and male posters on web forums by analyzing their writing styles and topics of interest. We compare the performance of different feature sets in an experiment involving political opinions. The results of our experimental study on an Islamic women’s political forum show that feature sets containing both content-free and content-specific features perform significantly better than those consisting of only content-free features. In addition, feature subset selection improves the classification results significantly. Female and male participants were also found to have significantly different topics of interest.
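As a rough illustration of the distinction drawn above, the following Python sketch builds a few content-free (style) features and content-specific (topic word) features for a post, then ranks features by information gain as one possible form of feature subset selection. Everything in it is an assumption made for illustration: the tiny function-word list, the feature names, the helper functions, and the use of information gain are not taken from the chapter, whose actual feature sets, selection method, and classifier are not described in this abstract.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption).
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "i", "you", "we")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def content_free_features(text):
    """Style features that do not depend on the topic of the post."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)
    feats = {f"fw:{w}": tokens.count(w) / n for w in FUNCTION_WORDS}
    feats["avg_word_len"] = sum(len(t) for t in tokens) / n
    feats["punct_per_char"] = sum(c in ".,;:!?" for c in text) / max(len(text), 1)
    return feats


def content_specific_features(text):
    """Binary presence of each non-function word (topic indicators)."""
    words = {t for t in tokenize(text) if t not in FUNCTION_WORDS}
    return {f"word:{w}": 1.0 for w in words}


def extract(text, use_content=True):
    """Combine both feature groups, or keep only the content-free group."""
    feats = content_free_features(text)
    if use_content:
        feats.update(content_specific_features(text))
    return feats


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(posts, labels, feature):
    """Information gain of the binary event 'feature value is above zero'."""
    present = [y for x, y in zip(posts, labels) if x.get(feature, 0) > 0]
    absent = [y for x, y in zip(posts, labels) if x.get(feature, 0) <= 0]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (present, absent) if part)
    return entropy(labels) - remainder


def select_features(posts, labels, k):
    """Return the k feature names with the highest information gain."""
    names = sorted({name for x in posts for name in x})
    ranked = sorted(names, key=lambda f: information_gain(posts, labels, f), reverse=True)
    return ranked[:k]
```

In practice, `extract` would be applied to each forum post, `select_features` would pick a subset on the training data only, and the reduced vectors would be passed to a classifier of choice. Because this sketch scores only presence versus absence, always-present style features such as `avg_word_len` receive zero gain; a fuller version would discretize continuous features before scoring them.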
Keywords: Function Word, Gender Classification, Online Review, Syntactic Feature, Sentiment Classification
This material is based upon work supported by the National Science Foundation under Grant No. CNS-0709338, “(CRI: CRD) Developing a Dark Web Collection and Infrastructure for Computational and Social Sciences.” We would also like to thank Dr. Katharina von Knop for her helpful suggestions and comments about our research test bed.