Abstract
With the advent of Web 2.0, a growing number of women participate in community-based social media on the Internet and exchange opinions there, which has raised questions about gender differences in online communication. In this study, we develop a feature-based text classification framework to examine the differences between female and male posters on web forums by analyzing their writing styles and topics of interest. We compare the performance of different feature sets in an experiment on political opinions posted to an Islamic women’s political forum. The results show that feature sets containing both content-free and content-specific features perform significantly better than those consisting of only content-free features. In addition, feature subset selection significantly improves the classification results. Female and male participants were also found to have significantly different topics of interest.
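The distinction between content-free and content-specific features, and the idea of feature subset selection, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the chapter's actual implementation: the short function-word list, the particular style measures, the function names, and the use of information gain as the selection criterion (the abstract does not say which criterion was used). Numeric features are scored with a crude "greater than zero" split, which is a simplification.

```python
import math
import re
from collections import Counter

# Hypothetical function-word list; a real study would use a much larger one.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "i", "you", "we")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def content_free_features(text):
    """Style measures that do not depend on the topic under discussion."""
    words = tokenize(text)
    n = max(len(words), 1)
    counts = Counter(words)
    feats = {
        "avg_word_len": sum(len(w) for w in words) / n,
        "punct_rate": sum(ch in ".,;:!?" for ch in text) / max(len(text), 1),
    }
    for fw in FUNCTION_WORDS:
        feats["fw_" + fw] = counts[fw] / n
    return feats


def content_specific_features(text, vocabulary):
    """Binary presence of topic words taken from a supplied vocabulary."""
    present = set(tokenize(text))
    return {"has_" + w: 1.0 if w in present else 0.0 for w in vocabulary}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction from splitting on value > 0 versus value <= 0."""
    on = [y for v, y in zip(values, labels) if v > 0]
    off = [y for v, y in zip(values, labels) if v <= 0]
    total = len(labels)
    remainder = (len(on) / total) * entropy(on) + (len(off) / total) * entropy(off)
    return entropy(labels) - remainder


def select_features(rows, labels, k):
    """Return the names of the k features with the highest information gain."""
    names = sorted(rows[0])
    scored = [(information_gain([r[name] for r in rows], labels), name)
              for name in names]
    scored.sort(key=lambda t: (-t[0], t[1]))  # best gain first, ties by name
    return [name for _, name in scored[:k]]
```

In this sketch, a combined feature set would be built by merging the two dictionaries for each message before calling `select_features`, and the selected columns would then be passed to whichever classifier is being evaluated.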
Acknowledgments
This material is based upon work supported by the National Science Foundation under Grant No. CNS-0709338, “(CRI: CRD) Developing a Dark Web Collection and Infrastructure for Computational and Social Sciences.” We would also like to thank Dr. Katharina von Knop for her helpful suggestions and comments about our research test bed.
Copyright information
© 2012 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Chen, H. (2012). Women’s Forums on the Dark Web. In: Dark Web. Integrated Series in Information Systems, vol 30. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1557-2_19
DOI: https://doi.org/10.1007/978-1-4614-1557-2_19
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1556-5
Online ISBN: 978-1-4614-1557-2
eBook Packages: Computer Science (R0)