Dark Web, pp. 369-389

Women’s Forums on the Dark Web

  • Hsinchun Chen
Part of the Integrated Series in Information Systems book series (ISIS, volume 30)


With the advent of Web 2.0, more and more women participate in, and exchange opinions through, community-based social media on the Internet. This has raised questions about gender differences in online communication. In this study, we develop a feature-based text classification framework to examine the differences between female and male posters on web forums by analyzing their writing styles and topics of interest. We examine the performance of different feature sets in an experiment involving political opinions. The results of our experimental study on an Islamic women’s political forum show that feature sets containing both content-free and content-specific features perform significantly better than those consisting of only content-free features. In addition, feature subset selection can improve the classification results significantly. Female and male participants were found to have significantly different topics of interest in our study.
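To make the distinction between the two feature families concrete, the following is a minimal Python sketch, not the framework used in the study. It assumes content-free features are topic-independent style measures (average word length, type-token ratio, function-word rates) and content-specific features are content-word unigrams. It also assumes information gain as the feature subset selection criterion, which the abstract does not specify. The function-word list and all function names are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration); a real stylometric study would use a much larger lexicon.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "i", "you", "we")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def content_free_features(text):
    """Style features that do not depend on the topic of the message."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)  # avoid division by zero on empty messages
    counts = Counter(tokens)
    feats = {
        "avg_word_len": sum(len(t) for t in tokens) / n,
        "type_token_ratio": len(set(tokens)) / n,
    }
    for word in FUNCTION_WORDS:
        feats["fw_" + word] = counts[word] / n  # relative frequency
    return feats


def content_specific_features(text):
    """Topic features: presence of each non-function-word unigram."""
    return {"w_" + t: 1.0 for t in set(tokenize(text)) if t not in FUNCTION_WORDS}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(present, labels):
    """Information gain of a binary (present/absent) feature for the labels."""
    with_f = [lab for p, lab in zip(present, labels) if p]
    without_f = [lab for p, lab in zip(present, labels) if not p]
    conditional = sum(
        len(part) / len(labels) * entropy(part)
        for part in (with_f, without_f)
        if part
    )
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Rank content-specific features by information gain and keep the top k."""
    vectors = [content_specific_features(d) for d in docs]
    vocab = sorted({f for v in vectors for f in v})
    scored = [(information_gain([f in v for v in vectors], labels), f) for f in vocab]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest gain first, ties by name
    return [f for _, f in scored[:k]]
```

In a pipeline of this shape, each message would be represented by the content-free features alone or by the content-free features plus the selected content-specific features, and the two representations would be passed to the same classifier so that their accuracy can be compared.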


Keywords: Function Word, Gender Classification, Online Review, Syntactic Feature, Sentiment Classification
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This material is based upon work supported by the National Science Foundation under Grant No. CNS-0709338, “(CRI: CRD) Developing a Dark Web Collection and Infrastructure for Computational and Social Sciences.” We would also like to thank Dr. Katharina von Knop for her helpful suggestions and comments about our research test bed.



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Management Information Systems, University of Arizona, Tucson, USA
