Dark Web, pp. 227–256

CyberGate Visualization

  • Hsinchun Chen
Part of the Integrated Series in Information Systems book series (ISIS, volume 30)


Computer-mediated communication (CMC) analysis systems are important for improving participant accountability and researcher analysis capabilities. However, existing CMC systems focus on structural features, with little support for analysis of text content in web discourse. To address this shortcoming, we propose a framework for CMC text analysis grounded in Systemic Functional Linguistic Theory. Our framework addresses several ambiguous CMC text mining issues, including the relevant tasks, features, information types, feature selection methods, and visualization techniques. Based on this framework, we have developed a system called CyberGate, which includes the Writeprint and Ink Blot techniques. These techniques incorporate complementary feature selection and visualization methods to support a breadth of analysis and categorization capabilities. An application example illustrates how these techniques support CMC text analysis. Furthermore, experiments compared CyberGate’s Writeprint and Ink Blot techniques against a benchmark technique (Support Vector Machine) to assess their viability for categorizing various forms of CMC text. The results indicated that the CyberGate techniques matched the Support Vector Machine performance in most cases while outperforming it for certain information types. Collectively, the results indicate that the system and its underlying design framework can dramatically improve text content analysis functions over those found in existing CMC systems.


Feature Selection; Text Analysis; Noun Phrase; Text Mining; Feature Selection Method
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This research has been supported in part by the following grant: National Science Foundation (NSF), “COPLINK Center: Social Network Analysis and Identity Deception Detection for Law Enforcement and Homeland Security,” October 2004–September 2007.



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Management Information Systems, University of Arizona, Tucson, USA
