Abstract
We present a symbolic and graph-based approach for mapping knowledge domains. The symbolic component relies on shallow linguistic processing of texts to extract multi-word terms and cluster them based on lexico-syntactic relations. The clusters are then subjected to graph decomposition based on inherent graph-theoretic properties of association graphs of items (multi-word terms and authors). This includes the search for complete minimal separators that decompose the graphs into central atoms (core topics) and peripheral atoms. The methodology is implemented in the TermWatch system and can be used for several text mining tasks. In this paper, we apply our methodology to map the dynamics of terrorism research between 1990 and 2006. We also mined for frequent itemsets as a means of revealing dependencies between formal concepts in the corpus. A comparison of the extracted frequent itemsets with the structure of the central atom shows an interesting overlap. The main feature of our approach lies in the combination of state-of-the-art techniques from Natural Language Processing (NLP), Clustering and Graph Theory to develop a system and a methodology adapted to uncovering hidden sub-structures in texts.
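To make the frequent itemset step more concrete, the following is a minimal sketch in Python, not the TermWatch implementation. It assumes each document has already been reduced to a set of items (multi-word terms), and it counts itemsets by brute-force enumeration rather than by an optimized miner. The function name `frequent_itemsets`, its parameters, and the toy documents are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: count} for itemsets found in >= min_support transactions.

    Hypothetical helper: brute-force enumeration, suitable only for small
    inputs. It is not the mining procedure used in the paper.
    """
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        # Count every subset of this transaction up to max_size items.
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


# Toy documents (invented for illustration), each reduced to a set of terms.
docs = [
    {"terrorist attack", "suicide bombing", "risk assessment"},
    {"terrorist attack", "suicide bombing"},
    {"terrorist attack", "risk assessment"},
    {"biological weapon", "risk assessment"},
]

result = frequent_itemsets(docs, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this toy example, the itemsets that survive the support threshold could then be compared with the items of a central atom to look for the kind of overlap the abstract reports.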
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
SanJuan, E. (2013). TermWatch II: Unsupervised Terminology Graph Extraction and Decomposition. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2011. Communications in Computer and Information Science, vol 348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37186-8_12
DOI: https://doi.org/10.1007/978-3-642-37186-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37185-1
Online ISBN: 978-3-642-37186-8
eBook Packages: Computer Science (R0)