Abstract
Bag-of-words (BOW) and fixed-size window approaches to word extraction from natural language text ignore text structure and context information. Similarly, word co-occurrence based on linear word proximity ignores the linguistic criteria of words. This paper proposes a semantic window of a word, which provides a context for capturing the structure and context of a word in a sentence for analysis. The semantic window carries linguistic elements that can be injected into collocate word identification. Selected data are used as case studies, and a quantitative analysis is also conducted. The proposed approach is evaluated against a sliding window, which serves as the baseline. The semantic window is found to perform better than the sliding window for linguistically enhanced collocate word extraction.
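To make the contrast in the abstract concrete, the following is a minimal sketch, not the paper's implementation. `sliding_window_pairs` illustrates the fixed-size sliding-window baseline based on linear proximity. `linked_pairs` is a hypothetical stand-in for a structure-aware alternative: it assumes that word-to-word links (for example, from a dependency parse) are supplied by hand as index pairs. The function names, the example sentence, and the links are all assumptions made for illustration.

```python
from collections import Counter


def sliding_window_pairs(tokens, window=2):
    """Count unordered word pairs whose positions are at most `window` apart."""
    counts = Counter()
    for i, left in enumerate(tokens):
        # Look only to the right so each pair of positions is counted once.
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts


def linked_pairs(tokens, links):
    """Count unordered word pairs joined by explicit (head, dependent) index links.

    `links` is a hypothetical, hand-supplied list of index pairs standing in
    for the output of a linguistic analysis such as a dependency parse.
    """
    counts = Counter()
    for head, dependent in links:
        counts[tuple(sorted((tokens[head], tokens[dependent])))] += 1
    return counts


tokens = ["the", "committee", "made", "a", "difficult", "decision", "yesterday"]

# Linear proximity: "made" (index 2) and "decision" (index 5) are three
# positions apart, so a window of 2 does not pair them.
window_counts = sliding_window_pairs(tokens, window=2)

# Structure-aware pairing: assumed links made -> decision, decision -> difficult.
structure_counts = linked_pairs(tokens, links=[(2, 5), (5, 4)])
```

In this toy sentence the window of size 2 pairs "difficult" with "yesterday" purely because they are close, while the verb-object pair "made" and "decision" is only recovered when the window is widened or when the pairing follows explicit links between words.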
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hiong, S.N., Ranaivo-Malançon, B., Kulathuramaiyer, N., Labadin, J. (2014). Linguistically Enhanced Collocate Words Model. In: Jaafar, A., et al. Information Retrieval Technology. AIRS 2014. Lecture Notes in Computer Science, vol 8870. Springer, Cham. https://doi.org/10.1007/978-3-319-12844-3_20
DOI: https://doi.org/10.1007/978-3-319-12844-3_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12843-6
Online ISBN: 978-3-319-12844-3
eBook Packages: Computer Science (R0)