Linguistically Enhanced Collocate Words Model

  • Conference paper
Information Retrieval Technology (AIRS 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8870)

Abstract

Bag-of-words (BOW) and fixed-size window approaches to word extraction from natural language text ignore text structure and context information. Similarly, word co-occurrence based on linear word proximity ignores the linguistic criteria of words. This paper proposes a semantic window of a word, which provides a context for capturing the structure and context of a word in a sentence for analysis. The semantic window carries linguistic elements that can be injected for collocate word identification. Selected data are used as case studies, and a quantitative analysis is also conducted. The proposed approach is evaluated against a sliding window, which serves as the baseline. The semantic window is found to perform better than the sliding window for linguistically enhanced collocate word extraction.
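To make the contrast in the abstract concrete, the following is a minimal sketch of a fixed-size sliding window co-occurrence count, set beside pairs taken from syntactic links. It is not the paper's semantic window, whose definition the abstract does not give. The function names, the example sentence, and the hand-written dependency arcs are all assumptions chosen for illustration; in practice the arcs would come from a parser.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def sliding_window_pairs(tokens: List[str], window: int = 2) -> Counter:
    """Count unordered word pairs that co-occur within `window` tokens.

    Purely positional: no syntactic or semantic information is used.
    """
    counts: Counter = Counter()
    for i, left in enumerate(tokens):
        # Pair the current token with the next `window` tokens to its right.
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts


def dependency_pairs(arcs: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count unordered (head, dependent) pairs from dependency arcs.

    Each arc is a (head, relation, dependent) triple.
    """
    counts: Counter = Counter()
    for head, _relation, dependent in arcs:
        counts[tuple(sorted((head, dependent)))] += 1
    return counts


# Hypothetical example sentence (not from the paper).
tokens = "the committee reached a final decision".split()

# Hand-written, hypothetical dependency arcs for the sentence above.
arcs = [
    ("reached", "nsubj", "committee"),
    ("reached", "obj", "decision"),
    ("decision", "amod", "final"),
    ("committee", "det", "the"),
    ("decision", "det", "a"),
]

window_counts = sliding_window_pairs(tokens, window=2)
arc_counts = dependency_pairs(arcs)

# "reached" and "decision" are three tokens apart, so a window of 2 misses
# the pair, while the verb-object arc links them directly.
print(window_counts[("decision", "reached")])
print(arc_counts[("decision", "reached")])
```

In this toy sentence the window of two tokens pairs "reached" with "a" and "final" but not with "decision", which illustrates the kind of structure that linear proximity alone can overlook.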

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Hiong, S.N., Ranaivo-Malançon, B., Kulathuramaiyer, N., Labadin, J. (2014). Linguistically Enhanced Collocate Words Model. In: Jaafar, A., et al. Information Retrieval Technology. AIRS 2014. Lecture Notes in Computer Science, vol 8870. Springer, Cham. https://doi.org/10.1007/978-3-319-12844-3_20

  • DOI: https://doi.org/10.1007/978-3-319-12844-3_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12843-6

  • Online ISBN: 978-3-319-12844-3

  • eBook Packages: Computer Science, Computer Science (R0)
