Automatic Detection of Idiomatic Clauses

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7816)

Abstract

We describe several experiments whose goal is to automatically identify idiomatic expressions in written text. We explore two approaches to the task: 1) idiom recognition as outlier detection; and 2) supervised classification of sentences. For outlier detection, we apply principal component analysis. Because detecting idioms as lexical outliers does not exploit class label information, in the subsequent experiments we use linear discriminant analysis to obtain a discriminant subspace and then apply a three-nearest-neighbor classifier in that subspace to measure classification accuracy. We discuss the pros and cons of each approach. Both approaches are more general than previous algorithms for idiom detection: they do not rely on target idiom types, lexicons, or large manually annotated corpora, nor do they restrict the search space to a particular type of linguistic construction.
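The two approaches named in the abstract can be illustrated with a minimal sketch. The abstract does not give the feature representation, the number of principal components, or the exact outlier score, so everything below is an assumption made for illustration: toy numeric vectors stand in for sentence features, the outlier score is taken to be the PCA reconstruction error, the discriminant subspace is a single two-class Fisher direction, and the function names (`fit_pca`, `outlier_scores`, `fisher_direction`, `knn3_predict`) are hypothetical rather than taken from the paper.

```python
import numpy as np

def fit_pca(X_ref, n_components):
    """Return the mean and the top principal directions of the reference data."""
    mean = X_ref.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(X_ref - mean, full_matrices=False)
    return mean, vt[:n_components].T

def outlier_scores(X, mean, components):
    """Assumed score: distance from each row to the principal subspace."""
    centered = X - mean
    residual = centered - centered @ components @ components.T
    return np.linalg.norm(residual, axis=1)

def fisher_direction(X, y, reg=1e-6):
    """Two-class Fisher discriminant direction w, proportional to Sw^{-1}(m1 - m0)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter, lightly regularised so the solve is well posed.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    Sw = Sw + reg * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

def knn3_predict(z_train, y_train, z_test):
    """Majority vote of the three nearest training points in the 1-D subspace."""
    preds = []
    for z in z_test:
        nearest = np.argsort(np.abs(z_train - z))[:3]
        preds.append(int(y_train[nearest].sum() >= 2))
    return np.array(preds)

# Approach 1 (unsupervised): score new points against a reference set.
X_ref = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0],
                  [2.0, 2.0, 0.0], [3.0, 3.0, 0.0]])
X_new = np.array([[4.0, 4.0, 0.0],   # lies on the reference line
                  [1.5, 1.5, 2.0]])  # lies off the reference line
mean, components = fit_pca(X_ref, n_components=1)
scores = outlier_scores(X_new, mean, components)
print("outlier scores:", np.round(scores, 6))

# Approach 2 (supervised): project with the Fisher direction, then 3-NN.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
                    [4.0, 0.0], [4.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_test = np.array([[0.5, 10.0], [4.5, -10.0]])
w = fisher_direction(X_train, y_train)
preds = knn3_predict(X_train @ w, y_train, X_test @ w)
print("3-NN predictions:", preds)
```

On this toy data the point lying off the reference line receives a reconstruction error of about 2, while the point on the line scores about 0. The supervised part shows what the labels buy: the Fisher direction aligns with the first coordinate, which separates the classes, so the large values in the second coordinate of the test points do not affect the three-nearest-neighbor vote.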

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Feldman, A., Peng, J. (2013). Automatic Detection of Idiomatic Clauses. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_35

  • DOI: https://doi.org/10.1007/978-3-642-37247-6_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37246-9

  • Online ISBN: 978-3-642-37247-6

  • eBook Packages: Computer Science, Computer Science (R0)
