
Unsupervised Induction of Persian Semantic Verb Classes Based on Syntactic Information

  • Conference paper
Language Processing and Intelligent Information Systems (IIS 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7912)


Abstract

Automatic induction of semantic verb classes is one of the most challenging tasks in computational lexical semantics, with a wide variety of applications in natural language processing. The large number of Persian speakers and the lack of such semantic classes for Persian verbs motivated us to apply unsupervised algorithms to Persian verb clustering. In this paper, we report experiments on inducing the semantic classes of Persian verbs based on Levin’s theory of verb classes. Syntactic information extracted from dependency trees serves as the base features for clustering the verbs. Since no manual classification of Persian verbs existed prior to this paper, we prepared a manual classification of 265 verbs into 43 semantic classes. We show that the spectral clustering algorithm outperforms KMeans and improves on the baseline algorithm by about 17% in F-measure and 0.13 in Rand index.
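As a rough illustration of the two evaluation measures named in the abstract, the sketch below scores a predicted clustering against gold classes by comparing every pair of verbs. The Rand index follows its standard pair-counting definition. The abstract does not say which F-measure variant the paper uses, so the pairwise F-measure here is an assumption, and the function names, class labels, and toy data are all hypothetical.

```python
from itertools import combinations


def pair_counts(gold, pred):
    """Classify every pair of items by whether gold and predicted labels agree.

    Returns (tp, fp, fn, tn): pairs grouped together in both, only in the
    prediction, only in the gold standard, or in neither.
    """
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_gold = gold[i] == gold[j]
        same_pred = pred[i] == pred[j]
        if same_gold and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_gold:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def rand_index(gold, pred):
    """Fraction of pairs on which the clustering and the gold standard agree."""
    tp, fp, fn, tn = pair_counts(gold, pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 1.0


def pairwise_f_measure(gold, pred):
    """Harmonic mean of pairwise precision and recall (assumed variant)."""
    tp, fp, fn, _ = pair_counts(gold, pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: six verbs, three gold classes, three induced clusters.
gold = ["motion", "motion", "motion", "speech", "speech", "change"]
pred = [0, 0, 1, 1, 1, 2]

print(rand_index(gold, pred))
print(pairwise_f_measure(gold, pred))
```

Pair counting is quadratic in the number of verbs, which is negligible for a set of a few hundred verbs such as the 265 classified here.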




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Aminian, M., Rasooli, M.S., Sameti, H. (2013). Unsupervised Induction of Persian Semantic Verb Classes Based on Syntactic Information. In: Kłopotek, M.A., Koronacki, J., Marciniak, M., Mykowiecka, A., Wierzchoń, S.T. (eds) Language Processing and Intelligent Information Systems. IIS 2013. Lecture Notes in Computer Science, vol 7912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38634-3_13


  • DOI: https://doi.org/10.1007/978-3-642-38634-3_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38633-6

  • Online ISBN: 978-3-642-38634-3

  • eBook Packages: Computer Science, Computer Science (R0)
