Construction Grammar Based Annotation Framework for Parsing Tamil

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9623)

Abstract

Syntactic parsing in NLP is the task of working out the grammatical structure of sentences. Purely formal approaches to parsing, such as phrase structure grammar and dependency grammar, have been successfully employed for a variety of languages. While phrase-structure-based constituent analysis is possible for fixed word order languages such as English, dependency analysis between grammatical units has been suitable for many free word order languages. These approaches rely on identifying linguistic units based on their formal syntactic properties and establishing the relationships between such units in the form of a tree. Instead, we characterize every morphosyntactic unit as a mapping between form and function along the lines of Construction Grammar, and treat parsing as the identification of dependency relations between such conceptual units. Our approach to parser annotation yields an average MaltParser labeled attachment score (LAS) of 82.21% on a gold-annotated Tamil corpus of 935 sentences in a five-fold cross-validation experiment.
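
The evaluation figure in the abstract can be made concrete with a minimal sketch of how a labeled attachment score and a five-fold split over 935 sentences might be computed. This is an illustration only: the function names, the contiguous fold assignment, and the toy labels are assumptions, and the abstract does not state the exact evaluation settings (for example, how punctuation is handled or how sentences are assigned to folds).

```python
from typing import List, Tuple

# One token's annotation: (head index, dependency label).
# Head index 0 stands for the artificial root. Labels here are toy values,
# not the tagset used in the paper.
Arc = Tuple[int, str]


def las(gold: List[List[Arc]], pred: List[List[Arc]]) -> float:
    """Labeled attachment score: percentage of tokens whose predicted
    head AND label both match the gold annotation (token-level average)."""
    correct = total = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for g_arc, p_arc in zip(g_sent, p_sent):
            total += 1
            correct += g_arc == p_arc
    return 100.0 * correct / total if total else 0.0


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split sentence indices 0..n-1 into k contiguous (train, test) folds.
    Contiguous folds are an assumption; a shuffled split is equally possible."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


if __name__ == "__main__":
    # Toy sentence with two tokens; the prediction gets one label wrong.
    gold = [[(2, "subj"), (0, "root")]]
    pred = [[(2, "obj"), (0, "root")]]
    print(las(gold, pred))

    # With 935 sentences, each of the five test folds holds 187 sentences.
    for train, test in k_fold_indices(935):
        print(len(train), len(test))
```

Under this reading, the reported average would be the mean of the five per-fold LAS values, each obtained by training on four folds and parsing the held-out fold.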

Notes

  1. The Hindi, Telugu and Bangla accuracies reported were obtained with the Computational Paninian Grammar framework [7]. The Tamil accuracy reported is based on the Universal Treebank results reported by Straka et al. [15].

  2. Indian Language Machine Translation Project funded by DIT, Government of India.

  3. The gold annotation was carried out by AU-KBC Research Centre, Chennai.

  4. http://www.maltparser.org/download.html.

References

  1. Goldberg, A.E.: Construction Grammar. Wiley Online Library (2002)

  2. Fried, M., Östman, J.O.: Construction grammar. In: Construction Grammar in a Cross-Language Perspective (2011)

  3. Langacker, R.W.: Cognitive Grammar: A Basic Introduction. Oxford University Press, Oxford (2008)

  4. Shieber, S.M.: Evidence against the context-freeness of natural language. In: Savitch, W.J., Bach, E., Marsh, W., Safran-Naveh, G. (eds.) The Formal Complexity of Natural Language. Studies in Linguistics and Philosophy, vol. 33, pp. 320–334. Springer, Heidelberg (1985). https://doi.org/10.1007/978-94-009-3401-6_12

  5. Mel'čuk, I.A.: Dependency Syntax: Theory and Practice. SUNY Press, Albany (1988)

  6. Bharati, A., Chaitanya, V., Sangal, R., Ramakrishnamacharyulu, K.: Natural Language Processing: A Paninian Perspective. Prentice-Hall of India, New Delhi (1995)

  7. Bharati, A., Sangal, R.: Parsing free word order languages in the Paninian framework. In: Proceedings of the 31st Annual Meeting on Association for Computational Linguistics, pp. 105–111. Association for Computational Linguistics (1993)

  8. Bharati, A., Gupta, M., Yadav, V., Gali, K., Sharma, D.M.: Simple parser for Indian languages in a dependency framework. In: Proceedings of the Third Linguistic Annotation Workshop, pp. 162–165. Association for Computational Linguistics (2009)

  9. Mannem, P.: Bidirectional dependency parser for Hindi, Telugu and Bangla. In: Proceedings of NLP Tools Contest: Indian Language Dependency Parsing, ICON 2009, India (2009)

  10. Nivre, J.: Parsing Indian languages with MaltParser. In: Proceedings of the NLP Tools Contest: Indian Language Dependency Parsing, ICON 2009, pp. 12–18 (2009)

  11. Ambati, B.R., Gadde, P., Jindal, K.: Experiments in Indian language dependency parsing. In: Proceedings of the NLP Tools Contest: Indian Language Dependency Parsing, ICON 2009, pp. 32–37 (2009)

  12. Antony, P., Warrier, N.J., Soman, K.: Penn treebank-based syntactic parsers for South Dravidian languages using a machine learning approach. Int. J. Comput. Appl. 7, 14–21 (2010)

  13. Selvam, M., Natarajan, A., Thangarajan, R.: Structural parsing of natural language text in Tamil using phrase structure hybrid language model. Int. J. Comput. Inf. Syst. Sci. Eng. 2008, 2–4 (2008)

  14. Ramasamy, L., Žabokrtský, Z.: Tamil dependency parsing: results using rule based and corpus based approaches. In: Gelbukh, A.F. (ed.) CICLing 2011. LNCS, vol. 6608, pp. 82–95. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19400-9_7

  15. Straka, M., Hajic, J., Straková, J., Hajic Jr., J.: Parsing universal dependency treebanks using neural networks and search-based oracle. In: International Workshop on Treebanks and Linguistic Theories (TLT 2014), p. 208 (2014)

  16. Kumari, B.V.S., Rao, R.R.: Hindi dependency parsing using a combined model of Malt and MST. In: 24th International Conference on Computational Linguistics, p. 171. Citeseer (2012)

  17. Kesidi, S.R., Kosaraju, P., Vijay, M., Husain, S.: A constraint based hybrid dependency parser for Telugu. Int. J. Comput. Linguist. Appl. 2, 53 (2011)

  18. Seddah, D., Tsarfaty, R., Kübler, S., Candito, M., Choi, J., Farkas, R., Foster, J., Goenaga, I., Gojenola, K., Goldberg, Y., et al.: Overview of the SPMRL 2013 shared task: cross-framework evaluation of parsing morphologically rich languages. Association for Computational Linguistics (2013)

  19. Amritavalli, R., Jayaseelan, K.: Finiteness and negation in Dravidian. In: The Oxford Handbook of Comparative Syntax, pp. 178–220 (2005)

  20. Amritavalli, R.: Separating tense and finiteness: anchoring in Dravidian. Nat. Lang. Linguist. Theory 32, 283–306 (2014)

  21. McFadden, T., Sundaresan, S.: Finiteness in south Asian languages: an introduction. Nat. Lang. Linguist. Theory 32, 1–27 (2014)

  22. Jayaseelan, K.A.: The serial verb construction in Malayalam. In: Dayal, V., Mahajan, A. (eds.) Clause Structure in South Asian Languages. Studies in Natural Language and Linguistic Theory, vol. 61, pp. 67–91. Springer, Heidelberg (2004). https://doi.org/10.1007/978-1-4020-2719-2_3

  23. Jayaseelan, K.: Coordination, relativization and finiteness in Dravidian. Nat. Lang. Linguist. Theory 32, 191–211 (2014)

  24. Herring, S.C.: Aspect as a discourse category in Tamil. In: Annual Meeting of the Berkeley Linguistics Society, vol. 14 (2011)

  25. Karmakar, S., Kasturirangan, R.: Cognitive processes underlying the meaning of complex predicates and serial verbs from the perspective of individuating and ordering situations in bānlā. In: Proceedings of the First International Conference on Intelligent Interactive Technologies and Multimedia, pp. 81–87. ACM (2010)

  26. Bharati, A., Husain, D.S.S., Bai, L., Begam, R., Sangal, R.: Anncorra: Treebanks for Indian languages, guidelines for annotating Hindi treebank (version-2.0) (2009)

  27. Ambati, B.R., Husain, S., Nivre, J., Sangal, R.: On the role of morphosyntactic features in Hindi dependency parsing. In: Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages, pp. 94–102. Association for Computational Linguistics (2010)

  28. Szabolcsi, A.: What do quantifier particles do? Linguist. Philos. 38, 159–204 (2015)

  29. Bharati, A., Sangal, R., Sharma, D.M., Bai, L.: Anncorra: Annotating corpora guidelines for POS and chunk annotation for Indian languages. LTRC-TR31 (2006)

Author information

Correspondence to Vigneshwaran Muralidaran.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Muralidaran, V., Misra Sharma, D. (2018). Construction Grammar Based Annotation Framework for Parsing Tamil. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_27

  • DOI: https://doi.org/10.1007/978-3-319-75477-2_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75476-5

  • Online ISBN: 978-3-319-75477-2

  • eBook Packages: Computer Science, Computer Science (R0)
