A Graph Based Method for Building Multilingual Weakly Supervised Dependency Parsers

Gorla, Jagadeesh; Singh, Anil Kumar; Sangal, Rajeev; Gali, Karthik; Husain, Samar; Venkatapathy, Sriram

doi:10.1007/978-3-540-85287-2_15

Jagadeesh Gorla²,
Anil Kumar Singh²,
Rajeev Sangal²,
Karthik Gali²,
Samar Husain² &
…
Sriram Venkatapathy²

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 5221))

Included in the following conference series:

International Conference on Natural Language Processing

1433 Accesses

Abstract

The structure of a sentence can be seen as a spanning tree in a linguistically augmented graph of syntactic nodes. This paper presents an approach for unlabeled dependency parsing based on this view. The first step involves marking the chunks and the chunk heads of a given sentence and then identifying the intra-chunk dependency relations. The second step involves learning to identify the inter-chunk dependency relations. For this, we use an initialization technique based on a measure we call Normalized Conditional Mutual Information (NCMI), in addition to a few linguistic constraints. We present the results for Hindi. We have achieved a precision of 80.83% for sentences of size less than 10 words and 66.71% overall. This is significantly better than the baseline in which random initialization is used.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Bharati, A., Sangal, R.: Parsing free word order languages using the paninian framework. In: Proceedings of Annual Meeting of Association for Computational Linguistics, pp. 105–111 (1993)
Google Scholar
Bod, R.: An all-subtrees approach to unsupervised parsing. In: Proceedings of COLING-ACL (2006)
Google Scholar
Charniak, E.: A maximum-entropy-inspired parser. In: Proceedings of the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (ACL) (2000)
Google Scholar
Charniak, E., Johnson, M.: Coarse-to-fine n-best parsing and maxent discriminative reranking. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) (2005)
Google Scholar
Chu, Y., Liu, T.: On the shortest arborescence of a directed graph. Science Sinica 14, 1396–1400 (1965)
MathSciNet MATH Google Scholar
Collins, M.: Head-driven statistical models for natural language parsing. PhD thesis, University of Pennsylvania (1999)
Google Scholar
Collins, M., Hajic, J., Brill, E., Ramshaw, L., Tillmann, C.: A statistical parser for czech. In: Proceedings of the 37th Meeting of the Association for Computational Linguistics (ACL), pp. 505–512 (1999)
Google Scholar
Doran, C., Egedi, D., Hockey, B.A., Srinivas, B., Zaidel, M.: Xtag system: a wide coverage grammar for english. In: Proceedings of the 15th conference on Computational linguistics, Morristown, NJ, USA, pp. 922–928. Association for Computational Linguistics (1994)
Google Scholar
Edmonds, J.: Optimum branchings. Journal of Research of the National Bureau of Standards (1967)
Google Scholar
Emeneau, M.B.: India as a linguistic area. Linguistics 32, 3–16 (1956)
Google Scholar
Gao, J., Suzuki, H.: Unsupervised learning of dependency structure for language modeling. In: ACL 2003, pp. 521–528. Association for Computational Linguistics (2003)
Google Scholar
Klein, D.: The Unsupervised Learning of Natural Language Structure. PhD thesis, Stanford University (2004)
Google Scholar
McDonald, R.: Discriminative learning and spanning tree algorithms for dependency parsing. PhD thesis, University of Pennsylvania (2006)
Google Scholar
McDonald, R., Crammer, K., Pereira, F.: Online large-margin training of dependency parsers. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) (2005)
Google Scholar
McDonald, R., Satta, G.: On the complexity of non-projective data-driven dependency parsing. In: Proceedings of the International Conference on Parsing Technologies (IWPT) (2007)
Google Scholar
Nivre, J.: An efficient algorithm for projective dependency parsing. In: Proceedings of International Workshop on Parsing Technologies, pp. 149–160 (2003)
Google Scholar
Nivre, J., Hall, J., Kübler, S., McDonald, R., Nilsson, J., Riedel, S., Yuret, D.: The CoNLL 2007 shared task on dependency parsing. In: Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pp. 915–932 (2007)
Google Scholar
Paskin, M.A.: Grammatical bigrams. In: Proceedings of NIPS, pp. 91–97 (2001)
Google Scholar
Avinesh, P.V.S., Karthik, G.: Part-of-speech tagging and chunking using conditional random fields and transformation based learning. In: Proceedings of the IJCAI 2007 Workshop on Shallow Parsing in South Asian Languages, Hyderabad, India (2007)
Google Scholar
Rao, D., Yarowsky, D.: Part of speech tagging and shallow parsing for indian languages. In: Proceedings of the IJCAI-07 Workshop on Shallow Parsing in South Asian Languages, Hyderabad, India (2007)
Google Scholar
Smith, D.A., Eisner, J.: Bootstrapping feature-rich dependency parsers with entropic priors. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 667–677
Google Scholar
Smith, D.A., Smith, N.A.: Probabilistic models of nonprojective dependency trees. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 132–140
Google Scholar
Smith,N.A.: Discovery of linguistic relations using lexical attraction. PhD thesis (1998)
Google Scholar
Yoshinaga, N., Miyao, Y., Torisawa, K., Tsujii, J.: Efficient LTAG parsing using HPSG parsers. In: Proc. of PACLING, pp. 342–351 (2001)
Google Scholar
Yuret,D.: Discovery of linguistic relations using lexical attraction. PhD thesis (1998)
Google Scholar

Download references

Author information

Authors and Affiliations

Language Technologies Research Centre, IIIT, Hyderabad, India
Jagadeesh Gorla, Anil Kumar Singh, Rajeev Sangal, Karthik Gali, Samar Husain & Sriram Venkatapathy

Authors

Jagadeesh Gorla
View author publications
You can also search for this author in PubMed Google Scholar
Anil Kumar Singh
View author publications
You can also search for this author in PubMed Google Scholar
Rajeev Sangal
View author publications
You can also search for this author in PubMed Google Scholar
Karthik Gali
View author publications
You can also search for this author in PubMed Google Scholar
Samar Husain
View author publications
You can also search for this author in PubMed Google Scholar
Sriram Venkatapathy
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Chalmers University of Technology, 41296, Göteborg, Sweden
Bengt Nordström & Aarne Ranta &

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Gorla, J., Singh, A.K., Sangal, R., Gali, K., Husain, S., Venkatapathy, S. (2008). A Graph Based Method for Building Multilingual Weakly Supervised Dependency Parsers. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science(), vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_15

Download citation

DOI: https://doi.org/10.1007/978-3-540-85287-2_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics