An Unsupervised Approach for Linking Automatically Extracted and Manually Crafted LTAGs

Faili, Heshaam; Basirat, Ali

doi:10.1007/978-3-642-19400-9_6

Heshaam Faili &
Ali Basirat

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6608)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Although automatically extracted LTAGs lack semantic representation, which is an obstacle to using this formalism, they have received more attention than before since the advent of powerful statistical parsers trained on them. In contrast to this class of grammars, there are widely used manually crafted LTAGs that are enriched with semantic representation but lack efficient parsers. The semantic representation available in the latter grammars, together with the statistical capabilities of the former, encouraged us to construct a link between them.

Here, focusing on the automatically extracted LTAG used by MICA [4] and the manually crafted English LTAG, namely the XTAG grammar [32], we propose a statistical approach based on a hidden Markov model (HMM) that maps each sequence of elementary trees of the former onto a sequence of elementary trees of the latter. To keep the HMM training algorithm from converging to a local optimum, an EM-based learning process for initializing the HMM parameters is also proposed. Experimental results show that the mapping method provides a satisfactory way to cover the deficiencies that arise in one grammar with the capabilities available in the other.
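
The mapping step described above can be illustrated with a minimal sketch of Viterbi decoding. It assumes, for illustration only, that the XTAG elementary trees play the role of hidden states and the MICA elementary trees play the role of observations; the abstract does not spell out the model at this level of detail. The function name `viterbi`, the smoothing floor, the tree labels and all probabilities below are hypothetical placeholders, not values or names taken from the paper.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    if not observations:
        return []

    floor = 1e-12  # hypothetical floor so unseen events do not give log(0)

    def lg(p):
        return math.log(max(p, floor))

    # best[t][s] = (log-probability of the best path ending in s at t, back-pointer)
    best = [{
        s: (lg(start_p.get(s, 0.0)) + lg(emit_p[s].get(observations[0], 0.0)), None)
        for s in states
    }]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Pick the predecessor r that maximises the score of reaching s.
            score, arg = max(
                (prev[r][0] + lg(trans_p[r].get(s, 0.0)), r) for r in states
            )
            column[s] = (score + lg(emit_p[s].get(obs, 0.0)), arg)
        best.append(column)

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Toy parameters (all made up): hidden states stand in for XTAG trees,
# observation symbols stand in for trees of the extracted grammar.
states = ["betaDnx", "alphaNXN", "alphanx0Vnx1"]
start_p = {"betaDnx": 0.6, "alphaNXN": 0.3, "alphanx0Vnx1": 0.1}
trans_p = {
    "betaDnx": {"betaDnx": 0.05, "alphaNXN": 0.9, "alphanx0Vnx1": 0.05},
    "alphaNXN": {"betaDnx": 0.1, "alphaNXN": 0.2, "alphanx0Vnx1": 0.7},
    "alphanx0Vnx1": {"betaDnx": 0.5, "alphaNXN": 0.4, "alphanx0Vnx1": 0.1},
}
emit_p = {
    "betaDnx": {"m_det": 0.9, "m_noun": 0.05, "m_verb": 0.05},
    "alphaNXN": {"m_det": 0.1, "m_noun": 0.8, "m_verb": 0.1},
    "alphanx0Vnx1": {"m_det": 0.05, "m_noun": 0.1, "m_verb": 0.85},
}

observed = ["m_det", "m_noun", "m_verb", "m_det", "m_noun"]
mapped = viterbi(observed, states, start_p, trans_p, emit_p)
print(list(zip(observed, mapped)))
```

In an unsupervised setting such as the one the abstract describes, the transition and emission tables would be estimated from unlabeled tree sequences (for example with Baum-Welch) rather than written by hand, and the quality of the result would depend on how those tables are initialized.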

References

1. Baldi, P., Chauvin, Y.: Smooth On-Line Learning Algorithms for Hidden Markov Models. Neural Computation 6(2), 307–318 (1994)
2. Bangalore, S., Joshi, A.: Supertagging: An approach to almost parsing. Computational Linguistics 25(2), 237–266 (1999)
3. Bangalore, S., Haffner, P., Emami, G.: Factoring global inference by enriching local representations. Technical report, AT&T Labs – Research (2005)
4. Bangalore, S., Boullier, P., Nasr, A., Rambow, O., Sagot, B.: MICA: A probabilistic dependency parser based on Tree Insertion Grammar. In: North American Chapter of the Association for Computational Linguistics, NAACL (2009)
5. Baum, L.E.: An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. In: Shisha, O. (ed.) Inequalities III: Proceedings of the Third Symposium on Inequalities, University of California, Los Angeles, pp. 1–8. Academic Press, London (1972)
6. Boullier, P., Deschamp, P.: Le système SYNTAX™ – manuel d’utilisation et de mise en oeuvre sous UNIX™ (1972), http://syntax.gforge.inria.fr/syntax3.8-manual.pdf
7. Brill, E.: Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics 21(4), 543–566 (1995)
8. Chen, J., Bangalore, S., Vijay-Shanker, K.: New models for improving supertag disambiguation. In: Proc. EACL 1999, Bergen, pp. 188–195 (1999)
9. Chen, J.: Towards Efficient Statistical Parsing Using Lexicalized Grammatical Information. Ph.D. thesis, University of Delaware (2001)
10. Dang, H., Kipper, K., Palmer, M.: Integrating Compositional Semantics into a Verb Lexicon. In: Proceedings of the Eighteenth International Conference on Computational Linguistics, COLING 2000 (2000)
11. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society 39(1), 1–38 (1977)
12. Faili, H., Ghassem-Sani, G.: An application of Lexicalized Grammars in English-Persian Translation. In: Proceedings of the 16th European Conference on Artificial Intelligence (ECAI), pp. 596–600 (2004)
13. Faili, H.: From partial toward full parsing. In: Recent Advances in Natural Language Processing (RANLP), pp. 71–75 (2009)
14. Faili, H., Basirat, A.: Augmenting the Automated Extracted Tree Adjoining Grammars by Semantic Representation. In: The 6th IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2010), pp. 584–590 (2010)
15. Habash, N., Rambow, O.: Extracting a Tree Adjoining Grammar from the Penn Arabic Treebank. In: Proceedings of Traitement Automatique du Langage Naturel (TALN 2004), Fez, Morocco (2004)
16. Hemphill, C.T., Godfrey, J.J., Doddington, G.R.: The ATIS spoken language systems pilot corpus. In: DARPA Speech and Natural Language Workshop, Hidden Valley (1990)
17. Joshi, A.K., Levy, L., Takahashi, M.: Tree Adjunct Grammars. Journal of Computer and System Sciences 10(1), 136–163 (1975)
18. Joshi, A.K.: How much context-sensitivity is necessary for characterizing structural descriptions? In: Natural Language Processing: Theoretical, Computational, and Psychological Perspectives, pp. 206–250. Cambridge University Press, New York (1985)
19. Kipper, K., Dang, H., Palmer, M.: Class-based Construction of a Verb Lexicon. In: Proceedings of the Seventeenth National Conference on Artificial Intelligence, AAAI 2000 (2000)
20. Makino, T., Yoshida, M., Torisawa, K., Tsujii, J.: LiLFeS-Toward a Practical HPSG Parser. In: Proceedings of COLING-ACL 1998 (1998)
21. Marcus, M., Santorini, B., Marcinkiewicz, M.: Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics 19(2) (1993)
22. Murphy, K.: A Hidden Markov Model (HMM) Toolbox for Matlab (1998), http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html
23. Neumann, G.: Automatic extraction of stochastic lexicalized tree grammars from tree banks. In: Fourth International Workshop on TAG and Related Frameworks, TAG+4 (1998)
24. Park, J.: Extraction of tree adjoining grammar from a tree bank for Korean. In: Proceedings of the COLING/ACL 2006 Student Research Workshop, pp. 73–78 (2006)
25. Rabiner, L.R.: A tutorial on Hidden Markov Models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–286 (1989)
26. Ryant, N., Kipper, K.: Assigning XTAG trees to VerbNet. In: Seventh International Workshop on Tree Adjoining Grammar and Related Formalisms (TAG+7), pp. 194–198 (May 2004)
27. Schabes, Y., Waters, R.C.: Tree Insertion Grammar. Computational Linguistics 21(4) (1995)
28. Shen, L., Joshi, A.K.: Incremental LTAG parsing. In: HLT-EMNLP 2005 (2005)
29. Van Noord, G.: Head-corner parsing for TAG. Computational Intelligence 10(4), 525–534 (1994)
30. Xia, F.: Automatic grammar generation from two different perspectives. Ph.D. thesis, University of Pennsylvania (2001)
31. Xia, F., Palmer, M.: Evaluating the Coverage of LTAGs on Annotated Corpora. In: Proceedings of the Workshop on Using Evaluation within HLT Programs: Results and Trends, at the Second International Conference on Language Resources and Evaluation, pp. 1–6 (2001)
32. XTAG-group: A Lexicalized Tree Adjoining Grammar for English. Technical Report IRCS 98-18, Institute for Research in Cognitive Science, University of Pennsylvania, pp. 5–10 (1998)

Author information

Authors and Affiliations

Department of ECE, University of Tehran, Tehran, Iran
Heshaam Faili
Department of Computer Engineering, Islamic Azad University, Science and Research, Tehran, Iran
Ali Basirat

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico
Alexander F. Gelbukh

About this paper

Cite this paper

Faili, H., Basirat, A. (2011). An Unsupervised Approach for Linking Automatically Extracted and Manually Crafted LTAGs. In: Gelbukh, A.F. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19400-9_6

DOI: https://doi.org/10.1007/978-3-642-19400-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19399-6
Online ISBN: 978-3-642-19400-9
eBook Packages: Computer Science, Computer Science (R0)
