Abstract
This paper presents an application of grammatical inference to the task of shallow parsing. We first learn a deterministic probabilistic automaton that models the joint distribution of chunk (syntactic phrase) tags and part-of-speech tags, and then use this automaton as a transducer to find the most likely chunk tag sequence using a dynamic programming algorithm. We discuss an efficient means of incorporating lexical information, which automatically identifies particular useful words using a mutual information criterion, together with an application of bagging that improves our results. Though the results are not as high as those of comparable techniques that use models with a fixed structure, the models we learn are very compact and efficient.
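The decoding step described above can be illustrated with a small sketch. The abstract does not give the algorithm itself, so the following is only an assumed Viterbi-style search, not the authors' implementation: the automaton reads joint (part-of-speech, chunk) symbols, and for a given part-of-speech sequence we keep, per state, the highest-scoring chunk sequence. The toy automaton `TOY`, the function name `best_chunk_sequence`, and all probabilities are hypothetical, and end-of-sequence probabilities are omitted for brevity.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical representation of a deterministic probabilistic automaton:
# state -> {(pos_tag, chunk_tag): (next_state, transition_probability)}
Transitions = Dict[int, Dict[Tuple[str, str], Tuple[int, float]]]

# Toy automaton with made-up probabilities (illustration only).
TOY: Transitions = {
    0: {("DT", "B-NP"): (1, 0.6), ("NN", "B-NP"): (2, 0.3), ("VB", "B-VP"): (0, 0.1)},
    1: {("NN", "I-NP"): (2, 0.7), ("NN", "B-NP"): (2, 0.1), ("JJ", "I-NP"): (1, 0.2)},
    2: {("VB", "B-VP"): (0, 0.8), ("NN", "I-NP"): (2, 0.2)},
}


def best_chunk_sequence(
    trans: Transitions, pos_tags: List[str], start: int = 0
) -> Optional[Tuple[List[str], float]]:
    """Return (chunk tags, log-probability) of the best path, or None if no path exists."""
    # best[state] = (log-probability, chunk tags emitted so far)
    best: Dict[int, Tuple[float, List[str]]] = {start: (0.0, [])}
    for pos in pos_tags:
        nxt: Dict[int, Tuple[float, List[str]]] = {}
        for state, (score, path) in best.items():
            for (p, chunk), (target, prob) in trans.get(state, {}).items():
                if p != pos or prob <= 0.0:
                    continue  # symbol does not match the observed POS tag
                cand = score + math.log(prob)
                # Keep only the best-scoring hypothesis per target state.
                if target not in nxt or cand > nxt[target][0]:
                    nxt[target] = (cand, path + [chunk])
        best = nxt
        if not best:
            return None  # the automaton cannot read this POS sequence
    score, path = max(best.values(), key=lambda v: v[0])
    return path, score


if __name__ == "__main__":
    result = best_chunk_sequence(TOY, ["DT", "NN", "VB"])
    if result is not None:
        print(result[0])  # ['B-NP', 'I-NP', 'B-VP']
```

Because the automaton is deterministic over joint symbols, each (state, part-of-speech, chunk) choice leads to exactly one next state, so the search only needs one hypothesis per state at each position.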
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Thollard, F., Clark, A. (2002). Shallow Parsing Using Probabilistic Grammatical Inference. In: Adriaans, P., Fernau, H., van Zaanen, M. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2002. Lecture Notes in Computer Science, vol 2484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45790-9_22
DOI: https://doi.org/10.1007/3-540-45790-9_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44239-4
Online ISBN: 978-3-540-45790-9