Abstract
This paper presents Kenmore, a general framework for knowledge acquisition for natural language processing (NLP) systems. To ease the acquisition of knowledge in new domains, Kenmore exploits an online corpus using robust sentence analysis and embedded symbolic machine learning techniques, and requires only minimal human intervention. By treating all problems in ambiguity resolution as classification tasks, the framework uniformly addresses a range of subproblems in sentence analysis, each of which has traditionally required a separate computational mechanism. In a series of experiments, we demonstrate the successful use of Kenmore for learning solutions to several problems in lexical and structural ambiguity resolution. We argue that the learning and knowledge acquisition components should be embedded components of the NLP system, in two senses: (1) learning should take place within the larger natural language understanding system as it processes text, and (2) the learning components should be evaluated in the context of practical language-processing tasks.
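To make the "ambiguity resolution as classification" idea concrete, the following is a minimal sketch and not the Kenmore implementation. It assumes a simple nearest-neighbour learner; the feature names (`prev_tag`, `next_tag`, `suffix`), the labels, and the four training cases are all invented for illustration.

```python
from collections import Counter

# Hypothetical training cases: context features -> class label.
# Features, values, and labels are assumptions made for this sketch only.
CASES = [
    ({"prev_tag": "DET", "next_tag": "VERB", "suffix": "s"}, "NOUN"),
    ({"prev_tag": "PRON", "next_tag": "DET", "suffix": "s"}, "VERB"),
    ({"prev_tag": "DET", "next_tag": "PREP", "suffix": "s"}, "NOUN"),
    ({"prev_tag": "NOUN", "next_tag": "DET", "suffix": "s"}, "VERB"),
]


def overlap(a, b):
    """Count the features on which two cases agree."""
    return sum(1 for key in a if a[key] == b.get(key))


def classify(case, cases=CASES, k=3):
    """Label a new case by majority vote among its k most similar stored cases."""
    ranked = sorted(cases, key=lambda c: overlap(case, c[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # An ambiguous word ending in "s" that follows a determiner.
    print(classify({"prev_tag": "DET", "next_tag": "VERB", "suffix": "s"}))
```

Because the learner only sees feature dictionaries and labels, the same `classify` function could in principle be reused for a structural ambiguity by passing a different case base through the `cases` argument; only the features and the label set would change.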
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cardie, C. (1996). Embedded machine learning systems for natural language processing: A general framework. In: Wermter, S., Riloff, E., Scheler, G. (eds) Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing. IJCAI 1995. Lecture Notes in Computer Science, vol 1040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60925-3_56
DOI: https://doi.org/10.1007/3-540-60925-3_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60925-4
Online ISBN: 978-3-540-49738-7
eBook Packages: Springer Book Archive