Abstract
Current approaches to computational lexicology in language technology are knowledge-based (competence-oriented) and try to abstract away from specific formalisms, domains, and applications. This results in severe complexity, acquisition, and reusability bottlenecks. As an alternative, we propose a performance-oriented approach to Natural Language Processing based on automatic memory-based learning of linguistic (lexical) tasks. We discuss the consequences of this approach for computational lexicology and describe its application to a number of lexical acquisition and disambiguation tasks in phonology, morphology, and syntax.
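As a rough illustration of what memory-based learning means in general, the sketch below stores labelled examples and classifies a new instance by the majority class of its most similar stored examples. It is a minimal sketch under stated assumptions, not the algorithm used in the paper: the overlap distance, the function names (`overlap_distance`, `classify`), the parameter `k`, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def classify(memory, instance, k=1):
    """Return the majority class among the k stored examples nearest to instance."""
    # Sort the stored examples by their distance to the new instance.
    nearest = sorted(memory, key=lambda ex: overlap_distance(ex[0], instance))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up toy memory: three-letter windows paired with an invented class label.
memory = [
    (("c", "a", "t"), "short"),
    (("h", "a", "t"), "short"),
    (("c", "a", "r"), "long"),
    (("b", "a", "r"), "long"),
]

print(classify(memory, ("m", "a", "t")))
```

Nothing is abstracted from the examples at training time; all generalization happens at classification time through the similarity comparison, which is the sense in which such an approach is performance-oriented rather than knowledge-based.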
I would like to thank my colleagues in the Atila project (Steven Gillis, Gert Durieux, and Antal van den Bosch) for their contributions to the approach described in this paper. The Atila (Antwerp-Tilburg Inductive Language Acquisition) project is a research cooperation between the University of Antwerp and Tilburg University focusing on the application of Machine Learning techniques in linguistic engineering and in developmental psycholinguistics. Thanks also to the participants of the Heidelberg workshop on Machine Translation and the Lexicon for useful comments and suggestions.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Daelemans, W. (1995). Memory-based lexical acquisition and processing. In: Steffens, P. (eds) Machine Translation and the Lexicon. WMTL 1993. Lecture Notes in Computer Science, vol 898. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-59040-4_22
DOI: https://doi.org/10.1007/3-540-59040-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-59040-8
Online ISBN: 978-3-540-49174-3
eBook Packages: Springer Book Archive