Abstract
We present a novel semi-supervised incremental approach for discovering word categories, that is, sets of words that share a significant aspect of their distributional context. We use high-frequency words as seeds to capture semantic information in the form of symmetric similarity between word pairs. Lexical categories are then created using a recently proposed clustering algorithm, affinity propagation (AP). Furthermore, we assess performance using a new measure that we propose, which meets three criteria: informativeness, diversity, and purity. The quantitative and qualitative evaluations show that this semi-supervised incremental approach is plausible for inducing lexical categories from distributional data.
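The pipeline outlined in the abstract (a symmetric similarity between word pairs derived from high-frequency seed words, followed by affinity propagation) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the seed selection, or any parameter values, so everything below is an assumption made for illustration: cosine similarity over counts of neighbouring seed words stands in for the paper's measure, the function names `context_vectors`, `cosine` and `affinity_propagation` are hypothetical, and the damping factor and iteration count are arbitrary defaults. The message-passing updates follow the standard formulation of affinity propagation, not necessarily the authors' implementation.

```python
import math
from collections import Counter


def context_vectors(tokens, seeds, window=1):
    """For each non-seed word, count seed words seen within `window` positions.

    Features are (relative offset, seed word) pairs. This feature design is an
    assumption; the abstract does not describe the actual context features.
    """
    vectors = {}
    for i, word in enumerate(tokens):
        if word in seeds:
            continue
        counts = vectors.setdefault(word, Counter())
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in seeds:
                counts[(j - i, tokens[j])] += 1
    return vectors


def cosine(u, v):
    """Symmetric similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def affinity_propagation(S, damping=0.5, iterations=200):
    """Cluster n >= 2 items from an n x n similarity matrix S.

    S[k][k] holds the "preference" of item k to become an exemplar.
    Returns, for every item, the index of the exemplar it is assigned to.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) for i' not in {i,k})
        # a(k,k) = sum of positive r(i',k) for i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # no exemplar emerged: leave every item on its own
        return list(range(n))
    return [
        i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
        for i in range(n)
    ]


if __name__ == "__main__":
    tokens = "the cat sat on the mat and the dog sat on the rug".split()
    seeds = {"the", "on", "and"}  # stand-ins for high-frequency words
    vecs = context_vectors(tokens, seeds)
    words = sorted(vecs)
    S = [[cosine(vecs[a], vecs[b]) for b in words] for a in words]
    for i in range(len(words)):
        S[i][i] = 0.5  # preference value chosen arbitrarily for this toy input
    print(dict(zip(words, affinity_propagation(S))))
```

In this sketch the diagonal of the similarity matrix controls how many categories emerge: lower preferences yield fewer exemplars. On real data the preference is a tuning choice, and the values used here carry no significance.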
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, B., Wang, X. (2013). Semi-supervised Incremental Model for Lexical Category Acquisition. In: Du, Z. (eds) Proceedings of the 2012 International Conference of Modern Computer Science and Applications. Advances in Intelligent Systems and Computing, vol 191. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33030-8_12
DOI: https://doi.org/10.1007/978-3-642-33030-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33029-2
Online ISBN: 978-3-642-33030-8
eBook Packages: Engineering, Engineering (R0)