Semi-supervised Incremental Model for Lexical Category Acquisition

Zhang, Bichuan; Wang, Xiaojie

doi:10.1007/978-3-642-33030-8_12


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 191)


Abstract

We present a novel semi-supervised incremental approach for discovering word categories: sets of words that share a significant aspect of their distributional context. We use high-frequency words as seeds to capture semantic information in the form of symmetric similarity between word pairs; lexical categories are then created with a recently proposed clustering algorithm, affinity propagation (AP). We assess performance using a new measure that we propose, which meets three criteria: informativeness, diversity, and purity. The quantitative and qualitative evaluations show that this semi-supervised incremental approach is plausible for inducing lexical categories from distributional data.
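To make the clustering step concrete, the following is a minimal sketch of affinity propagation applied to a symmetric word-pair similarity matrix. It is not the authors' implementation. The toy context counts (how often each word follows the hypothetical seed words "the" and "to"), the use of negative squared Euclidean distance as the similarity, the median preference, and all function names are assumptions made for illustration only.

```python
from statistics import median

def neg_sq_dist(u, v):
    """Symmetric similarity: negative squared Euclidean distance (an assumption)."""
    return -sum((a - b) ** 2 for a, b in zip(u, v))

def affinity_propagation(S, damping=0.5, iterations=200):
    """Plain message-passing AP on a square similarity matrix S.

    S[k][k] holds the preference of point k. Returns, for each point i,
    the index of the exemplar it chooses.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) = sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            for i in range(n):
                support = sum(pos[j] for j in range(n) if j not in (i, k))
                new = support if i == k else min(0.0, R[k][k] + support)
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point picks the k that maximizes a(i,k) + r(i,k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]

# Hypothetical counts: (times seen after "the", times seen after "to").
contexts = {"dog": (9, 0), "cat": (8, 1), "run": (1, 8), "jump": (0, 9)}
words = list(contexts)
n = len(words)

S = [[neg_sq_dist(contexts[a], contexts[b]) for b in words] for a in words]
# Shared preference: median of the off-diagonal similarities (a common default).
preference = median(S[i][j] for i in range(n) for j in range(n) if i != j)
for i in range(n):
    S[i][i] = preference

labels = affinity_propagation(S)
categories = {}
for word, k in zip(words, labels):
    categories.setdefault(words[k], []).append(word)
print(categories)
```

In this toy setting the two noun-like words should share one exemplar and the two verb-like words another. The number of categories is not fixed in advance; it depends on the preference value, which is the property that makes AP attractive for category induction.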



Author information

Authors and Affiliations

Center for Intelligence Science and Technology, Beijing University of Posts and Telecommunications, Beijing, China
Bichuan Zhang & Xiaojie Wang


Corresponding author

Correspondence to Bichuan Zhang.

Editor information

Editors and Affiliations

Engineering Research Center, Information Technology & Industrial, No 10 Ruanjianyuanzhong Road, Wuhan, People's Republic of China
Zhenyu Du


Cite this paper

Zhang, B., Wang, X. (2013). Semi-supervised Incremental Model for Lexical Category Acquisition. In: Du, Z. (eds) Proceedings of the 2012 International Conference of Modern Computer Science and Applications. Advances in Intelligent Systems and Computing, vol 191. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33030-8_12


DOI: https://doi.org/10.1007/978-3-642-33030-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33029-2
Online ISBN: 978-3-642-33030-8
eBook Packages: Engineering, Engineering (R0)
