Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 191))

Abstract

We present a novel semi-supervised incremental approach for discovering word categories: sets of words that share a significant aspect of their distributional context. We use high-frequency words as seeds to capture semantic information in the form of symmetric similarity between word pairs. Lexical categories are then created with a recently proposed clustering algorithm, affinity propagation (AP). We assess performance using a new measure that we propose, which meets three criteria: informativeness, diversity, and purity. The quantitative and qualitative evaluations show that this semi-supervised incremental approach is plausible for inducing lexical categories from distributional data.
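
To make the clustering step concrete, here is a minimal sketch of affinity propagation applied to a word-pair similarity matrix, following the standard responsibility and availability updates of Frey and Dueck. It is an illustration under stated assumptions, not the paper's implementation: the word list, the similarity scores, the `preference` and `damping` values, and the helper names `similarity_matrix` and `affinity_propagation` are all invented for this example, and the seed-word and incremental steps mentioned in the abstract are not modelled.

```python
def similarity_matrix(words, pair_sim, default, preference):
    """Build a symmetric similarity matrix; the diagonal holds the AP 'preference'."""
    index = {w: i for i, w in enumerate(words)}
    n = len(words)
    S = [[default] * n for _ in range(n)]
    for (u, v), s in pair_sim.items():
        S[index[u]][index[v]] = S[index[v]][index[u]] = s
    for i in range(n):
        S[i][i] = preference
    return S


def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each item i, the index of the exemplar it is assigned to."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) <- s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            for k in range(n):
                rival = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # a(i,k) <- min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) <- sum over i' != k of max(0, r(i',k))
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(pos)
            for i in range(n):
                if i == k:
                    target = total - pos[k]
                else:
                    target = min(0.0, R[k][k] + total - pos[i] - pos[k])
                A[i][k] = damping * A[i][k] + (1 - damping) * target
    # Each item picks the candidate exemplar k maximising a(i,k) + r(i,k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy data (assumed, not from the paper): less negative means more similar.
words = ["cat", "dog", "horse", "walk", "run", "jump"]
pair_sim = {
    ("cat", "dog"): -1.0, ("dog", "horse"): -1.0, ("cat", "horse"): -4.0,
    ("walk", "run"): -1.0, ("run", "jump"): -1.0, ("walk", "jump"): -4.0,
}
S = similarity_matrix(words, pair_sim, default=-100.0, preference=-10.0)

exemplars = affinity_propagation(S)
categories = {}
for word, e in zip(words, exemplars):
    categories.setdefault(words[e], []).append(word)
print(categories)
```

On this toy input the two tight groups settle under separate exemplars ("dog" and "run"). The diagonal `preference` controls how many categories emerge: values closer to zero make it cheaper for a word to act as its own exemplar, which yields more and smaller categories.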


Author information

Corresponding author

Correspondence to Bichuan Zhang.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, B., Wang, X. (2013). Semi-supervised Incremental Model for Lexical Category Acquisition. In: Du, Z. (eds) Proceedings of the 2012 International Conference of Modern Computer Science and Applications. Advances in Intelligent Systems and Computing, vol 191. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33030-8_12

  • DOI: https://doi.org/10.1007/978-3-642-33030-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33029-2

  • Online ISBN: 978-3-642-33030-8

  • eBook Packages: Engineering, Engineering (R0)
