
Feature Selection for Chinese Character Sense Discrimination

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2945)

Abstract

Word sense discrimination groups the occurrences of a word into clusters using an unsupervised classification method, so that each cluster consists of occurrences with the same meaning. Feature extraction has been used to reduce the dimensionality of context vectors in the English word sense discrimination task. However, if the original dimensions are meaningful to users and the relevant features exist among those dimensions, feature selection is a better choice for finding relevant features. In this paper we apply two unsupervised feature selection schemes to Chinese character sense discrimination: an entropy-based feature filter and a Minimum Description Length-based feature wrapper. Evaluated by precision against a known ground-truth classification, our preliminary experimental results show that feature selection performs better than feature extraction on the Chinese character sense discrimination task.
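To make the idea of an entropy-based feature filter concrete, the following is a minimal sketch, not the procedure used in the paper, whose exact entropy measure is not given in the abstract. It assumes a distance-based entropy of the kind often used for unsupervised filtering: pairwise similarities are computed as exp(-alpha * distance), and a feature is ranked as more relevant when removing it raises the entropy of those similarities, that is, when the remaining data looks less clustered. The function names `distance_entropy` and `rank_features` and the toy data are hypothetical.

```python
import math
from itertools import combinations


def distance_entropy(rows, features):
    """Entropy of pairwise similarities, using only the given feature indices.

    Assumed formulation: similarity s = exp(-alpha * d), where d is the
    Euclidean distance and alpha is chosen so that s = 0.5 at the mean distance.
    Low entropy suggests clear cluster structure; high entropy suggests none.
    """
    pairs = list(combinations(range(len(rows)), 2))
    dists = [
        math.sqrt(sum((rows[i][f] - rows[j][f]) ** 2 for f in features))
        for i, j in pairs
    ]
    mean_dist = sum(dists) / len(dists)
    if mean_dist == 0:
        return 0.0  # all points coincide on these features
    alpha = math.log(2) / mean_dist
    total = 0.0
    for d in dists:
        s = math.exp(-alpha * d)
        if 0.0 < s < 1.0:  # s == 1 contributes zero entropy
            total -= s * math.log(s) + (1 - s) * math.log(1 - s)
    return total


def rank_features(rows):
    """Rank feature indices from most to least relevant.

    A feature is scored by the entropy of the data after removing it:
    the larger the entropy without it, the more structure it carried.
    """
    all_features = list(range(len(rows[0])))
    scores = {
        f: distance_entropy(rows, [g for g in all_features if g != f])
        for f in all_features
    }
    return sorted(all_features, key=lambda f: scores[f], reverse=True)


# Hypothetical toy data: feature 0 separates two groups, feature 1 is spread evenly.
rows = [(0, 0), (0, 2), (0, 4), (10, 1), (10, 3), (10, 5)]
ranking = rank_features(rows)
print(ranking)
```

A filter of this kind scores features without running the clustering algorithm itself, which is what distinguishes it from a wrapper that evaluates feature subsets by the quality of the resulting clustering model.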




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Niu, ZY., Ji, DH. (2004). Feature Selection for Chinese Character Sense Discrimination. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_24


  • DOI: https://doi.org/10.1007/978-3-540-24630-5_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21006-1

  • Online ISBN: 978-3-540-24630-5

  • eBook Packages: Springer Book Archive
