Feature Selection for Chinese Character Sense Discrimination

Niu, Zheng-Yu; Ji, Dong-Hong

doi:10.1007/978-3-540-24630-5_24

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2945)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Word sense discrimination groups the occurrences of a word into clusters with an unsupervised classification method, so that each cluster contains occurrences with the same meaning. Feature extraction methods have been used to reduce the dimensionality of context vectors in English word sense discrimination. However, when the original dimensions are meaningful to users and the relevant features lie among those dimensions, feature selection is a better choice for finding them. In this paper we apply two unsupervised feature selection schemes to Chinese character sense discrimination: an entropy-based feature filter and a Minimum Description Length (MDL) based feature wrapper. In preliminary experiments evaluated by precision against a known ground-truth classification, feature selection performs better than feature extraction on the Chinese character sense discrimination task.
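The abstract does not spell out how the entropy-based filter scores features. The following is a minimal sketch in the spirit of the entropy measure of Dash et al. (2002), not the authors' implementation. It assumes min-max normalised context vectors, Euclidean distances, a pairwise similarity S_ij = exp(-α·D_ij) with α = ln 2 / (mean distance), and a leave-one-feature-out ranking. The function names `normalize`, `entropy` and `rank_features` are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def normalize(X: Sequence[Sequence[float]]) -> Matrix:
    """Min-max scale each feature (column) to [0, 1]; constant columns become 0."""
    n_features = len(X[0])
    lo = [min(row[k] for row in X) for k in range(n_features)]
    hi = [max(row[k] for row in X) for k in range(n_features)]
    return [
        [(row[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
         for k in range(n_features)]
        for row in X
    ]


def entropy(X: Matrix, features: Sequence[int]) -> float:
    """Entropy of pairwise similarities using only the given feature indices.

    Low entropy means pairs are either very close or very far apart,
    i.e. the occurrences look well clustered in this feature subspace.
    Assumes X has at least two rows.
    """
    dists = []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            dists.append(math.sqrt(sum((X[i][k] - X[j][k]) ** 2 for k in features)))
    mean_d = sum(dists) / len(dists)
    if mean_d == 0.0:
        return 0.0  # all points coincide in this subspace
    alpha = math.log(2) / mean_d  # similarity equals 0.5 at the mean distance
    total = 0.0
    for d in dists:
        s = math.exp(-alpha * d)
        if 0.0 < s < 1.0:  # s of exactly 0 or 1 contributes 0 in the limit
            total -= s * math.log(s) + (1.0 - s) * math.log(1.0 - s)
    return total


def rank_features(X: Sequence[Sequence[float]]) -> List[int]:
    """Rank features, most relevant first, by leave-one-out entropy.

    Removing a relevant feature blurs the cluster structure and raises
    the entropy, so features are sorted by descending entropy after removal.
    """
    Xn = normalize(X)
    all_f = range(len(Xn[0]))
    scores = {f: entropy(Xn, [g for g in all_f if g != f]) for f in all_f}
    return sorted(scores, key=scores.get, reverse=True)


# Toy context vectors: feature 0 separates two groups, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.0, 0.5], [1.0, 0.2], [0.9, 0.8], [1.0, 0.6]]
print(rank_features(X))  # [0, 1]
```

A filter of this kind scores feature subsets without running a clustering algorithm, which is what separates it from a wrapper such as the MDL-based scheme, where each candidate subset is evaluated by fitting a clustering model to the data.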

References

Bouman, A.C., Shapiro, M., Cook, W.G., Atkins, B.C., Cheng, H.: Cluster: An Unsupervised Algorithm for Modeling Gaussian Mixtures (1998), http://dynamo.ecn.purdue.edu/~bouman/software/cluster/
Dash, M., Choi, K., Scheuermann, P., Liu, H.: Feature Selection for Clustering – A Filter Solution. In: Proc. IEEE Int. Conf. on Data Mining, Maebashi City, Japan (2002)
Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science 41(6), 391–407 (1990)
Devaney, M., Ram, A.: Efficient Feature Selection in Conceptual Clustering. In: Proc. 14th Int. Conf. on Machine Learning, pp. 92–97. Morgan Kaufmann, San Francisco (1997)
Dy, J.G., Brodley, C.E.: Feature Subset Selection and Order Identification for Unsupervised Learning. In: Proc. 17th Int. Conf. on Machine Learning, pp. 247–254. Morgan Kaufmann, San Francisco (2000)
Iannarilli, F.J., Rubin, P.A.: Feature Selection for Multiclass Discrimination via Mixed-Integer Linear Programming. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(6), 779–783 (2003)
Kohavi, R., John, G.H.: Wrappers for Feature Subset Selection. Artificial Intelligence Journal: Special Issue on Relevance, 273–324 (1997)
Lange, T., Braun, M., Roth, V., Buhmann, J.M.: Stability-Based Model Selection. In: Advances in Neural Information Processing Systems, vol. 15 (2002)
Law, M.H., Figueiredo, M., Jain, A.K.: Feature Selection in Mixture-Based Clustering. In: Advances in Neural Information Processing Systems, vol. 15 (2002)
Mihalcea, R.: Instance Based Learning with Automatic Feature Selection Applied to Word Sense Disambiguation. In: Proceedings of the 19th International Conference on Computational Linguistics, Taiwan (2002)
Mitra, P., Murthy, C.A., Pal, S.K.: Unsupervised Feature Selection Using Feature Similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(4), 301–312 (2002)
Modha, D.S., Spangler, W.S.: Feature Weighting in k-Means Clustering. Machine Learning 52(3), 217–237 (2003)
Narendra, P., Fukunaga, K.: A Branch and Bound Algorithm for Feature Subset Selection. IEEE Transactions on Computers 26(9), 917–922 (1977)
Pudil, P., Novovicova, J., Kittler, J.: Floating Search Methods in Feature Selection. Pattern Recognition Letters 15, 1119–1125 (1994)
Schütze, H.: Automatic Word Sense Discrimination. Computational Linguistics 24(1), 97–123 (1998)
Siedlecki, W., Sklansky, J.: A Note on Genetic Algorithms for Large-Scale Feature Selection. Pattern Recognition Letters 10, 335–347 (1989)
Vaithyanathan, S., Dom, B.: Generalized Model Selection for Unsupervised Learning in High Dimensions. In: Advances in Neural Information Processing Systems, vol. 12, pp. 970–976 (1999)

Author information

Authors and Affiliations

Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore, 119613, Republic of Singapore
Zheng-Yu Niu & Dong-Hong Ji

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh

About this paper

Cite this paper

Niu, ZY., Ji, DH. (2004). Feature Selection for Chinese Character Sense Discrimination. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_24

DOI: https://doi.org/10.1007/978-3-540-24630-5_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21006-1
Online ISBN: 978-3-540-24630-5
eBook Packages: Springer Book Archive