Abstract
Word sense discrimination groups the occurrences of a word into clusters using an unsupervised classification method, so that each cluster contains occurrences with the same meaning. Feature extraction has been used to reduce the dimensionality of context vectors in English word sense discrimination. However, when the original dimensions are meaningful to users and the relevant features are already among them, feature selection is a better way to find those features. In this paper we apply two unsupervised feature selection schemes to Chinese character sense discrimination: an entropy-based feature filter and a Minimum Description Length (MDL) based feature wrapper. Evaluated by precision against a known ground-truth classification, our preliminary experimental results show that feature selection outperforms feature extraction on the Chinese character sense discrimination task.
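To make the idea of an entropy-based feature filter concrete, the following is a minimal sketch and not the paper's implementation. It assumes a distance-based entropy measure of the kind often used for unsupervised feature filtering: pairwise distances are mapped to similarities in (0, 1), and a feature whose similarities sit near 0 or 1 (clear cluster structure) receives low entropy. The function names, the choice of scaling constant `alpha`, and the simplification of scoring each feature on its own rather than scoring feature subsets are all assumptions made for illustration.

```python
import numpy as np


def similarity_entropy(values):
    """Mean binary entropy of pairwise similarities for one feature.

    Assumption: similarity s = exp(-alpha * d), with alpha chosen so that
    a pair at the mean distance has similarity 0.5.
    """
    values = np.asarray(values, dtype=float)
    # Absolute differences between every pair of observations.
    dist = np.abs(values[:, None] - values[None, :])
    upper = np.triu_indices(len(values), k=1)
    d = dist[upper]
    mean_d = d.mean()
    if mean_d == 0.0:
        return 0.0  # constant feature: no spread at all
    alpha = np.log(2.0) / mean_d
    # Clip to avoid log(0) for identical points.
    s = np.clip(np.exp(-alpha * d), 1e-12, 1.0 - 1e-12)
    entropy = -(s * np.log(s) + (1.0 - s) * np.log(1.0 - s))
    return float(entropy.mean())


def rank_features_by_entropy(X):
    """Return feature indices ordered from lowest to highest entropy."""
    X = np.asarray(X, dtype=float)
    scores = [similarity_entropy(X[:, j]) for j in range(X.shape[1])]
    return np.argsort(scores).tolist()


# Toy data: column 0 is evenly spread (no clusters), column 1 has two tight groups.
spread = np.linspace(0.0, 1.0, 40)
grouped = np.concatenate([np.linspace(0.0, 0.2, 20), np.linspace(10.0, 10.2, 20)])
X = np.column_stack([spread, grouped])
print(rank_features_by_entropy(X))
```

In a filter setting, the lowest-entropy features would be kept as the context-vector dimensions passed to the clustering step. A wrapper such as the MDL-based scheme would instead score candidate feature subsets by the quality of the clustering they produce.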
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Niu, ZY., Ji, DH. (2004). Feature Selection for Chinese Character Sense Discrimination. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_24
DOI: https://doi.org/10.1007/978-3-540-24630-5_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21006-1
Online ISBN: 978-3-540-24630-5
eBook Packages: Springer Book Archive