Abstract
This paper proposes a framework for handling heterogeneous features, including hierarchical values and text, under Bayesian learning. To exploit hierarchical features, we use a statistical technique called shrinkage. We also explore an approach for utilizing text data to improve classification performance. We evaluated our framework on a yeast gene data set that contains hierarchical features as well as text data.
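As a rough illustration of the shrinkage idea mentioned in the abstract, the sketch below smooths the class-conditional probability of a hierarchical feature value by mixing estimates taken at each node on the path from the value's leaf up to the root. It is not the estimator from the paper: the toy hierarchy, the function names, the rule of spreading an ancestor's probability mass uniformly over its leaves, and the fixed mixture weights are all assumptions made for this example (shrinkage weights are often learned from held-out data rather than fixed by hand).

```python
from collections import Counter

# Hypothetical toy hierarchy of feature values: child -> parent (None = root).
PARENT = {
    "root": None,
    "metabolism": "root",
    "transport": "root",
    "amino_acid": "metabolism",
    "lipid": "metabolism",
    "ion": "transport",
}

# Leaves are the nodes that never appear as a parent.
_INTERNAL = set(PARENT.values())
LEAVES = [node for node in PARENT if node not in _INTERNAL]


def path_to_root(node):
    """Return the list [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def shrunk_estimate(leaf, observed, weights):
    """Shrinkage estimate of P(leaf | class).

    observed: leaf values seen in the training examples of one class.
    weights:  assumed mixture weights, one per node on the path
              leaf -> root, summing to 1.
    """
    path = path_to_root(leaf)
    if len(weights) != len(path):
        raise ValueError("need one weight per node on the path to the root")
    total = len(observed)
    if total == 0:
        # No data for this class: fall back to a uniform distribution.
        return 1.0 / len(LEAVES)
    counts = Counter(observed)
    estimate = 0.0
    for node, weight in zip(path, weights):
        # Leaves covered by this node (the node itself if it is a leaf).
        covered = [l for l in LEAVES if node in path_to_root(l)]
        # Relative frequency of the node, spread evenly over its leaves.
        mass = sum(counts[l] for l in covered) / total
        estimate += weight * mass / len(covered)
    return estimate


if __name__ == "__main__":
    observed = ["amino_acid"] * 3 + ["ion"]
    weights = (0.5, 0.3, 0.2)  # leaf, intermediate node, root
    for leaf in LEAVES:
        print(leaf, shrunk_estimate(leaf, observed, weights))
```

In this toy setting the value "lipid" never occurs in the class, so its leaf-level estimate is zero, yet it still receives nonzero probability because its parent "metabolism" is well supported. Since each level's estimate is a proper distribution over the leaves, the mixture also sums to one.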
The work described in this paper was partially supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project Nos. CUHK 4385/99E and CUHK 4187/01E).
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Han, Y., Lam, W. (2003). Exploiting Heterogeneous Features for Classification Learning. In: Liu, J., Cheung, Ym., Yin, H. (eds) Intelligent Data Engineering and Automated Learning. IDEAL 2003. Lecture Notes in Computer Science, vol 2690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45080-1_25
DOI: https://doi.org/10.1007/978-3-540-45080-1_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40550-4
Online ISBN: 978-3-540-45080-1
eBook Packages: Springer Book Archive