Abstract
In this paper we report on a study of feature selection within the minimum-redundancy maximum-relevance framework. Features are ranked by their correlation with the target vector. These relevance scores are then combined with correlations between features in order to obtain a set of relevant and least-redundant features. The measures of correlation or distributional similarity applied for redundancy and relevance include the Kolmogorov–Smirnov (KS) test, Spearman correlation, Jensen–Shannon divergence, and the sign test. We introduce a metric called the “value difference metric” (VDM) and present a simple measure, which we call the “fit criterion” (FC). We draw conclusions about the usefulness of the different measures. While the KS test and the sign test provided useful information, Spearman correlations are not suited to comparing data with different measurement intervals. VDM performed very well in our experiments as both a redundancy and a relevance measure. Jensen–Shannon divergence and the sign test are good alternative redundancy measures, and FC is a good alternative relevance measure.
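As an illustration of the generic minimum-redundancy maximum-relevance loop, here is a minimal sketch under several assumptions that are not taken from the paper. It assumes a binary target coded 0/1, measures relevance with the two-sample KS statistic between the class-conditional feature samples, measures redundancy with the absolute Spearman correlation between features, and combines the two by subtracting mean redundancy from relevance (similar in spirit to the difference form of the original mRMR criterion). The function names (`mrmr`, `relevance_ks`, `redundancy_spearman`) and the toy data are hypothetical. VDM and FC are not implemented because their exact definitions are not given here, and this pairing of measures is chosen only for illustration, not as the combination the study recommends.

```python
from bisect import bisect_right
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector carries no rank information
    return cov / (vx * vy) ** 0.5


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for v in a + b:  # the supremum of the CDF gap is attained at an observed value
        gap = max(gap, abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b)))
    return gap


def relevance_ks(feature, labels):
    """Hypothetical relevance to a 0/1 target: KS distance between class samples."""
    pos = [v for v, y in zip(feature, labels) if y == 1]
    neg = [v for v, y in zip(feature, labels) if y == 0]
    return ks_statistic(pos, neg)


def redundancy_spearman(a, b):
    """Hypothetical redundancy between two features: absolute Spearman correlation."""
    return abs(spearman(a, b))


def mrmr(features, labels, k, relevance_fn=relevance_ks, redundancy_fn=redundancy_spearman):
    """Greedy selection (assumes k >= 1): maximise relevance minus mean redundancy."""
    relevance = {name: relevance_fn(col, labels) for name, col in features.items()}
    selected = [max(relevance, key=relevance.get)]  # start with the most relevant feature
    while len(selected) < min(k, len(features)):
        candidates = [n for n in features if n not in selected]

        def score(name):
            red = mean(redundancy_fn(features[name], features[s]) for s in selected)
            return relevance[name] - red

        selected.append(max(candidates, key=score))
    return selected


# Hypothetical toy data: f2 is a rescaled copy of f1; f3 is less relevant
# but only weakly correlated with f1.
features = {
    "f1": [1, 2, 3, 4, 5, 6],
    "f2": [2, 4, 6, 8, 10, 12],
    "f3": [5, 1, 6, 2, 4, 3],
}
labels = [0, 0, 0, 1, 1, 1]
print(mrmr(features, labels, 2))  # ['f1', 'f3']
```

In this toy case f2 is as relevant as f1 but fully redundant with it (|rho| = 1), so the second pick is f3, whose lower KS relevance (2/3) is offset by its low redundancy (|rho| = 1/7). Other measures from the comparison, such as Jensen–Shannon divergence, could be tried by passing a different `relevance_fn` or `redundancy_fn`.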
This research was supported by the Spanish MEC Project “3D Reconstruction, classification and visualization of temporal sequences of bioimplant Micro-CT images” (MAT-2005-07244-C03-03).
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Auffarth, B., López, M., Cerquides, J. (2010). Comparison of Redundancy and Relevance Measures for Feature Selection in Tissue Classification of CT Images. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2010. Lecture Notes in Computer Science, vol 6171. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14400-4_20
DOI: https://doi.org/10.1007/978-3-642-14400-4_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14399-1
Online ISBN: 978-3-642-14400-4
eBook Packages: Computer Science, Computer Science (R0)