Rough Set-Based Feature Selection: Criteria of Max-Dependency, Max-Relevance, and Max-Significance

Maji, Pradipta; Paul, Sushmita

doi:10.1007/978-3-642-30341-8_21

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 43)

Abstract

Feature selection is an important data pre-processing step in pattern recognition and data mining. It reduces dimensionality and redundancy among the selected features, improves the performance of the learning algorithm, and yields an information-rich feature subset. In this regard, the chapter reports on a rough set-based feature selection algorithm called maximum relevance-maximum significance (MRMS), and its application to quantitative structure-activity relationship (QSAR) and gene expression data. The algorithm selects a set of features from a high-dimensional data set by maximizing both the relevance and the significance of the selected features. A theoretical analysis is reported to justify the use of both relevance and significance criteria for selecting a reduced feature set with high predictive accuracy. The importance of rough set theory for computing both the relevance and the significance of the features is also established. The performance of the MRMS algorithm, along with a comparison with other related methods, is studied on three QSAR data sets using the R² statistic of the support vector regression method, and on five cancer and two arthritis microarray data sets using the predictive accuracy of the K-nearest neighbor rule and the support vector machine.
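The two criteria named in the abstract can be illustrated on a toy decision table. The sketch below is not the chapter's algorithm, which is not shown in this preview. It assumes discrete attribute values, uses the standard rough-set dependency (the fraction of objects in the positive region) as relevance, and takes the gain in dependency from adding an attribute as significance. The greedy score (relevance plus mean pairwise significance), the function names, and the data are all assumptions made for illustration.

```python
from collections import defaultdict

def dependency(rows, labels, attrs):
    """Rough-set dependency of the decision on `attrs`: the fraction of
    objects whose equivalence class (same values on `attrs`) is label-pure."""
    seen_labels = defaultdict(set)   # class key -> decision labels seen
    sizes = defaultdict(int)         # class key -> number of objects
    for row, y in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        seen_labels[key].add(y)
        sizes[key] += 1
    positive = sum(n for key, n in sizes.items() if len(seen_labels[key]) == 1)
    return positive / len(rows)

def significance(rows, labels, base, cand):
    """Gain in dependency obtained by adding attribute `cand` to `base`."""
    base = list(base)
    return dependency(rows, labels, base + [cand]) - dependency(rows, labels, base)

def greedy_mrms(rows, labels, k):
    """Hypothetical greedy selection: start from the most relevant attribute,
    then repeatedly add the attribute with the highest
    relevance + mean significance with respect to each selected attribute."""
    remaining = list(range(len(rows[0])))
    relevance = {a: dependency(rows, labels, [a]) for a in remaining}
    first = max(remaining, key=lambda a: relevance[a])
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < k:
        def score(a):
            sig = sum(significance(rows, labels, [s], a) for s in selected)
            return relevance[a] + sig / len(selected)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy decision table (assumed data): three discrete attributes per object.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 0), (2, 0, 1), (2, 1, 0)]
labels = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print(dependency(rows, labels, [0]))       # relevance of attribute 0
    print(significance(rows, labels, [0], 1))  # gain from adding attribute 1
    print(greedy_mrms(rows, labels, 2))
```

In this table attribute 0 alone leaves two objects indiscernible but differently labelled; attribute 1 has no relevance on its own yet resolves that pair, so it is significant with respect to attribute 0. Continuous data such as gene expression values would need to be discretized first, a step this sketch leaves out.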

Author information

Authors and Affiliations

Machine Intelligence Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata, 700 108, India
Pradipta Maji & Sushmita Paul

Corresponding author

Correspondence to Pradipta Maji.

Editor information

Editors and Affiliations

Institute of Mathematics, Warsaw University, Banacha 2, Warsaw, 02097, Poland
Andrzej Skowron
Institute of Computer Science, University of Rzeszów, ul. Dekerta 2, Rzeszów, 35-030, Poland
Zbigniew Suraj

About this chapter

Cite this chapter

Maji, P., Paul, S. (2013). Rough Set-Based Feature Selection: Criteria of Max-Dependency, Max-Relevance, and Max-Significance. In: Skowron, A., Suraj, Z. (eds) Rough Sets and Intelligent Systems - Professor Zdzisław Pawlak in Memoriam. Intelligent Systems Reference Library, vol 43. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30341-8_21

DOI: https://doi.org/10.1007/978-3-642-30341-8_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30340-1
Online ISBN: 978-3-642-30341-8
eBook Packages: Engineering, Engineering (R0)
