Rough Set-Based Feature Selection: Criteria of Max-Dependency, Max-Relevance, and Max-Significance

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 43)

Abstract

Feature selection is an important data pre-processing step in pattern recognition and data mining. It reduces dimensionality and redundancy among the selected features, improves the performance of learning algorithms, and yields an information-rich feature subset. In this regard, the chapter reports on a rough set-based feature selection algorithm called maximum relevance-maximum significance (MRMS), and its applications to quantitative structure-activity relationship (QSAR) and gene expression data. The algorithm selects a set of features from a high-dimensional data set by maximizing both the relevance and the significance of the selected features. A theoretical analysis is reported to justify the use of both relevance and significance criteria for selecting a reduced feature set with high predictive accuracy. The importance of rough set theory for computing both relevance and significance of the features is also established. The performance of the MRMS algorithm, along with a comparison with other related methods, is studied on three QSAR data sets using the R² statistic of the support vector regression method, and on five cancer and two arthritis microarray data sets using the predictive accuracy of the K-nearest neighbor rule and the support vector machine.
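
As a rough illustration of the two criteria named in the abstract, the following Python sketch computes a rough set relevance and significance on a toy decision table and uses them in a greedy selection loop. It assumes the standard Pawlak dependency degree (fraction of objects in the positive region) as the relevance measure, and takes significance to be the gain in dependency when a candidate feature is paired with an already selected one. The function names (dependency, significance, greedy_select), the averaging of significance over the selected features, the tie-breaking by feature index, and the toy data are all assumptions made for illustration; they are not taken from the chapter, and features are assumed to be already discretized.

```python
from collections import defaultdict


def dependency(rows, labels, features):
    """Pawlak dependency degree of the class label on `features` (assumed relevance measure)."""
    # Group objects into equivalence classes by their values on `features`.
    seen_labels = defaultdict(set)
    sizes = defaultdict(int)
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)
        seen_labels[key].add(label)
        sizes[key] += 1
    # A class belongs to the positive region when all of its objects share one label.
    positive = sum(sizes[k] for k, labs in seen_labels.items() if len(labs) == 1)
    return positive / len(rows)


def significance(rows, labels, selected_feature, candidate):
    """Gain in dependency from adding `candidate` to `selected_feature` (assumed definition)."""
    joint = dependency(rows, labels, [selected_feature, candidate])
    alone = dependency(rows, labels, [selected_feature])
    return joint - alone


def greedy_select(rows, labels, n_features, k):
    """Hypothetical greedy loop: relevance plus average significance w.r.t. selected features."""
    remaining = set(range(n_features))
    # Start from the most relevant feature; ties go to the lowest index.
    first = max(sorted(remaining), key=lambda f: dependency(rows, labels, [f]))
    selected = [first]
    remaining.remove(first)
    while len(selected) < k and remaining:
        def score(f):
            rel = dependency(rows, labels, [f])
            sig = sum(significance(rows, labels, s, f) for s in selected) / len(selected)
            return rel + sig
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy decision table (made up): the label is the XOR of features 0 and 1,
# and feature 2 is the complement of feature 0, so it is redundant with it.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = [0, 1, 1, 0]

print(dependency(rows, labels, [0]))       # relevance of feature 0 alone
print(dependency(rows, labels, [0, 1]))    # dependency on the pair {0, 1}
print(greedy_select(rows, labels, 3, 2))   # indices of the two selected features
```

In this toy table no single feature determines the label, so relevance alone cannot separate the candidates; the significance term is what favours feature 1 over the redundant feature 2 once feature 0 has been picked.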

Author information

Corresponding author

Correspondence to Pradipta Maji.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Maji, P., Paul, S. (2013). Rough Set-Based Feature Selection: Criteria of Max-Dependency, Max-Relevance, and Max-Significance. In: Skowron, A., Suraj, Z. (eds) Rough Sets and Intelligent Systems - Professor Zdzisław Pawlak in Memoriam. Intelligent Systems Reference Library, vol 43. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30341-8_21

  • DOI: https://doi.org/10.1007/978-3-642-30341-8_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30340-1

  • Online ISBN: 978-3-642-30341-8

  • eBook Packages: Engineering, Engineering (R0)
