
Exploiting the Bin-Class Histograms for Feature Selection on Discrete Data

  • Conference paper
Pattern Recognition and Image Analysis (IbPRIA 2015)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9117)


Abstract

In machine learning and pattern recognition tasks, the use of feature discretization techniques may have several advantages. The discretized features may hold enough information for the learning task at hand, while ignoring minor fluctuations that are irrelevant or harmful for that task. The discretized features also have more compact representations, which may yield both better accuracy and lower training time than the original features. However, in many cases, mainly with medium- and high-dimensional data, the large number of features usually implies some redundancy among them. Thus, we may further apply feature selection (FS) techniques on the discrete data, keeping the most relevant features while discarding the irrelevant and redundant ones. In this paper, we propose relevance and redundancy criteria for supervised feature selection on discrete data. These criteria are applied to the bin-class histograms of the discrete features. Experimental results on public benchmark data show that the proposed criteria can achieve better accuracy than widely used relevance and redundancy criteria, such as mutual information and the Fisher ratio.
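The paper's specific relevance and redundancy criteria are not reproduced in this excerpt, but the data structure they operate on — the bin-class histogram — and the mutual-information baseline mentioned in the abstract can be sketched. The snippet below is an illustrative sketch, not the authors' method: `bin_class_histogram` builds the joint bin/class count matrix for one discretized feature, and features are then ranked by the standard mutual-information relevance criterion computed from that matrix. All function names are hypothetical.

```python
import numpy as np

def bin_class_histogram(x, y, n_bins, n_classes):
    """Bin-class histogram of one discretized feature:
    H[b, c] = number of samples falling in bin b with class label c."""
    H = np.zeros((n_bins, n_classes), dtype=int)
    for b, c in zip(x, y):
        H[b, c] += 1
    return H

def mutual_information(H):
    """Mutual information I(X; Y) in nats, estimated from the joint
    count matrix H (the baseline relevance criterion)."""
    P = H / H.sum()                        # empirical joint distribution
    px = P.sum(axis=1, keepdims=True)      # marginal over bins
    py = P.sum(axis=0, keepdims=True)      # marginal over classes
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = P * np.log(P / (px * py))  # zero-probability cells -> NaN
    return np.nansum(terms)               # treat those cells as 0

def select_top_m(X, y, n_bins, n_classes, m):
    """Rank the columns of discretized data X by relevance and
    keep the indices of the top-m features."""
    scores = [mutual_information(bin_class_histogram(X[:, j], y,
                                                     n_bins, n_classes))
              for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:m]
```

For example, a feature column that copies the class label attains the maximal score (the entropy of the labels) and outranks a random noise column. A redundancy criterion, as in the paper, would additionally compare the histograms of candidate features against those already selected before keeping them.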


Notes

  1. www.gems-system.org
  2. www.cs.waikato.ac.nz/ml/weka


Author information

Corresponding author

Correspondence to Artur J. Ferreira.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ferreira, A.J., Figueiredo, M.A.T. (2015). Exploiting the Bin-Class Histograms for Feature Selection on Discrete Data. In: Paredes, R., Cardoso, J., Pardo, X. (eds) Pattern Recognition and Image Analysis. IbPRIA 2015. Lecture Notes in Computer Science, vol. 9117. Springer, Cham. https://doi.org/10.1007/978-3-319-19390-8_39

  • DOI: https://doi.org/10.1007/978-3-319-19390-8_39
  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19389-2

  • Online ISBN: 978-3-319-19390-8

  • eBook Packages: Computer Science, Computer Science (R0)
