Abstract
In machine learning and pattern recognition tasks, feature discretization techniques can offer several advantages. The discretized features may hold enough information for the learning task at hand while ignoring minor fluctuations that are irrelevant or harmful to that task. Discretized features also have more compact representations, which may yield both better accuracy and lower training time than the original features. However, with medium- and high-dimensional data, the large number of features usually implies some redundancy among them. Thus, we may further apply feature selection (FS) techniques to the discrete data, keeping the most relevant features while discarding the irrelevant and redundant ones. In this paper, we propose relevance and redundancy criteria for supervised feature selection on discrete data. These criteria are applied to the bin-class histograms of the discrete features. Experimental results on public benchmark data show that the proposed criteria can achieve better accuracy than widely used relevance and redundancy criteria, such as mutual information and the Fisher ratio.
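The proposed criteria themselves are not detailed in the abstract, so the following is a minimal sketch only: it shows how a bin-class histogram can be built for a discretized feature and how the mutual-information relevance baseline mentioned above can be computed from it. The function names and the toy data are illustrative assumptions, not taken from the paper.

import numpy as np

def bin_class_histogram(x, y, n_bins, n_classes):
    # H[b, c] = number of samples whose discretized feature value is b
    # and whose class label is c.
    H = np.zeros((n_bins, n_classes), dtype=np.int64)
    np.add.at(H, (x, y), 1)
    return H

def mutual_information(H):
    # Relevance of a feature as I(X; Y), estimated from the bin-class
    # histogram H (one of the baseline criteria the paper compares against).
    P = H / H.sum()                    # joint distribution P(X = b, Y = c)
    px = P.sum(axis=1, keepdims=True)  # marginal P(X = b)
    py = P.sum(axis=0, keepdims=True)  # marginal P(Y = c)
    nz = P > 0                         # skip empty cells to avoid log(0)
    return float((P[nz] * np.log2(P[nz] / (px @ py)[nz])).sum())

# Toy usage: a feature correlated with the class scores higher than an
# independent one, so ranking features by this score keeps relevant ones.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
x_rel = (y + rng.integers(0, 2, size=200)) % 3  # depends on the class
x_irr = rng.integers(0, 3, size=200)            # independent of the class
for name, x in [("relevant", x_rel), ("irrelevant", x_irr)]:
    print(name, mutual_information(bin_class_histogram(x, y, 3, 2)))

A redundancy criterion between two features can be estimated the same way, by replacing the class labels with the second feature's bin indices in the histogram.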
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Ferreira, A.J., Figueiredo, M.A.T. (2015). Exploiting the Bin-Class Histograms for Feature Selection on Discrete Data. In: Paredes, R., Cardoso, J., Pardo, X. (eds) Pattern Recognition and Image Analysis. IbPRIA 2015. Lecture Notes in Computer Science, vol. 9117. Springer, Cham. https://doi.org/10.1007/978-3-319-19390-8_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19389-2
Online ISBN: 978-3-319-19390-8
eBook Packages: Computer Science (R0)