Feature Selection in an Electric Billing Database Considering Attribute Inter-dependencies

  • Conference paper
Advances in Data Mining. Applications in Medicine, Web Mining, Marketing, Image and Signal Mining (ICDM 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4065)

Abstract

With the increasing size of databases, feature selection has become a relevant and challenging problem in knowledge discovery in databases. An effective feature selection strategy can significantly reduce data mining processing time, improve predictive accuracy, and make the induced models easier to understand, since they tend to be smaller and more meaningful to the user. Many feature selection algorithms assume that attributes are independent of each other given the class. This assumption can produce models with redundant attributes and/or exclude sets of attributes that are relevant when considered together. In this paper we describe buBF, an effective best-first search algorithm for feature selection. buBF uses a novel heuristic function based on n-way entropy to capture inter-dependencies among variables. We show that buBF produces more accurate models than other state-of-the-art feature selection algorithms on several real and synthetic datasets. Specifically, we apply buBF to a Mexican Electric Billing database and obtain satisfactory results.
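The abstract does not state buBF's heuristic formula, so what follows is only a minimal sketch of the general idea, not the authors' algorithm. It runs a best-first search over attribute subsets, scoring each subset by the n-way conditional entropy H(C | A1, ..., An), estimated from joint value counts, plus a size penalty. The scoring function, the `size_penalty` weight and the `patience` stopping rule are hypothetical choices made for illustration.

```python
import heapq
import math
from collections import Counter
from itertools import count, product


def conditional_entropy(rows, labels, subset):
    """Estimate H(C | A_subset) in bits from frequency counts.

    All attributes in `subset` are treated jointly (one n-way value per row),
    so interactions such as XOR are visible to the measure.
    """
    n = len(rows)
    joint = Counter()     # counts of (joint attribute values, class)
    marginal = Counter()  # counts of joint attribute values
    for row, c in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        joint[(key, c)] += 1
        marginal[key] += 1
    h = 0.0
    for (key, _), k in joint.items():
        h -= (k / n) * math.log2(k / marginal[key])
    return h


def best_first_select(rows, labels, size_penalty=0.01, patience=5):
    """Best-first search over attribute subsets (hypothetical scoring).

    score(S) = H(C | S) + size_penalty * |S|; lower is better. The search
    stops after `patience` consecutive expansions without improvement.
    """
    n_attrs = len(rows[0])

    def score(subset):
        return conditional_entropy(rows, labels, subset) + size_penalty * len(subset)

    tie = count()  # FIFO tie-breaker among subsets with equal scores
    frontier = [(score(()), next(tie), ())]
    seen = {frozenset()}
    best_subset, best_score = None, math.inf
    stale = 0
    while frontier and stale < patience:
        s, _, subset = heapq.heappop(frontier)
        if s < best_score - 1e-12:
            best_subset, best_score, stale = subset, s, 0
        else:
            stale += 1
        # Expand: add each attribute not yet in the subset.
        for a in range(n_attrs):
            if a in subset:
                continue
            child = tuple(sorted(subset + (a,)))
            if frozenset(child) in seen:
                continue
            seen.add(frozenset(child))
            heapq.heappush(frontier, (score(child), next(tie), child))
    return best_subset, best_score


if __name__ == "__main__":
    # Synthetic XOR data: class = a0 XOR a1, a2 is irrelevant noise.
    rows = list(product([0, 1], repeat=3))
    labels = [a0 ^ a1 for a0, a1, _ in rows]
    print(best_first_select(rows, labels))
```

On the XOR data, every single attribute leaves H(C | a_i) at 1 bit. Under this score, a greedy forward search that only accepts improving additions would therefore stop at the empty set, because each singleton raises the score from 1.00 to 1.01. The best-first frontier keeps exploring and reaches {a0, a1}, where H(C | a0, a1) = 0. With few rows, the plug-in n-way estimate also drifts toward zero as subsets grow, which is one reason a penalty on subset size is needed. The linear penalty used here is only a placeholder.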

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mejía-Lavalle, M., Morales, E.F. (2006). Feature Selection in an Electric Billing Database Considering Attribute Inter-dependencies. In: Perner, P. (ed.) Advances in Data Mining. Applications in Medicine, Web Mining, Marketing, Image and Signal Mining. ICDM 2006. Lecture Notes in Computer Science, vol 4065. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11790853_23

  • DOI: https://doi.org/10.1007/11790853_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-36036-0

  • Online ISBN: 978-3-540-36037-7

  • eBook Packages: Computer Science, Computer Science (R0)
