
An Enhanced Univariate Discretization Based on Cluster Ensembles

  • Conference paper
Intelligent and Evolutionary Systems

Abstract

Most discretization algorithms focus on the univariate case. In general, they take into account the target class or the interval-wise frequency of data. In so doing, useful information regarding natural groups, hidden patterns and correlations among the attributes may inevitably be lost. In response, this paper introduces a new pruning method that exploits natural groups, or clusters, as an explicit constraint on traditional cut-point determination techniques. This unsupervised approach makes use of cluster ensembles to reveal similarities between data belonging to adjacent intervals. Specifically, a cut-point between a pair of highly similar or related intervals is dropped. This pruning mechanism is coupled with three different univariate discretization algorithms, and the evaluation is conducted on 10 datasets with 3 classifier models. The results suggest that the proposed method usually achieves higher classification accuracy than the three baseline counterparts.
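
The abstract does not state how the ensemble is generated, how the similarity between adjacent intervals is measured, or which threshold triggers pruning. The Python sketch below fills those gaps with assumptions. The ensemble is a set of k-means runs with a randomly chosen k, summarized as a co-association (evidence accumulation) matrix. The similarity of two adjacent intervals is the mean co-association over all cross-interval pairs of rows. A cut-point is dropped when that mean reaches a hypothetical threshold of 0.5. The names kmeans_labels, co_association and prune_cut_points are illustrative and are not the authors' code.

```python
# Hypothetical sketch: the ensemble design, similarity measure and threshold
# are assumptions, not the paper's specification.
import random


def kmeans_labels(data, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per row of `data`."""
    centers = [list(p) for p in rng.sample(data, k)]  # k distinct rows as seeds
    labels = [0] * len(data)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(data):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [data[i] for i in range(len(data)) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_association(data, n_runs=10, k_range=(2, 4), seed=0):
    """Fraction of ensemble members that put rows i and j in the same cluster."""
    rng = random.Random(seed)
    n = len(data)
    m = [[0.0] * n for _ in range(n)]
    for _ in range(n_runs):
        k = rng.randint(*k_range)  # vary k to diversify the ensemble (assumption)
        labels = kmeans_labels(data, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    m[i][j] += 1.0 / n_runs
    return m


def prune_cut_points(values, cuts, coassoc, threshold=0.5):
    """Drop a cut-point when the intervals on either side look like one cluster.

    values:  one attribute's column (row i of `values` is row i of `coassoc`)
    cuts:    cut-points produced by any base univariate discretizer
    """
    cuts = sorted(cuts)
    upper = cuts[1:] + [float("inf")]
    kept = []
    low = float("-inf")  # lower bound of the current (possibly merged) left interval
    for c, high in zip(cuts, upper):
        left = [i for i, v in enumerate(values) if low < v <= c]
        right = [i for i, v in enumerate(values) if c < v <= high]
        if left and right:
            # Mean co-association over all (left, right) pairs of rows.
            sim = sum(coassoc[i][j] for i in left for j in right) / (len(left) * len(right))
        else:
            sim = 1.0  # an empty side gives no evidence for keeping the cut
        if sim < threshold:
            kept.append(c)
            low = c  # the next left interval starts at this kept cut
        # Otherwise the cut is dropped and the left interval absorbs the right one.
    return kept
```

For example, take values [1.0, 2.0, 3.0, 10.0, 11.0, 12.0] and a co-association matrix that is 1.0 within the first three rows and within the last three rows and 0.0 across the two groups. Here, prune_cut_points(values, [2.5, 6.0, 10.5], matrix) keeps only 6.0, because the cut at 6.0 is the only one that separates two different natural groups. The greedy left-to-right merge is one simple design choice. Other strategies, such as merging the most similar pair first, would also fit the description in the abstract.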

Author information

Corresponding author

Correspondence to Kittakorn Sriwanna.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Sriwanna, K., Boongoen, T., Iam-On, N. (2016). An Enhanced Univariate Discretization Based on Cluster Ensembles. In: Lavangnananda, K., Phon-Amnuaisuk, S., Engchuan, W., Chan, J. (eds) Intelligent and Evolutionary Systems. Proceedings in Adaptation, Learning and Optimization, vol 5. Springer, Cham. https://doi.org/10.1007/978-3-319-27000-5_7

  • DOI: https://doi.org/10.1007/978-3-319-27000-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26999-3

  • Online ISBN: 978-3-319-27000-5

  • eBook Packages: Engineering, Engineering (R0)
