Abstract
Most discretization algorithms focus on the univariate case. In general, they take into account only the target class or the interval-wise frequency of the data. In doing so, useful information about natural groups, hidden patterns and correlations among the attributes may be lost. In response, this paper introduces a new pruning method that uses natural groups, or clusters, as an explicit constraint on traditional cut-point determination techniques. This unsupervised approach makes use of cluster ensembles to reveal similarities between data belonging to adjacent intervals. To be precise, a cut-point between a pair of highly similar or related intervals is dropped. This pruning mechanism is coupled with three different univariate discretization algorithms, and the evaluation is conducted on 10 datasets and 3 classifier models. The results suggest that the proposed method usually achieves higher classification accuracy than the three baseline counterparts.
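The pruning idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of a pairwise co-association score as the similarity measure, the averaging over cross-interval pairs, and the `threshold` parameter are all assumptions made for illustration. The ensemble is passed in as ready-made cluster labelings, so how the base clusterings are generated is left open.

```python
from itertools import product


def co_association(labelings, i, j):
    """Fraction of base clusterings that place samples i and j in the same cluster."""
    return sum(lab[i] == lab[j] for lab in labelings) / len(labelings)


def prune_cut_points(values, cut_points, labelings, threshold=0.5):
    """Drop cut-points whose two neighbouring intervals look alike to the ensemble.

    values     : one numeric attribute, one entry per sample
    cut_points : sorted cut-points produced by any univariate discretizer
    labelings  : list of cluster-label sequences (the ensemble), aligned with values
    threshold  : hypothetical similarity level at or above which a cut-point is dropped
    """
    kept = []
    for idx, cut in enumerate(cut_points):
        # The left interval starts at the last *kept* cut-point, so merges accumulate.
        lower = kept[-1] if kept else float("-inf")
        upper = cut_points[idx + 1] if idx + 1 < len(cut_points) else float("inf")
        left = [i for i, v in enumerate(values) if lower < v <= cut]
        right = [i for i, v in enumerate(values) if cut < v <= upper]
        if not left or not right:
            continue  # assumption: a cut-point next to an empty interval is dropped
        pairs = list(product(left, right))
        # Average co-association over all cross-interval sample pairs.
        similarity = sum(co_association(labelings, i, j) for i, j in pairs) / len(pairs)
        if similarity < threshold:
            kept.append(cut)
    return kept


# Toy data: six samples, two candidate cut-points, three base clusterings.
values = [1, 2, 3, 4, 10, 11]
cut_points = [2.5, 7.0]
labelings = [
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 2, 2],
]
pruned = prune_cut_points(values, cut_points, labelings, threshold=0.5)
```

In this toy example, two of the three base clusterings group the samples on both sides of 2.5 together, so that cut-point is pruned, while no clustering links the samples on either side of 7.0, so it is kept. Raising the threshold makes pruning more conservative.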
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Sriwanna, K., Boongoen, T., Iam-On, N. (2016). An Enhanced Univariate Discretization Based on Cluster Ensembles. In: Lavangnananda, K., Phon-Amnuaisuk, S., Engchuan, W., Chan, J. (eds) Intelligent and Evolutionary Systems. Proceedings in Adaptation, Learning and Optimization, vol 5. Springer, Cham. https://doi.org/10.1007/978-3-319-27000-5_7
DOI: https://doi.org/10.1007/978-3-319-27000-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26999-3
Online ISBN: 978-3-319-27000-5
eBook Packages: Engineering, Engineering (R0)