Abstract
In the digital era, collecting data about technological processes has become increasingly cheap and easy. The sheer volume of the resulting data, however, makes supervised classification one of the most challenging tasks in artificial intelligence. Feature selection mitigates this problem by removing irrelevant and redundant features from the data. In this paper we propose a new feature selection algorithm, Swcfs, which performs well on high-dimensional and noisy data. Swcfs detects noisy features by applying the sliding window method to consecutive features ranked according to their non-linear correlation with the class feature. The metric Swcfs uses to evaluate feature sets, with respect to their relevance to the class label, is the Bayesian risk, which represents the theoretical upper error bound of deterministic classification. Experiments show that Swcfs is more accurate than most state-of-the-art feature selection algorithms.
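The abstract describes the overall scheme but not the exact window mechanics, so the following is only a minimal sketch of the idea: features are assumed to be pre-ranked by their correlation with the class, a fixed-size window slides over that ranking, and a feature is kept only if it lowers an empirical estimate of the Bayesian risk (the error of the best deterministic rule on the observed data). The function names `empirical_bayes_risk` and `sliding_window_select` and the greedy acceptance rule are hypothetical, not the authors' actual procedure.

```python
from collections import Counter

def empirical_bayes_risk(cols, y):
    # For each distinct feature-value pattern, the best deterministic
    # rule predicts the majority class; the risk is the fraction of
    # samples that rule still misclassifies.
    groups = {}
    for pattern, label in zip(zip(*cols), y):
        groups.setdefault(pattern, []).append(label)
    errors = sum(len(labels) - Counter(labels).most_common(1)[0][1]
                 for labels in groups.values())
    return errors / len(y)

def sliding_window_select(cols, y, ranked, w=3):
    # Scan the ranked feature list one window at a time; keep a
    # feature only if adding it strictly lowers the empirical risk,
    # so noisy features that do not help are discarded.
    selected = []
    risk = 1 - Counter(y).most_common(1)[0][1] / len(y)  # majority-class error
    for start in range(0, len(ranked), w):
        for f in ranked[start:start + w]:
            candidate = [cols[j] for j in selected + [f]]
            r = empirical_bayes_risk(candidate, y)
            if r < risk:
                selected.append(f)
                risk = r
    return selected

# Toy example: feature 0 determines the class exactly; feature 1 is noise.
y = [0, 1, 0, 1, 1, 0]
cols = [[0, 1, 0, 1, 1, 0],   # perfectly predictive
        [1, 1, 0, 0, 1, 0]]   # noisy
print(sliding_window_select(cols, y, ranked=[0, 1]))  # -> [0]
```

In this sketch the noisy feature is rejected because the risk is already zero once the predictive feature is selected, which illustrates the filtering role the window-based scan plays in the paper.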
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Pino Angulo, A., Shin, K. (2017). Improving Classification Accuracy by Means of the Sliding Window Method in Consistency-Based Feature Selection. In: Yamamoto, A., Kida, T., Uno, T., Kuboyama, T. (eds) Discovery Science. DS 2017. Lecture Notes in Computer Science(), vol 10558. Springer, Cham. https://doi.org/10.1007/978-3-319-67786-6_12
DOI: https://doi.org/10.1007/978-3-319-67786-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67785-9
Online ISBN: 978-3-319-67786-6
eBook Packages: Computer Science (R0)