Abstract
Classification of high-dimensional time series with imbalanced classes is a challenging task. For such tasks, the cascade classifier has been proposed. It tackles high dimensionality and imbalance by splitting the classification task into several low-dimensional classification tasks and aggregating the intermediate results; to this end, the high-dimensional data set is projected onto low-dimensional subsets. However, these subsets can exhibit unfavorable, non-representative data distributions that again hamper classification. Data preprocessing can overcome these problems: small improvements in the low-dimensional data subsets of the cascade classifier improve the aggregated overall results. We present two data preprocessing methods, instance selection and outlier generation, both based on point distances in low-dimensional space. The instance selection method selects representative feasible examples, and the outlier generation method generates artificial infeasible examples near the class boundary. In an experimental study, we analyse the precision improvement that the presented data preprocessing methods yield for the cascade classifier on power production time series of a micro Combined Heat and Power (\(\mu \)CHP) plant and on an artificial, complex data set. The precision increase is due to an increased selectivity of the learned decision boundaries. This paper is an extended version of [19], where we proposed the two data preprocessing methods. Here we extend the analysis of both algorithms by a sensitivity analysis of the distance parameters of the preprocessing methods. Both distance parameters depend on each other and have to be chosen carefully. We study the influence of these distance parameters on the classification precision of the cascade model and derive parameter fitting rules for the \(\mu \)CHP data set. The experiments yield a region of optimal parameter value combinations that leads to high classification precision.
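The two distance-based preprocessing ideas described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names and the distance parameters `d_min` and `d_out` are hypothetical stand-ins for the selection and generation distances whose interplay the paper analyzes, and a real outlier generator would additionally verify that the shifted points leave the feasible region.

```python
import numpy as np

def select_instances(X, d_min):
    """Greedy distance-based instance selection (sketch).

    Keeps an example only if it lies at least d_min away from every
    example already selected, yielding a thinned, representative subset
    of the feasible examples.
    """
    selected = [X[0]]
    for x in X[1:]:
        if all(np.linalg.norm(x - s) >= d_min for s in selected):
            selected.append(x)
    return np.array(selected)

def generate_outliers(X, d_out, seed=None):
    """Generate artificial infeasible examples near the class boundary (sketch).

    Shifts each feasible example by a random unit direction scaled to
    distance d_out, placing candidate outliers close to the data.
    """
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=X.shape)
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    return X + d_out * directions
```

Because both functions share the same distance geometry, choosing `d_out` smaller than the typical spacing enforced by `d_min` places the artificial infeasible examples between the retained feasible examples and the true boundary, which mirrors the parameter dependence studied in the paper.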
Notes
1. Data are available for download on our department website http://www.uni-oldenburg.de/informatik/ui/forschung/themen/cascade/.
References
Bagnall, A., Davis, L.M., Hills, J., Lines, J.: Transformation based ensembles for time series classification. In: Proceedings of the 12th SIAM International Conference on Data Mining, pp. 307–318 (2012)
Bánhalmi, A., Kocsor, A., Busa-Fekete, R.: Counter-example generation-based one-class classification. In: Kok, J.N., Koronacki, J., Mantaras, R.L., Matwin, S., Mladenič, D., Skowron, A. (eds.) ECML 2007. LNCS (LNAI), vol. 4701, pp. 543–550. Springer, Heidelberg (2007). doi:10.1007/978-3-540-74958-5_51
Bellinger, C., Sharma, S., Japkowicz, N.: One-class versus binary classification: which and when? In: 11th International Conference on Machine Learning and Applications, ICMLA 2012, vol. 2, pp. 102–106, December 2012
Blachnik, M.: Ensembles of instance selection methods based on feature subset. Procedia Comput. Sci. 35, 388–396 (2014). Knowledge-Based and Intelligent Information and Engineering Systems 18th Annual Conference, KES-2014 Gdynia, Poland, September 2014 Proceedings
Borgonovo, E., Plischke, E.: Sensitivity analysis: a review of recent advances. Eur. J. Oper. Res. 248(3), 869–887 (2016)
Bremer, J., Rapp, B., Sonnenschein, M.: Support vector based encoding of distributed energy resources’ feasible load spaces. In: Innovative Smart Grid Technologies Conference Europe IEEE PES (2010)
Cortez, P., Embrechts, M.: Opening black box data mining models using sensitivity analysis. In: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), pp. 341–348, April 2011
Cortez, P., Embrechts, M.J.: Using sensitivity analysis and visualization techniques to open black box data mining models. Inf. Sci. 225, 1–17 (2013)
Garcia, S., Derrac, J., Cano, J., Herrera, F.: Prototype selection for nearest neighbor classification: taxonomy and empirical study. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 417–435 (2012)
Hamby, D.M.: A review of techniques for parameter sensitivity analysis of environmental models. Environ. Monit. Assess. 32, 135–154 (1994)
He, H., Garcia, E.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009)
Heiselberg, P., Brohus, H., Hesselholt, A., Rasmussen, H., Seinre, E., Thomas, S.: Application of sensitivity analysis in design of sustainable buildings. Renew. Energy 34(9), 2030–2036 (2009). Special Issue: Building and Urban Sustainability
Jankowski, N., Grochowski, M.: Comparison of instances seletion algorithms I. Algorithms survey. In: Rutkowski, L., Siekmann, J.H., Tadeusiewicz, R., Zadeh, L.A. (eds.) ICAISC 2004. LNCS (LNAI), vol. 3070, pp. 598–603. Springer, Heidelberg (2004). doi:10.1007/978-3-540-24844-6_90
Japkowicz, N.: Assessment Metrics for Imbalanced Learning, pp. 187–206. Wiley, Hoboken (2013)
Kleijnen, J.P.C.: Design and Analysis of Simulation Experiments. International Series in Operations Research and Management Science. Springer, Heidelberg (2015)
Lin, W.J., Chen, J.J.: Class-imbalanced classifiers for high-dimensional data. Brief. Bioinform. 14(1), 13–26 (2013)
Liu, H., Motoda, H., Gu, B., Hu, F., Reeves, C.R., Bush, D.R.: Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol. 608, 1st edn. Springer US, New York (2001)
Neugebauer, J., Kramer, O., Sonnenschein, M.: Classification cascades of overlapping feature ensembles for energy time series data. In: Woon, W.L., Aung, Z., Madnick, S. (eds.) DARE 2015. LNCS (LNAI), vol. 9518, pp. 76–93. Springer, Heidelberg (2015). doi:10.1007/978-3-319-27430-0_6
Neugebauer, J., Kramer, O., Sonnenschein, M.: Improving cascade classifier precision by instance selection and outlier generation. In: ICAART, vol. 8 (2016, in print)
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Shang, Y.W., Qiu, Y.H.: A note on the extended Rosenbrock function. Evol. Comput. 14(1), 119–126 (2006)
Tax, D.M.J., Duin, R.P.W.: Uniform object generation for optimizing one-class classifiers. J. Mach. Learn. Res. 2, 155–173 (2002)
Tomašev, N., Buza, K., Marussy, K., Kis, P.B.: Hubness-aware classification, instance selection and feature construction: survey and extensions to time-series. In: Stańczyk, U., Jain, L.C. (eds.) Feature Selection for Data and Pattern Recognition. SCI, vol. 584, pp. 231–262. Springer, Heidelberg (2015). doi:10.1007/978-3-662-45620-0_11
Tsai, C.F., Eberle, W., Chu, C.Y.: Genetic algorithms in feature and instance selection. Knowl.-Based Syst. 39, 240–247 (2013)
Wilson, D., Martinez, T.: Reduction techniques for instance-based learning algorithms. Mach. Learn. 38(3), 257–286 (2000)
Wu, J., Dhingra, R., Gambhir, M., Remais, J.V.: Sensitivity analysis of infectious disease models: methods, advances and their application. J. R. Soc. Interface 10(86), 1–14 (2013)
Zhuang, L., Dai, H.: Parameter optimization of kernel-based one-class classifier on imbalance text learning. In: Yang, Q., Webb, G. (eds.) PRICAI 2006. LNCS (LNAI), vol. 4099, pp. 434–443. Springer, Heidelberg (2006). doi:10.1007/978-3-540-36668-3_47
Acknowledgement
This work was funded by the Ministry for Science and Culture of Lower Saxony with the PhD program System Integration of Renewable Energy (SEE).
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Neugebauer, J., Kramer, O., Sonnenschein, M. (2017). Instance Selection and Outlier Generation to Improve the Cascade Classifier Precision. In: van den Herik, J., Filipe, J. (eds) Agents and Artificial Intelligence. ICAART 2016. Lecture Notes in Computer Science(), vol 10162. Springer, Cham. https://doi.org/10.1007/978-3-319-53354-4_9
DOI: https://doi.org/10.1007/978-3-319-53354-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53353-7
Online ISBN: 978-3-319-53354-4
eBook Packages: Computer Science, Computer Science (R0)