Abstract
An observation that is unusual or differs markedly from all the others is called an outlier or anomaly. Appropriate evaluation of data is a crucial problem in modelling real objects or phenomena. Problems investigated today are often based on data mass-produced by computer systems without careful inspection or screening. Because of the great amount of information generated and processed (e.g. so-called Big Data), possible outliers often go unnoticed and may be masked. In regression, the situation can be even more complicated. In some areas, for instance medicine and geology, particularly seismology (earthquakes), the extremely atypical measurements are precisely the subject of interest, so identifying and evaluating them matters in its own right. In this paper, a nonparametric procedure based on the Parzen kernel is applied to estimate the unknown regression function. The Cook’s distance formula is then used to evaluate which measurements in the input data set could be recognized as outliers and possibly removed. Anomaly detection remains an important research problem across diverse areas and application domains.
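The abstract combines two ingredients: a Parzen-kernel estimate of the unknown regression function and Cook’s distance for judging which measurements are outliers. The pure-Python sketch below shows one way they could fit together. It rests on several assumptions that are not taken from the paper: a Gaussian kernel serves as the Parzen kernel, the estimator has the Nadaraya-Watson form, the bandwidth `h` is a hypothetical value, the number of parameters p in Cook’s formula D_i = Σ_j (ŷ_j − ŷ_j(i))² / (p·s²) is replaced by the trace of the smoother matrix, and observations are flagged by the common 4/n rule of thumb. All function names are illustrative.

```python
import math
import random


def gaussian_kernel(u):
    """Standard Gaussian kernel, used here as the Parzen kernel (assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_regression(xs, ys, x, h, skip=None):
    """Nadaraya-Watson kernel estimate of the regression function at x.

    Each observation is weighted by K((x - x_k) / h); `h` is the bandwidth.
    If `skip` is an index, that observation is left out of the estimate.
    """
    num = 0.0
    den = 0.0
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        if k == skip:
            continue
        w = gaussian_kernel((x - xk) / h)
        num += w * yk
        den += w
    return num / den


def leverages(xs, h):
    """Diagonal of the smoother matrix: the weight each point gives itself."""
    out = []
    for x in xs:
        total = sum(gaussian_kernel((x - xk) / h) for xk in xs)
        out.append(gaussian_kernel(0.0) / total)
    return out


def cooks_distances(xs, ys, h):
    """Leave-one-out Cook's distance for the kernel smoother.

    D_i = sum_j (yhat_j - yhat_j(i))^2 / (p * s^2), with p taken as the
    trace of the smoother matrix (an assumption for nonparametric fits).
    """
    n = len(xs)
    fitted = [kernel_regression(xs, ys, x, h) for x in xs]
    p = sum(leverages(xs, h))  # effective number of parameters
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    s2 = rss / (n - p)  # residual variance estimate
    dists = []
    for i in range(n):
        # Refit without observation i and measure how far all fitted values move
        refit = [kernel_regression(xs, ys, x, h, skip=i) for x in xs]
        shift = sum((f - g) ** 2 for f, g in zip(fitted, refit))
        dists.append(shift / (p * s2))
    return dists


if __name__ == "__main__":
    random.seed(0)
    xs = [i / 49 for i in range(50)]
    ys = [math.sin(2 * math.pi * x) + random.gauss(0.0, 0.1) for x in xs]
    ys[25] += 3.0  # plant one gross outlier in synthetic data
    d = cooks_distances(xs, ys, h=0.05)
    cutoff = 4 / len(xs)  # common rule of thumb, not taken from the paper
    flagged = [i for i, di in enumerate(d) if di > cutoff]
    print("flagged indices:", flagged)
```

Cook’s distance is computed here directly from its leave-one-out definition, refitting the smoother once per removed observation. That costs O(n³) operations for n points, which is acceptable for a small illustration but not for Big Data sizes, where a leverage-based closed form would be preferable.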
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Galkowski, T., Cader, A. (2018). Outliers Detection in Regressions by Nonparametric Parzen Kernel Estimation. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2018. Lecture Notes in Computer Science, vol 10842. Springer, Cham. https://doi.org/10.1007/978-3-319-91262-2_32
DOI: https://doi.org/10.1007/978-3-319-91262-2_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91261-5
Online ISBN: 978-3-319-91262-2
eBook Packages: Computer Science, Computer Science (R0)