Abstract
This paper addresses the detection and substitution (correction) of outliers in stream data. Previous research showed that even a small number of outliers in the data significantly degrades the quality of an applied prediction model. Here we search for a suitable complete procedure for processing outliers in stream data. The procedure consists of a method of outlier detection, a statistic used to replace the outlying values, and a historic horizon over which the replacement value is calculated. To find the best strategy, a wide grid of experiments was prepared. All experiments were performed on semi-artificial data: data coming from the underground coal mining environment, with an artificially introduced dependent variable and randomly introduced outliers. The paper also presents a new approach to local outlier correction, which in several cases improved the classification quality.
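The three components of the procedure (detector, replacement statistic, historic horizon) can be illustrated with a minimal Python sketch. The abstract does not say which detectors, statistics, or horizons the authors compared, so everything below is an assumption for illustration only: the three-sigma rule stands in for the detector, `mean` and `median` for the replacement statistic, and the names `sigma_rule` and `correct_stream` are hypothetical. This is not the authors' method, including their local correction approach.

```python
from collections import deque
from statistics import mean, median, pstdev
from typing import Callable, Iterable, Iterator, List, Sequence

Detector = Callable[[Sequence[float], float], bool]
Statistic = Callable[[Sequence[float]], float]


def sigma_rule(window: Sequence[float], x: float, k: float = 3.0) -> bool:
    """Hypothetical detector: x is an outlier if it lies more than k
    population standard deviations from the mean of the window."""
    sd = pstdev(window)
    # A constant window (sd == 0) never flags anything in this sketch.
    return sd > 0 and abs(x - mean(window)) > k * sd


def correct_stream(
    stream: Iterable[float],
    horizon: int,
    statistic: Statistic = median,
    detector: Detector = sigma_rule,
) -> Iterator[float]:
    """Yield the stream with detected outliers replaced by `statistic`
    computed over the last `horizon` (already corrected) values."""
    history: deque = deque(maxlen=horizon)
    for x in stream:
        # Warm-up: pass values through until the historic horizon is full.
        if len(history) == horizon and detector(history, x):
            y = statistic(history)  # substitute the outlying value
        else:
            y = x
        # Store the corrected value so an earlier outlier does not
        # contaminate later windows.
        history.append(y)
        yield y


if __name__ == "__main__":
    readings: List[float] = [10, 12, 9, 11, 10, 13, 95, 11, 10]
    # A tiny grid over two of the three components (statistic x horizon).
    for stat in (mean, median):
        for h in (3, 5):
            print(stat.__name__, h, list(correct_stream(readings, h, stat)))
```

With `horizon=5` and the median, the spike `95` is replaced by the median of the five preceding corrected values (11), while the remaining readings pass through unchanged. Feeding corrected rather than raw values back into the window is itself a design choice; an experiment grid like the one described above could treat it as one more dimension.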
Notes
1. If a value exceeded the range of the variable, the appropriate boundary value was used.
References
Abadi, D., Carney, D., Çetintemel, U., et al.: Aurora: a new model and architecture for data stream management. VLDB J. 12(2), 120–139 (2003)
Arasu, A., Babcock, B., Babu, S., Cieslewicz, J., Ito, K., Motwani, R., Srivastava, U., Widom, J.: STREAM: the Stanford data stream management system (2004)
Breunig, M., Kriegel, H.P., Ng, R., Sander, J.: LOF: identifying density-based local outliers. In: Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 93–104 (2000)
Chandrasekaran, S., Cooper, O., Deshpande, A., et al.: TelegraphCQ: continuous dataflow processing. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, p. 668 (2003)
Gama, J.: Knowledge Discovery from Data Streams. Chapman & Hall/CRC, New York (2010)
Gupta, M., Gao, J., Aggarwal, C., Han, J.: Outlier detection for temporal data: a survey. IEEE Trans. Knowl. Data Eng. 26(9), 2250–2267 (2014)
Halatchev, M., Gruenwald, L.: Estimating missing values in related sensor data streams. In: Haritsa, J., Vijayaraman, T. (eds.) COMAD, pp. 83–94 (2005)
Hodge, V., Austin, J.: A survey of outlier detection methodologies. Artif. Intell. Rev. 22(2), 85–126 (2004)
Kalisch, M., Michalak, M., Sikora, M., Wróbel, Ł., Przystałka, P.: Influence of outliers introduction on predictive models quality. Commun. Comput. Inf. Sci. 613, 79–93 (2016)
Kalisch, M., Michalak, M., Sikora, M., Wróbel, Ł., Przystałka, P.: Data intensive vs sliding window outlier detection in the stream data — an experimental approach. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2016. LNCS (LNAI), vol. 9693, pp. 73–87. Springer, Heidelberg (2016). doi:10.1007/978-3-319-39384-1_7
Kuna, H., Garcia-Martinez, R., Villatoro, F.: Outlier detection in audit logs for application systems. Inf. Syst. 44, 22–33 (2014)
Pigott, T.: A review of methods for missing data. Educ. Res. Eval. 7(4), 353–383 (2001)
Ramaswamy, S., Rastogi, R., Shim, K.: Efficient algorithms for mining outliers from large data sets. In: Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 427–438 (2000)
Sadik, S., Gruenwald, L.: Research issues in outlier detection for data streams. ACM SIGKDD Explor. Newsl. 15(1), 33–40 (2013)
Acknowledgements
This work was partially supported by the Polish National Centre for Research and Development (NCBiR) grant PBS2/B9/20/2013 within the Applied Research Programmes.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Kalisch, M., Michalak, M., Przystałka, P., Sikora, M., Wróbel, Ł. (2016). Outlier Detection and Elimination in Stream Data – An Experimental Approach. In: Flores, V., et al. Rough Sets. IJCRS 2016. Lecture Notes in Computer Science, vol 9920. Springer, Cham. https://doi.org/10.1007/978-3-319-47160-0_38
DOI: https://doi.org/10.1007/978-3-319-47160-0_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47159-4
Online ISBN: 978-3-319-47160-0
eBook Packages: Computer Science, Computer Science (R0)