Outlier Detection and Elimination in Stream Data – An Experimental Approach

  • Conference paper
Rough Sets (IJCRS 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9920)

Abstract

This paper addresses outlier detection and substitution (correction) in stream data. Previous research showed that even a small number of outliers in the data significantly degrades the quality of a prediction model when it is applied. In this paper we look for a suitable composite procedure for handling outliers in stream data. The procedure consists of three elements: an outlier detection method, a statistic used to replace the outlying values, and a historic horizon over which the replacement value is calculated. To find the best strategy, a wide grid of experiments was prepared. All experiments were performed on semi-artificial data: data from an underground coal mining environment with an artificially introduced dependent variable and randomly introduced outliers. The paper also presents a new approach to local outlier correction, which improved the classification quality in several cases.
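The three elements of the procedure (a detection method, a replacement statistic, and a historic horizon) can be illustrated with a minimal sketch. The abstract does not name the detection methods or statistics that were tested, so everything below is an assumption made for illustration: the function name `correct_stream`, the threshold `k`, and the simple "mean plus or minus k standard deviations" detection rule are hypothetical and are not the authors' method.

```python
from collections import deque
from statistics import mean, median, pstdev


def correct_stream(values, horizon=5, k=3.0, statistic=median):
    """Yield the stream with flagged outliers replaced.

    horizon   -- number of most recent (already corrected) values kept as history
    k         -- hypothetical threshold: flag x if |x - mean| > k * std of the history
    statistic -- function that computes the replacement value from the history
    """
    history = deque(maxlen=horizon)
    for x in values:
        # Detection starts only once the history window is full.
        if len(history) == horizon:
            centre = mean(history)
            spread = pstdev(history)
            # A constant history (spread == 0) flags nothing in this sketch.
            if spread > 0 and abs(x - centre) > k * spread:
                # Replace the outlier with the chosen statistic of the history.
                x = statistic(history)
        history.append(x)
        yield x


stream = [10.0, 10.2, 9.9, 10.1, 10.0, 55.0, 10.1]
print(list(correct_stream(stream, horizon=5, k=3.0, statistic=median)))
```

In this sketch the value 55.0 is flagged and replaced by the median of the five preceding values, while the remaining values pass through unchanged. Passing `statistic=mean` or a different `horizon` changes the replacement strategy, which is the kind of choice the experimental grid described above compares.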

Notes

  1. In case of exceeding the range of the variable, an appropriate boundary value was used.

Acknowledgements

This work was partially supported by the Polish National Centre for Research and Development (NCBiR), grant PBS2/B9/20/2013, within the Applied Research Programmes.

Author information

Corresponding author

Correspondence to Marcin Michalak.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Kalisch, M., Michalak, M., Przystałka, P., Sikora, M., Wróbel, Ł. (2016). Outlier Detection and Elimination in Stream Data – An Experimental Approach. In: Flores, V., et al. Rough Sets. IJCRS 2016. Lecture Notes in Computer Science, vol 9920. Springer, Cham. https://doi.org/10.1007/978-3-319-47160-0_38

  • DOI: https://doi.org/10.1007/978-3-319-47160-0_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47159-4

  • Online ISBN: 978-3-319-47160-0

  • eBook Packages: Computer Science, Computer Science (R0)
