Continuous Outlier Monitoring on Uncertain Data Streams

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Outlier detection on data streams is an important task in data mining, and it becomes more challenging when the data are uncertain. This paper studies the problem of outlier detection on uncertain data streams. We propose Continuous Uncertain Outlier Detection (CUOD), which uses pruning to quickly determine the nature of each uncertain element and thereby improves efficiency. Furthermore, we propose a pruning approach, Probability Pruning for Continuous Uncertain Outlier Detection (PCUOD), to reduce the detection cost. PCUOD works with an estimated outlier probability, which effectively reduces the amount of computation. The cost of the incremental PCUOD algorithm meets the demands of uncertain data streams. Finally, we propose a new method for handling parameter-variable queries in CUOD, enabling the concurrent execution of different queries. To the best of our knowledge, this is the first work on outlier detection over uncertain data streams that can handle parameter-variable queries simultaneously. Our methods are verified using both real and synthetic data. The results show that they reduce the required storage and running time.
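
The abstract does not spell out the outlier model or the pruning rules. As a minimal sketch only, assume a distance-based model in which every element in the current window carries an independent existence probability, and an element is reported as an outlier when the probability that fewer than k of its neighbors within a given radius exist reaches a threshold. The names `outlier_probability`, `is_outlier`, `radius`, `k` and `threshold` are hypothetical, and the early stop shown here is a generic pruning idea, not the CUOD or PCUOD algorithm of the paper.

```python
from math import dist


def outlier_probability(neighbor_probs, k, stop_below=None):
    """Probability that fewer than k neighbors exist (k >= 1).

    Assumption: neighbors exist independently with the given probabilities.
    If stop_below is set, the scan stops once the running value falls
    under it; the value can only decrease as more neighbors are added,
    so the returned number is then an upper bound on the exact one.
    """
    # pmf[j] = probability that exactly j of the neighbors seen so far exist,
    # tracked only for j < k (mass that reaches k or more is dropped).
    pmf = [1.0] + [0.0] * (k - 1)
    for p in neighbor_probs:
        for j in range(k - 1, 0, -1):
            pmf[j] = pmf[j] * (1.0 - p) + pmf[j - 1] * p
        pmf[0] *= 1.0 - p
        if stop_below is not None and sum(pmf) < stop_below:
            break  # pruned: this element cannot reach the threshold any more
    return sum(pmf)


def is_outlier(idx, window, radius, k, threshold):
    """Decide whether window[idx] is an outlier under the assumed model.

    window is a list of (coordinates, existence_probability) pairs.
    """
    center, _ = window[idx]
    probs = (
        p
        for i, (point, p) in enumerate(window)
        if i != idx and dist(center, point) <= radius
    )
    return outlier_probability(probs, k, stop_below=threshold) >= threshold


# Hypothetical window: three nearby elements and one isolated element.
window = [((0.0, 0.0), 0.9), ((0.1, 0.0), 0.8), ((0.0, 0.1), 0.5), ((5.0, 5.0), 0.9)]
flags = [is_outlier(i, window, radius=1.0, k=2, threshold=0.7) for i in range(len(window))]
```

In this toy window only the isolated element has no neighbors within the radius, so its probability of having fewer than k neighbors is 1. Stopping the scan early is safe in this model because adding a neighbor can never increase that probability.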

Author information

Corresponding author

Correspondence to Ke-Yan Cao.

Additional information

The work is supported by the National Natural Science Foundation of China under Grant Nos. 61025007, 61328202, 61173029, 61100024, 61332006, and 61073063, the National High Technology Research and Development 863 Program of China under Grant No. 2012AA011004, and the National Basic Research 973 Program of China under Grant No. 2011CB302200-G.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1 (PDF 73 kb)

About this article

Cite this article

Cao, KY., Wang, GR., Han, DH. et al. Continuous Outlier Monitoring on Uncertain Data Streams. J. Comput. Sci. Technol. 29, 436–448 (2014). https://doi.org/10.1007/s11390-014-1441-x

  • DOI: https://doi.org/10.1007/s11390-014-1441-x
