Encyclopedia of Social Network Analysis and Mining

Living Edition
Editors: Reda Alhajj, Jon Rokne

Outlier Detection with Uncertain Data Using Graphics Processors

  • Takazumi Matsumoto
  • Edward Hung
  • Man Lung Yiu
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4614-7163-9_376-1



Outlier detection

A data mining task in which data points that are outside expected patterns in a given dataset are identified.

Parallel processing

A technique in which a task is split into multiple parts to be executed simultaneously by multiple processors.

Graphics processing unit (GPU)

A specialized processor that is designed to compute large numbers of mathematical operations in parallel, primarily for generating 3D graphics. Modern GPUs can also be programmed to perform a variety of other tasks.

General-purpose computing using GPUs (GPGPU)

Programming GPUs for computational tasks other than graphics.

Floating-point operations per second (FLOPS)

A measurement of computing performance using floating-point mathematical operations, often expressed in billions of FLOPS (GFLOPS).


Outlier detection, also known as anomaly detection, is a widely used fundamental data mining...

The work described in this entry was partially supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (PolyU 5191/09E, PolyU 5182/08E, PolyU 5166/11E), and the Hong Kong PhD Fellowship.



Copyright information

© Springer Science+Business Media LLC 2018

Authors and Affiliations

  • Takazumi Matsumoto (1, corresponding author)
  • Edward Hung (2)
  • Man Lung Yiu (2)
  1. Okinawa Institute of Science and Technology, Okinawa, Japan
  2. The Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong

Section editors and affiliations

  • V. S. Subrahmanian (1)
  • Jeffrey Chan (2)
  1. University of Maryland, College Park, USA
  2. RMIT (Royal Melbourne Institute of Technology), Melbourne, Australia