
Applied Intelligence, Volume 48, Issue 5, pp 1097–1110

Anomaly detection using piecewise aggregate approximation in the amplitude domain

  • Huorong Ren
  • Xiujuan Liao
  • Zhiwu Li
  • Abdulrahman Al-Ahmari

Abstract

Anomaly detection has received much attention because of its many applications. Generally, the first step in discovering anomalies is to apply a data representation method that reduces dimensionality while preserving key information. Anomaly detection based on real-valued representation methods is attractive because such representations support numeric operations directly. A typical real-valued representation method is Piecewise Aggregate Approximation (PAA), which is simple and intuitive: it captures the mean value of each segment of a sequence. However, if segments have the same or similar mean values but different oscillation amplitudes, PAA cannot effectively describe a sequence composed of such segments. To address this issue, we propose a representation method called Piecewise Aggregate Approximation in the Amplitude Domain (AD-PAA). To discover anomalies, a sequence is first partitioned into subsequences by a sliding window. In the AD-PAA method, each subsequence is then divided into equal-size subsections in the amplitude domain. Computing the mean value of each subsection effectively captures the amplitude oscillation of the subsequence. Applying AD-PAA to every subsequence yields the AD-PAA representation of the whole sequence. Anomalies are identified by anomaly scores computed from the similarities among these representations. Experimental results on various datasets confirm that the proposed method is more accurate than the PAA-based method and the other compared methods, and that it is also better at differentiating anomalies.
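The pipeline in the abstract (sliding-window segmentation, a per-window representation, and similarity-based anomaly scores) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses NumPy, and the function names `paa`, `ad_paa`, `sliding_windows`, and `nn_scores` are hypothetical. The abstract does not give the exact AD-PAA construction, so `ad_paa` assumes that "equal-size subsections in the amplitude domain" means sorting a window's values by amplitude and averaging equal-size groups of the sorted values. The anomaly score is likewise an assumed stand-in: the Euclidean distance from each window's representation to its nearest neighbor.

```python
import numpy as np


def paa(x, w):
    """Piecewise Aggregate Approximation: mean of each of w (nearly) equal-length segments, in time order."""
    x = np.asarray(x, dtype=float)
    if not 1 <= w <= len(x):
        raise ValueError("w must be between 1 and len(x)")
    return np.array([seg.mean() for seg in np.array_split(x, w)])


def ad_paa(x, w):
    """ASSUMED amplitude-domain variant (not the paper's exact definition):
    sort the values by amplitude, split them into w equal-size groups, and average each group."""
    return paa(np.sort(np.asarray(x, dtype=float)), w)


def sliding_windows(x, n, step=1):
    """Cut x into length-n subsequences, starting a new window every `step` points."""
    x = np.asarray(x, dtype=float)
    return [x[i:i + n] for i in range(0, len(x) - n + 1, step)]


def nn_scores(reps):
    """ASSUMED score: Euclidean distance from each representation to its nearest other representation."""
    reps = np.asarray(reps, dtype=float)
    d = np.linalg.norm(reps[:, None, :] - reps[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore each window's distance to itself
    return d.min(axis=1)


if __name__ == "__main__":
    normal, wide = [1, -1, 1, -1], [3, -3, 3, -3]  # same mean (0), different amplitude
    print(paa(normal, 2), paa(wide, 2))        # identical vectors: PAA sees only the means
    print(ad_paa(normal, 2), ad_paa(wide, 2))  # different vectors: the amplitude spread is kept
    seq = normal * 2 + wide + normal * 2       # five windows of length 4
    windows = sliding_windows(seq, n=4, step=4)
    scores = nn_scores([ad_paa(s, 2) for s in windows])
    print(int(np.argmax(scores)))              # index of the high-amplitude window
```

In this toy example, PAA maps the low-amplitude and high-amplitude windows, which share a zero mean, to the same vector, so every window receives a score of zero. The assumed amplitude-domain variant separates them, and the high-amplitude window gets the largest score. The demo uses non-overlapping windows (`step=n`) so that overlapping, nearly identical neighbors do not mask one another's scores; with `step=1`, a nearest-neighbor score would normally need to exclude overlapping windows.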

Keywords

Sequences, Anomaly detection, Data representation, Anomaly score

Notes

Acknowledgements

The authors extend their appreciation to the International Scientific Partnership Program (ISPP) at King Saud University for funding this research work through ISPP#0799.


Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. School of Electro-Mechanical Engineering, Xidian University, Xi’an, China
  2. Institute of Systems Engineering, Macau University of Science and Technology, Taipa, China
  3. Industrial Engineering Department, College of Engineering, King Saud University, Riyadh, Saudi Arabia
